System and method to reduce power down entry and exit latency
Granted: June 25, 2024
Patent Number: 12019499
A system and method for fast save/restore is disclosed. The system and method include one or more logical units (LUs) residing in independent power domains, one or more digital frequency synthesizers (DFSs), each of the one or more DFSs associated with one of the one or more LUs, the one or more DFSs configured to lock a system complex frequency and ramp the one or more LUs to the system complex frequency, and one or more slave fast save/restore control (FSRC) units, each slave FSRC unit…
Dual phase clock distribution from a single source in a die-to-die interface
Granted: June 18, 2024
Patent Number: 12015412
A semiconductor package includes a first die having a phase locked loop outputting a local clock signal and a strobe signal to a first transmit block of the first die. The strobe signal has a phase offset relative to the local clock signal. A second die is aligned with the first die so each of a first plurality of connection points of the first die is substantially equidistant to a corresponding connection point of a second plurality of connection points of the second die. A plurality of…
Delta triplet index compression
Granted: June 18, 2024
Patent Number: 12014527
Methods, devices, and systems for compressing and decompressing a stream of indices associated with graphics primitives. A group of delta values is determined based on a group of indices of the stream of indices. The group of delta values is compared to delta values in a lookup table. The group of indices is compressed based on an entry in the lookup table if the group of delta values matches all delta values in the entry; otherwise, the group of indices is compressed based on…
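As a rough illustration of the general idea, the sketch below compresses one triplet of indices by computing delta values and matching them against a small lookup table; the table contents, the delta definition, and names such as `compress_group` are assumptions for illustration, not taken from the patent.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Hypothetical table of frequently occurring delta patterns for index triplets
// (strip-like meshes often produce small, repetitive deltas such as {1, 1, 1}).
static const std::vector<std::array<int32_t, 3>> kDeltaTable = {
    {1, 1, 1}, {1, 1, -1}, {2, 1, 1}, {0, 1, 1},
};

struct CompressedGroup {
    bool matched;                 // true: encoded as a table reference
    uint8_t table_entry;          // valid only when matched
    std::array<uint32_t, 3> raw;  // fallback when no table entry matches
};

// Compress one group of three indices relative to a running base index.
CompressedGroup compress_group(uint32_t base, const std::array<uint32_t, 3>& idx) {
    // Delta of each index relative to the previous value.
    std::array<int32_t, 3> deltas = {
        static_cast<int32_t>(idx[0]) - static_cast<int32_t>(base),
        static_cast<int32_t>(idx[1]) - static_cast<int32_t>(idx[0]),
        static_cast<int32_t>(idx[2]) - static_cast<int32_t>(idx[1]),
    };
    // Use the table only if *all* deltas in an entry match.
    for (size_t e = 0; e < kDeltaTable.size(); ++e) {
        if (deltas == kDeltaTable[e]) {
            return {true, static_cast<uint8_t>(e), {}};
        }
    }
    return {false, 0, idx};       // otherwise fall back to the raw indices
}
```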
Cross GPU scheduling of dependent processes
Granted: June 18, 2024
Patent Number: 12014442
A primary processing unit includes queues configured to store commands prior to execution in corresponding pipelines. The primary processing unit also includes a first table configured to store entries indicating dependencies between commands that are to be executed on different ones of a plurality of processing units that include the primary processing unit and one or more secondary processing units. The primary processing unit also includes a scheduler configured to release commands in…
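A minimal sketch of how a dependency table and release scheduler of this kind might look in software, assuming hypothetical `Command` and `DependencyScheduler` types; the real hardware mechanism is not described by this code.

```cpp
#include <cstdint>
#include <queue>
#include <unordered_map>
#include <vector>

// Illustrative command record: which pipeline it targets and which commands
// (possibly executing on other processing units) must complete first.
struct Command {
    uint64_t id;
    int pipeline;
    std::vector<uint64_t> depends_on;
};

class DependencyScheduler {
public:
    void submit(const Command& cmd) { pending_.push_back(cmd); }

    // Called when any processing unit reports a command as finished.
    void mark_complete(uint64_t cmd_id) { completed_.insert({cmd_id, true}); }

    // Release into the per-pipeline queues every pending command whose
    // dependencies have all completed.
    void release_ready(std::unordered_map<int, std::queue<Command>>& pipelines) {
        std::vector<Command> still_pending;
        for (const Command& cmd : pending_) {
            bool ready = true;
            for (uint64_t dep : cmd.depends_on) {
                if (!completed_.count(dep)) { ready = false; break; }
            }
            if (ready) pipelines[cmd.pipeline].push(cmd);
            else still_pending.push_back(cmd);
        }
        pending_.swap(still_pending);
    }

private:
    std::vector<Command> pending_;
    std::unordered_map<uint64_t, bool> completed_;
};
```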
Active hibernate and managed memory cooling in a non-uniform memory access system
Granted: June 18, 2024
Patent Number: 12014213
A method of operating a computing system includes storing a memory map identifying a first physical memory address as associated with a high performance memory and identifying a second physical memory address as associated with a low power consumption memory, servicing a first memory access request received from an application by accessing application data at the first physical memory address, in response to a change in one or more operating conditions of the computing system, moving the…
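A toy model of the bookkeeping such a memory map implies, assuming page-granular placement and hypothetical tier names; the actual data movement is elided.

```cpp
#include <cstdint>
#include <map>

enum class Tier { HighPerformance, LowPower };

// Illustrative memory map: physical page number -> memory tier it lives in.
class MemoryMap {
public:
    void set(uint64_t page, Tier tier) { map_[page] = tier; }
    Tier lookup(uint64_t page) const { return map_.at(page); }

    // On an operating-condition change (e.g. entering a low-activity state),
    // move the page's data to the other tier and update the map. Only the
    // bookkeeping is shown; the copy itself is elided.
    void migrate(uint64_t page, Tier target) {
        // copy_page_between_tiers(page, map_[page], target);  // hypothetical
        map_[page] = target;
    }

private:
    std::map<uint64_t, Tier> map_;
};
```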
Techniques for reducing serialization in divergent control flow
Granted: June 18, 2024
Patent Number: 12014208
Techniques for executing shader programs with divergent control flow on a single instruction multiple data (“SIMD”) processor are disclosed. These techniques include detecting entry into a divergent section of a shader program and, for the work-items that enter the divergent section, placing a task entry into a task queue associated with the target of each work-item. The target is the destination, in code, of any particular work-item, and is also referred to as a code segment…
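A simplified host-side model of the per-target task queues described above, assuming each divergent target is identified by its code address and batches are regathered up to the SIMD width; all names are illustrative.

```cpp
#include <cstdint>
#include <deque>
#include <unordered_map>
#include <vector>

// Illustrative task entry: which work-item entered the divergent section.
struct TaskEntry {
    uint32_t work_item_id;
};

// One queue per branch target (code segment). Work-items that diverge to the
// same target are grouped so they can later execute together as a wavefront,
// instead of serializing each divergent path.
using TargetQueues = std::unordered_map<uintptr_t, std::deque<TaskEntry>>;

void enter_divergent_section(TargetQueues& queues,
                             uint32_t work_item_id,
                             uintptr_t target_code_segment) {
    queues[target_code_segment].push_back(TaskEntry{work_item_id});
}

// Drain one target's queue as a batch of work-items that share control flow.
std::vector<TaskEntry> gather_batch(TargetQueues& queues, uintptr_t target,
                                    size_t simd_width) {
    std::vector<TaskEntry> batch;
    auto& q = queues[target];
    while (!q.empty() && batch.size() < simd_width) {
        batch.push_back(q.front());
        q.pop_front();
    }
    return batch;
}
```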
Non-homogeneous chiplets
Granted: June 18, 2024
Patent Number: 12013810
A semiconductor module comprises multiple non-homogeneous semiconductor dies disposed on the semiconductor module, with each semiconductor die having a set of circuitry modules that are common to all of the semiconductor dies and also a set of supporting circuitry modules that are distinct between the semiconductor dies. An interconnect communicatively couples the semiconductor dies together. Commands for processing by the semiconductor module may be routed to individual semiconductor…
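A small sketch of how command routing across non-homogeneous dies could be expressed, assuming each die advertises its distinct supporting modules by ID; this illustrates the routing concept only.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Illustrative routing: every die implements the common modules, but each die
// also carries a distinct set of supporting modules. A command that needs a
// particular supporting module is routed to a die that actually provides it.
struct Die {
    uint32_t id;
    std::vector<uint32_t> supporting_modules;   // distinct per die
};

std::optional<uint32_t> route_command(const std::vector<Die>& dies,
                                      uint32_t required_module) {
    for (const Die& die : dies) {
        for (uint32_t m : die.supporting_modules) {
            if (m == required_module) return die.id;
        }
    }
    return std::nullopt;   // no die provides the required module
}
```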
Host-level error detection and fault correction
Granted: June 18, 2024
Patent Number: 12013752
A processing system includes a processing device coupled to a memory configured to check for and correct faults in requested data. In response to correcting the faults of the requested data, the memory sends the corrected data and unused check bits to the processing device as a plurality of fetch returns. The memory also sends a parity fetch based on the corrected data and one or more operations to the processing device. After receiving the plurality of fetch returns and the unused check…
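A minimal sketch of the host-side verification step, assuming the parity fetch is a bytewise XOR of the corrected fetch returns; the actual check-bit and parity scheme may differ.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Illustrative host-side check: the memory sends N corrected fetch returns and
// one parity fetch (modeled here as a bytewise XOR of the returns). The host
// recomputes the parity and compares, catching faults introduced on the link
// or after the memory's own correction step.
constexpr size_t kFetchBytes = 32;
using Fetch = std::array<uint8_t, kFetchBytes>;

bool host_parity_check(const std::vector<Fetch>& fetch_returns,
                       const Fetch& parity_fetch) {
    Fetch recomputed{};                       // zero-initialized accumulator
    for (const Fetch& f : fetch_returns) {
        for (size_t i = 0; i < kFetchBytes; ++i) {
            recomputed[i] ^= f[i];
        }
    }
    return recomputed == parity_fetch;        // false => fault detected
}
```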
Weak precharge before write dual-rail SRAM write optimization
Granted: June 11, 2024
Patent Number: 12009025
A method for accessing a memory cell includes enabling precharging of a bit line of the memory cell before a next access of the memory cell. The method includes disabling the precharging after a first interval if the next access is a write. The method includes disabling the precharging after a second interval if the next access is a read. The first interval is shorter than the second interval.
Systems and methods for continuous wordline monitoring
Granted: June 11, 2024
Patent Number: 12009047
The disclosed computing device includes a cache memory and at least one processor coupled to the cache memory. The at least one processor is configured to copy data written to one or more nonredundant wordlines of the cache memory to one or more redundant wordlines of the cache memory. The at least one processor is additionally configured to detect a mismatch between data read from the one or more nonredundant wordlines and data stored in the one or more redundant wordlines. The at least…
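A software analogy of the monitoring scheme, assuming wordline-granular mirroring and comparison; the real mechanism operates inside the cache arrays themselves.

```cpp
#include <array>
#include <cstdint>
#include <unordered_map>

// Illustrative monitoring: every write to a non-redundant wordline is mirrored
// to a redundant wordline; a later check compares the two copies and flags a
// mismatch (e.g. due to a latent defect or disturb).
constexpr size_t kWordlineBytes = 64;
using Wordline = std::array<uint8_t, kWordlineBytes>;

class WordlineMonitor {
public:
    void write(uint32_t wordline, const Wordline& data) {
        primary_[wordline]   = data;   // non-redundant wordline
        redundant_[wordline] = data;   // mirrored redundant copy
    }

    // Returns true if the stored copies still agree.
    bool check(uint32_t wordline) const {
        return primary_.at(wordline) == redundant_.at(wordline);
    }

private:
    std::unordered_map<uint32_t, Wordline> primary_;
    std::unordered_map<uint32_t, Wordline> redundant_;
};
```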
Automatic central processing unit (CPU) usage optimization
Granted: June 11, 2024
Patent Number: 12008401
Automatic central processing unit (CPU) usage optimization includes: monitoring performance activity of a workload comprising a plurality of threads; and modifying a resource allocation of a plurality of cores for the plurality of threads based on the performance activity.
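A sketch of one possible allocation policy, assuming per-thread utilization samples and a simple grow/shrink rule; the thresholds and the mechanism for applying the allocation (e.g. affinity masks or a power-management interface) are assumptions.

```cpp
#include <cstddef>
#include <vector>

// Illustrative policy only: per-thread utilization samples drive how many
// cores the workload is granted for its threads.
struct ThreadSample {
    double utilization;   // 0.0 .. 1.0 over the last monitoring window
};

size_t recommend_core_count(const std::vector<ThreadSample>& samples,
                            size_t current_cores, size_t max_cores) {
    size_t busy = 0;
    for (const ThreadSample& s : samples) {
        if (s.utilization > 0.75) ++busy;     // thread is compute-bound
    }
    if (busy > current_cores && current_cores < max_cores) {
        return current_cores + 1;             // grow the allocation
    }
    if (busy + 1 < current_cores) {
        return current_cores - 1;             // shrink the allocation
    }
    return current_cores;
}
```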
Mechanism for reducing coherence directory controller overhead for near-memory compute elements
Granted: June 11, 2024
Patent Number: 12008378
A parallel processing (PP) level coherence directory, also referred to as a Processing In-Memory Probe Filter (PimPF), is added to a coherence directory controller. When the coherence directory controller receives a broadcast PIM command from a host, or a PIM command that is directed to multiple memory banks in parallel, the PimPF accelerates processing of the PIM command by maintaining a directory for cache coherence that is separate from existing system level directories in the…
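A simplified model of a PIM probe filter, assuming bank-granular tracking of host-cached data so a broadcast PIM command only probes banks that might hold stale copies; the granularity and interfaces are assumptions, not the patented design.

```cpp
#include <cstdint>
#include <unordered_set>
#include <vector>

// Illustrative PIM probe filter: tracks which memory banks currently have
// lines cached by the host. A broadcast PIM command then needs coherence
// probes only for the banks present in the filter, instead of walking the
// full system-level directory for every bank it touches.
class PimProbeFilter {
public:
    void on_host_cache_fill(uint32_t bank)       { cached_banks_.insert(bank); }
    void on_host_cache_evict_all(uint32_t bank)  { cached_banks_.erase(bank); }

    // Given the banks a broadcast PIM command targets, return only those that
    // actually require probes (i.e. might have stale copies in host caches).
    std::vector<uint32_t> banks_needing_probes(
            const std::vector<uint32_t>& target_banks) const {
        std::vector<uint32_t> need;
        for (uint32_t b : target_banks) {
            if (cached_banks_.count(b)) need.push_back(b);
        }
        return need;
    }

private:
    std::unordered_set<uint32_t> cached_banks_;
};
```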
Method and apparatus for efficient programmable instructions in computer systems
Granted: June 11, 2024
Patent Number: 12008371
Systems, apparatuses, and methods for implementing as part of a processor pipeline a reprogrammable execution unit capable of executing specialized instructions are disclosed. A processor includes one or more reprogrammable execution units which can be programmed to execute different types of customized instructions. When the processor loads a program for execution, the processor loads a bitfile associated with the program. The processor programs a reprogrammable execution unit with the…
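A schematic view of the loading flow, assuming a hypothetical `Bitfile` format and `ReprogrammableUnit::program` interface; it only illustrates the ordering of steps.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Illustrative loading flow: alongside the program binary, the loader takes an
// associated bitfile and uses it to configure a reprogrammable execution unit
// before the program's customized instructions are dispatched.
struct Bitfile { std::vector<uint8_t> config_bits; };

class ReprogrammableUnit {
public:
    void program(const Bitfile& bf) { config_ = bf.config_bits; }
private:
    std::vector<uint8_t> config_;
};

void load_program(const std::string& /*program_path*/,
                  const Bitfile& associated_bitfile,
                  ReprogrammableUnit& unit) {
    // 1. Map the program into memory (elided).
    // 2. Program the execution unit so its customized instructions are valid.
    unit.program(associated_bitfile);
    // 3. Begin execution (elided).
}
```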
Memory bit cell with homogeneous layout pattern of base layers for high density memory macros
Granted: June 11, 2024
Patent Number: 12008237
An apparatus and method for designing memory macro blocks. A memory includes one or more memory banks, each with one or more arrays and input/output (I/O) blocks used to perform read accesses and write accesses. An array utilizes multiple memory bit cells, and the array and the I/O blocks are placed so that they abut one another. The layouts of the memory bit cells and the I/O blocks use the same subset of parameters of a semiconductor fabrication process. As a result, the memory…
Signal bridging using an unpopulated processor interconnect
Granted: June 11, 2024
Patent Number: 12007928
Signal bridging using an unpopulated processor interconnect, including: communicatively coupling an apparatus to a plurality of first signal paths between a bootstrap processor (BSP) and a processor interconnect of a circuit board; communicatively coupling the apparatus to a plurality of second signal paths between the processor interconnect and a peripheral interface of the circuit board; and communicatively coupling the BSP to the peripheral interface via one or more third signal paths…
Pattern-based cache block compression
Granted: June 4, 2024
Patent Number: 12001237
Systems, methods, and devices for performing pattern-based cache block compression and decompression. An uncompressed cache block is input to the compressor. Byte values are identified within the uncompressed cache block. A cache block pattern is searched for in a set of cache block patterns based on the byte values. A compressed cache block is output based on the byte values and the cache block pattern. A compressed cache block is input to the decompressor. A cache block pattern is…
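A toy compressor showing the pattern-matching idea with two made-up patterns (all-same-byte and two alternating bytes); the actual pattern set and encoding are not taken from the patent.

```cpp
#include <array>
#include <cstdint>
#include <optional>

constexpr size_t kBlockBytes = 64;
using CacheBlock = std::array<uint8_t, kBlockBytes>;

// Illustrative compressed form: a pattern identifier plus the distinct byte
// values that the pattern is filled with.
struct CompressedBlock {
    enum class Pattern : uint8_t { AllSameByte, TwoBytesAlternating } pattern;
    std::array<uint8_t, 2> values;
};

std::optional<CompressedBlock> compress(const CacheBlock& block) {
    // Pattern 1: every byte in the block has the same value.
    bool all_same = true;
    for (uint8_t b : block) {
        if (b != block[0]) { all_same = false; break; }
    }
    if (all_same) {
        return CompressedBlock{CompressedBlock::Pattern::AllSameByte,
                               {block[0], 0}};
    }
    // Pattern 2: two byte values strictly alternating (even/odd positions).
    bool alternating = true;
    for (size_t i = 0; i < kBlockBytes; ++i) {
        if (block[i] != block[i % 2]) { alternating = false; break; }
    }
    if (alternating) {
        return CompressedBlock{CompressedBlock::Pattern::TwoBytesAlternating,
                               {block[0], block[1]}};
    }
    return std::nullopt;   // no pattern matched: store the block uncompressed
}
```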
Read clock toggle at configurable PAM levels
Granted: June 4, 2024
Patent Number: 12002541
A read clock circuit selectively provides a read clock signal from a memory to a memory controller over a memory bus. A pulse-amplitude modulation (PAM) driver includes an input and an output capable of driving at least three levels indicating respective digital values. A digital control circuit is coupled to the PAM driver and operable to cause the PAM driver to provide a preamble signal before the read clock signal, the preamble signal including an initial toggling state in which the…
Multi-node memory address space for PCIe devices
Granted: June 4, 2024
Patent Number: 12001370
A device in an interconnect network is provided. The device comprises an end point processor comprising end point memory and an interconnect network link in communication with an interconnect network switch. The device is configured to issue, by the end point processor, a request to send data from the end point memory to other end point memory of another end point processor of another device in the interconnect network and provide, to the interconnect network switch, the request using…
Uniform cache system for fast data access
Granted: June 4, 2024
Patent Number: 12001334
A uniform cache for fast data access includes a plurality of compute units (CUs) and a plurality of L0 caches arranged in a network configuration where each of the CUs is surrounded by a first group of the plurality of L0 caches and each of the plurality of L0 caches is surrounded by an L0 cache group and a CU group. One of the CUs, upon a request for data, queries the surrounding first group of L0 caches to satisfy the request. If the first group of L0 caches fails to satisfy the…
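A minimal sketch of the first-level lookup, assuming each CU holds pointers to its surrounding L0 caches, modeled here as simple address-to-value maps.

```cpp
#include <cstdint>
#include <optional>
#include <unordered_map>
#include <vector>

// Illustrative lookup only: a compute unit first queries the small group of
// L0 caches surrounding it; only if all of them miss does the request fall
// through to the next level.
using L0Cache = std::unordered_map<uint64_t, uint32_t>;

std::optional<uint32_t> cu_load(uint64_t address,
                                const std::vector<const L0Cache*>& surrounding_l0s) {
    for (const L0Cache* cache : surrounding_l0s) {
        auto it = cache->find(address);
        if (it != cache->end()) {
            return it->second;        // satisfied by a neighboring L0 cache
        }
    }
    return std::nullopt;              // miss: escalate to the next cache level
}
```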
Device and method for reducing save-restore latency using address linearization
Granted: June 4, 2024
Patent Number: 12001265
Devices and methods for transitioning between power states of a device are provided. A program is executed using data stored in configuration registers assigned to a component of a device. For a first reduced power state, data of a first portion of the configuration registers is saved to the memory using a first set of linear address space. For a second reduced power state, data of a second portion of the configuration registers is saved to the memory using a second set of linear address…
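A sketch of the linearization bookkeeping, assuming each reduced power state has an ordered list of register ranges mapped onto a contiguous linear region; the types and fields are illustrative.

```cpp
#include <cstdint>
#include <vector>

// Illustrative linearization: each reduced power state gets its own ordered
// list of (register address, size) pairs, mapping the scattered register space
// onto one contiguous linear span in memory so the save engine can stream the
// registers out sequentially instead of issuing scattered accesses.
struct RegisterRange {
    uint64_t reg_addr;
    uint32_t bytes;
};

struct PowerStatePlan {
    uint64_t linear_base;                    // start of the linear save region
    std::vector<RegisterRange> registers;    // registers saved for this state
};

// Compute the linear destination address of the i-th register in the plan.
uint64_t linear_dest(const PowerStatePlan& plan, size_t i) {
    uint64_t offset = 0;
    for (size_t k = 0; k < i; ++k) offset += plan.registers[k].bytes;
    return plan.linear_base + offset;
}
```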