Weak cache line invalidation requests for speculatively executing instructions
Granted: April 2, 2024
Patent Number:
11947456
Techniques for invalidating cache lines are provided. The techniques include issuing, to a first level of a memory hierarchy, a weak exclusive read request for a speculatively executing store instruction; determining whether to invalidate one or more cache lines associated with the store instruction in one or more memories; and issuing the weak invalidation request to additional levels of the memory hierarchy.
Suppressing cache line modification
Granted: April 2, 2024
Patent Number:
11947455
Disclosed is a system and method for use in a cache for suppressing modification of cache line. The system and method includes a processor and a memory operating cooperatively with a cache controller. The memory includes a coherence directory stored within a cache created to track at least one cache line in the cache via the cache controller. The processor instructs a cache controller to store a first data in a cache line in the cache. The cache controller tags the cache line based on…
Separate clocking for components of a graphics processing unit
Granted: April 2, 2024
Patent Number:
11947380
Systems and methods related to controlling clock signals for clocking shader engines modules (SEs) and non-shader-engine modules (nSEs) of a graphics processing unit (GPU) are provided. One or more dividers receive a clock signal CLK and output a clock signal CLKA to the SEs and output a clock signal CLKB to the nSEs. The frequencies of CLKA and CLKB are independently selected based on sets of performance counter data monitored at the SEs and nSEs, respectively. The clock signal…
Droop detection and control of digital frequency-locked loop
Granted: March 26, 2024
Patent Number:
11942953
A apparatus includes a reference signal generator, a droop detection circuit, a digital frequency-locked loop (DFLL), and a DFLL control circuit. The reference signal generator that receives a digital value and produces a pulse-density modulated signal based on the digital value. The droop detection circuit converts the pulse-density modulated signal to an analog signal, compares the analog signal to a monitored supply voltage, and responsive to detecting a droop of the monitored supply…
Dynamic dispatch for workgroup distribution
Granted: March 26, 2024
Patent Number:
11941723
Systems, methods, and techniques dynamically utilize load balancing for workgroup assignments between a group of shader engines by a command processor of a graphics processing unit (GPU). Based on one or more commands received for execution, a plurality of workgroups is generated for assignment to a plurality of shader engines for processing, each shader engine including a respective quantity of active compute units. Each workgroup of the plurality of workgroups is dynamically assigned…
Probe filter retention based low power state
Granted: March 26, 2024
Patent Number:
11940858
A data fabric routes requests between the plurality of requestors and the plurality of responders. The data fabric includes a crossbar router, a coherent slave controller coupled to the crossbar router, and a probe filter coupled to the coherent slave controller and tracking the state of cached lines of memory. Power state control circuitry operates, responsive to detecting any of a plurality of designated conditions, to cause the probe filter to enter a retention low power state in…
Distributed user mode processing
Granted: March 19, 2024
Patent Number:
11934873
A first processing unit such as a graphics processing unit (GPU) pipelines that execute commands and a scheduler to schedule one or more first commands for execution by one or more of the pipelines. The one or more first commands are received from a user mode driver in a second processing unit such as a central processing unit (CPU). The scheduler schedules one or more second commands for execution in response to completing execution of the one or more first commands and without…
Assigning variable length address identifiers to packets in a processing system
Granted: March 19, 2024
Patent Number:
11936616
A controller assigns variable length addresses to addressable elements that are connected to a network. The variable length addresses are determined based on probabilities that packets are addressed to the corresponding addressable element. The controller transmits, to the addressable elements via the network, a routing table indicating the variable length addresses assigned to the addressable elements. Routers or addressable elements receive the routing table and route one or more…
Adaptive oscillator for clock generation
Granted: March 19, 2024
Patent Number:
11936382
An output clock frequency of an adaptive oscillator circuit changes in response to noise on an integrated circuit power supply line. The circuit features two identical delay lines which are separately connected to a regulated supply and a droopy supply. In response to noise on the droopy supply, the delay lines cause a change in the output clock frequency. The adaptive oscillator circuit slows down the output clock frequency when the droopy supply droops or falls below the regulated…
Data compression support for accelerated processor
Granted: March 19, 2024
Patent Number:
11935153
Data processing methods and devices are provided. A processing device comprises memory and a processor. The memory, which comprises a cache, is configured to store portions of data. The processor is configured to issue a store instruction to store one of the portions of data, provide identifying information associated with the one portion of data, compress the one portion of data; and store the compressed one portion of data across multiple lines of the cache using the identifying…
Partition and isolation of a processing-in-memory (PIM) device
Granted: March 19, 2024
Patent Number:
11934827
An apparatus that manages multi-process execution in a processing-in-memory (“PIM”) device includes a gatekeeper configured to: receive an identification of one or more registered PIM processes; receive, from a process, a memory request that includes a PIM command; if the requesting process is a registered PIM process and another registered PIM process is active on the PIM device, perform a context switch of PIM state between the registered PIM processes; and issue the PIM command of…
Routing and manufacturing with a minimum area metal structure
Granted: March 19, 2024
Patent Number:
11934764
Manufacturing a semiconductor chip based on redefining tolerance rules to create an otherwise prohibited structure including redefining a tolerance rule to permit creation of a minimum area metal trench structure violating the tolerance rule during a routing operation; and fabricating the minimum area metal trench structure on the semiconductor substrate based on the redefined tolerance rule.
Process isolation for a processor-in-memory (“PIM”) device
Granted: March 19, 2024
Patent Number:
11934698
Process isolation for a PIM device through exclusive locking includes receiving, from a process, a call requesting ownership of a PIM device. The request includes one or more PIM configuration parameters. The exclusive locking technique also includes granting the process ownership of the PIM device responsive to determining that ownership is available. The PIM device is configured according to the PIM configuration parameters.
Communication engine for hybrid interconnect technologies
Granted: March 19, 2024
Patent Number:
11934331
Systems, apparatuses, and methods for dynamically selecting between wired and wireless interconnects for sending packets are disclosed. A system includes at least a hybrid communication engine and a plurality of interconnects for connecting to various end-points. The communication engine dynamically discovers and utilizes the best interconnect technology available in between given end-points. The communication engine dynamically chooses the physical interconnect that is best suited at…
Data fabric clock switching
Granted: March 19, 2024
Patent Number:
11934251
A memory controller couples to a data fabric clock domain, and to a physical layer interface circuit PHY clock domain. A first interface circuit adapts transfers between the data fabric clock domain (FCLK) and the memory controllers clock domain, and a second interface circuit couples the memory controller to the PHY clock domain. A power controller responds to a power state change request by sending commands to the second interface circuit to change parameters of a memory system and to…
Rapid tag invalidation circuit
Granted: March 12, 2024
Patent Number:
11929114
A system and method for efficiently resetting data stored in a memory array are described. In various implementations, an integrated circuit includes a memory for storing data, and a processing unit that generates access requests for the data stored in the memory. When access circuitry of the memory array begins a reset operation, it reduces a power supply voltage level used by memory bit cells in a column of the array to a value less than a threshold voltage of transistors. Therefore,…
BVH node ordering for efficient ray tracing
Granted: March 12, 2024
Patent Number:
11928770
Methods and systems are disclosed for traversing nodes in a BVH tree by an intersection engine. Techniques disclosed comprise receiving, by the intersection engine, a traversal instruction, including a tracing-mode, ray data, and an identifier of a node to be traversed. Where the tracing-mode includes a closest hit mode and a first hit mode. If the node to be traversed is an internal node, the intersection engine determines, based on the tracing-mode, an order in which children nodes of…
Transfer of cachelines in a processing system based on transfer costs
Granted: March 12, 2024
Patent Number:
11928060
A processing system includes a plurality of compute units, with each compute unit having an associated first cache of a plurality of first caches, and a second cache shared by the plurality of compute units. The second cache operates to manage transfers of caches between the first caches of the plurality of first caches such that when multiple candidate first caches contain a valid copy of a requested cacheline, the second cache selects the candidate first cache having the shortest total…
Flexible, scalable graph-processing accelerator
Granted: March 5, 2024
Patent Number:
11921784
An accelerator device includes a first processing unit to access a structure of a graph dataset, and a second processing unit coupled with the first processing unit to perform computations based on data values in the graph dataset.
Quantum circuit mapping for multi-programmed quantum computers
Granted: March 5, 2024
Patent Number:
11922107
Systems and methods are disclosed that map quantum circuits to physical qubits of a quantum computer. Techniques are disclosed to generate a graph that characterizes the physical qubits of the quantum computer and to compute the resource requirements of each circuit of the quantum circuits. For each circuit, the graph is searched for a subgraph that matches the resource requirements of the circuit, based on a density matrix. Physical qubits, defined by the matching subgraph, are then…