Throttling hull shaders based on tessellation factors in a graphics pipeline
Granted: April 2, 2024
Patent Number:
11948251
A processing system includes hull shader circuitry that launches thread groups including one or more primitives. The hull shader circuitry also generates tessellation factors that indicate subdivisions of the primitives. The processing system also includes throttling circuitry that estimates a primitive launch time interval for the domain shader based on the tessellation factors and selectively throttles launching of the thread groups from the hull shader circuitry based on the primitive…
Redundancy method and apparatus for shader column repair
Granted: April 2, 2024
Patent Number:
11948223
Methods and systems are described. A system includes a redundant shader pipe array that performs rendering calculations on data provided thereto and a shader pipe array that includes a plurality of shader pipes, each of which performs rendering calculations on data provided thereto. The system also includes a circuit that identifies a defective shader pipe of the plurality of shader pipes in the shader pipe array. In response to identifying the defective shader pipe, the circuit…
Machine learning inference engine scalability
Granted: April 2, 2024
Patent Number:
11948073
Systems, apparatuses, and methods for adaptively mapping a machine learning model to a multi-core inference accelerator engine are disclosed. A computing system includes a multi-core inference accelerator engine with multiple inference cores coupled to a memory subsystem. The system also includes a control unit which determines how to adaptively map a machine learning model to the multi-core inference accelerator engine. In one implementation, the control unit selects a mapping scheme…
Gang scheduling for low-latency task synchronization
Granted: April 2, 2024
Patent Number:
11948000
Systems, apparatuses, and methods for performing command buffer gang submission are disclosed. A system includes at least first and second processors and a memory. The first processor (e.g., CPU) generates a command buffer and stores the command buffer in the memory. A mechanism is implemented where a granularity of work provided to the second processor (e.g., GPU) is increased which, in turn, increases the opportunities for parallel work. In gang submission mode, the user-mode driver…
Method and apparatus for training memory
Granted: April 2, 2024
Patent Number:
11947833
A method and apparatus for training data in a computer system includes reading data stored in a first memory address in a memory and writing it to a buffer. Training data is generated for transmission to the first memory address. The data is transmitted to the first memory address. Information relating to the training data is read from the first memory address and the stored data is read from the buffer and written to the memory area where the training data was transmitted.
Droop detection and control of digital frequency-locked loop
Granted: March 26, 2024
Patent Number:
11942953
A apparatus includes a reference signal generator, a droop detection circuit, a digital frequency-locked loop (DFLL), and a DFLL control circuit. The reference signal generator that receives a digital value and produces a pulse-density modulated signal based on the digital value. The droop detection circuit converts the pulse-density modulated signal to an analog signal, compares the analog signal to a monitored supply voltage, and responsive to detecting a droop of the monitored supply…
Dynamic dispatch for workgroup distribution
Granted: March 26, 2024
Patent Number:
11941723
Systems, methods, and techniques dynamically utilize load balancing for workgroup assignments between a group of shader engines by a command processor of a graphics processing unit (GPU). Based on one or more commands received for execution, a plurality of workgroups is generated for assignment to a plurality of shader engines for processing, each shader engine including a respective quantity of active compute units. Each workgroup of the plurality of workgroups is dynamically assigned…
Probe filter retention based low power state
Granted: March 26, 2024
Patent Number:
11940858
A data fabric routes requests between the plurality of requestors and the plurality of responders. The data fabric includes a crossbar router, a coherent slave controller coupled to the crossbar router, and a probe filter coupled to the coherent slave controller and tracking the state of cached lines of memory. Power state control circuitry operates, responsive to detecting any of a plurality of designated conditions, to cause the probe filter to enter a retention low power state in…
Adaptive oscillator for clock generation
Granted: March 19, 2024
Patent Number:
11936382
An output clock frequency of an adaptive oscillator circuit changes in response to noise on an integrated circuit power supply line. The circuit features two identical delay lines which are separately connected to a regulated supply and a droopy supply. In response to noise on the droopy supply, the delay lines cause a change in the output clock frequency. The adaptive oscillator circuit slows down the output clock frequency when the droopy supply droops or falls below the regulated…
Assigning variable length address identifiers to packets in a processing system
Granted: March 19, 2024
Patent Number:
11936616
A controller assigns variable length addresses to addressable elements that are connected to a network. The variable length addresses are determined based on probabilities that packets are addressed to the corresponding addressable element. The controller transmits, to the addressable elements via the network, a routing table indicating the variable length addresses assigned to the addressable elements. Routers or addressable elements receive the routing table and route one or more…
Data compression support for accelerated processor
Granted: March 19, 2024
Patent Number:
11935153
Data processing methods and devices are provided. A processing device comprises memory and a processor. The memory, which comprises a cache, is configured to store portions of data. The processor is configured to issue a store instruction to store one of the portions of data, provide identifying information associated with the one portion of data, compress the one portion of data; and store the compressed one portion of data across multiple lines of the cache using the identifying…
Distributed user mode processing
Granted: March 19, 2024
Patent Number:
11934873
A first processing unit such as a graphics processing unit (GPU) pipelines that execute commands and a scheduler to schedule one or more first commands for execution by one or more of the pipelines. The one or more first commands are received from a user mode driver in a second processing unit such as a central processing unit (CPU). The scheduler schedules one or more second commands for execution in response to completing execution of the one or more first commands and without…
Partition and isolation of a processing-in-memory (PIM) device
Granted: March 19, 2024
Patent Number:
11934827
An apparatus that manages multi-process execution in a processing-in-memory (“PIM”) device includes a gatekeeper configured to: receive an identification of one or more registered PIM processes; receive, from a process, a memory request that includes a PIM command; if the requesting process is a registered PIM process and another registered PIM process is active on the PIM device, perform a context switch of PIM state between the registered PIM processes; and issue the PIM command of…
Routing and manufacturing with a minimum area metal structure
Granted: March 19, 2024
Patent Number:
11934764
Manufacturing a semiconductor chip based on redefining tolerance rules to create an otherwise prohibited structure including redefining a tolerance rule to permit creation of a minimum area metal trench structure violating the tolerance rule during a routing operation; and fabricating the minimum area metal trench structure on the semiconductor substrate based on the redefined tolerance rule.
Process isolation for a processor-in-memory (“PIM”) device
Granted: March 19, 2024
Patent Number:
11934698
Process isolation for a PIM device through exclusive locking includes receiving, from a process, a call requesting ownership of a PIM device. The request includes one or more PIM configuration parameters. The exclusive locking technique also includes granting the process ownership of the PIM device responsive to determining that ownership is available. The PIM device is configured according to the PIM configuration parameters.
Communication engine for hybrid interconnect technologies
Granted: March 19, 2024
Patent Number:
11934331
Systems, apparatuses, and methods for dynamically selecting between wired and wireless interconnects for sending packets are disclosed. A system includes at least a hybrid communication engine and a plurality of interconnects for connecting to various end-points. The communication engine dynamically discovers and utilizes the best interconnect technology available in between given end-points. The communication engine dynamically chooses the physical interconnect that is best suited at…
Data fabric clock switching
Granted: March 19, 2024
Patent Number:
11934251
A memory controller couples to a data fabric clock domain, and to a physical layer interface circuit PHY clock domain. A first interface circuit adapts transfers between the data fabric clock domain (FCLK) and the memory controllers clock domain, and a second interface circuit couples the memory controller to the PHY clock domain. A power controller responds to a power state change request by sending commands to the second interface circuit to change parameters of a memory system and to…
Transfer of cachelines in a processing system based on transfer costs
Granted: March 12, 2024
Patent Number:
11928060
A processing system includes a plurality of compute units, with each compute unit having an associated first cache of a plurality of first caches, and a second cache shared by the plurality of compute units. The second cache operates to manage transfers of caches between the first caches of the plurality of first caches such that when multiple candidate first caches contain a valid copy of a requested cacheline, the second cache selects the candidate first cache having the shortest total…
Rapid tag invalidation circuit
Granted: March 12, 2024
Patent Number:
11929114
A system and method for efficiently resetting data stored in a memory array are described. In various implementations, an integrated circuit includes a memory for storing data, and a processing unit that generates access requests for the data stored in the memory. When access circuitry of the memory array begins a reset operation, it reduces a power supply voltage level used by memory bit cells in a column of the array to a value less than a threshold voltage of transistors. Therefore,…
BVH node ordering for efficient ray tracing
Granted: March 12, 2024
Patent Number:
11928770
Methods and systems are disclosed for traversing nodes in a BVH tree by an intersection engine. Techniques disclosed comprise receiving, by the intersection engine, a traversal instruction, including a tracing-mode, ray data, and an identifier of a node to be traversed. Where the tracing-mode includes a closest hit mode and a first hit mode. If the node to be traversed is an internal node, the intersection engine determines, based on the tracing-mode, an order in which children nodes of…