AMD Patent Grants

Hybrid render with preferred primitive batch binning and sorting

Granted: April 9, 2024
Patent Number: 11954782
A method, system, and non-transitory computer readable storage medium for rasterizing primitives are disclosed. The method, system, and non-transitory computer readable storage medium includes: generating a primitive batch from a sequence of one or more primitives, wherein the primitive batch includes primitives sorted into one or more row groups based on which row of a plurality of rows each primitive intersects; and processing each row group, the processing for each row group…

Method and apparatus for implementing a rasterizer in GPU operations

Granted: April 9, 2024
Patent Number: 11954757
An apparatus, such as a graphical processing unit (GPU), includes one or more processors configured to determine a plurality of first locality information of a received wave at a processing unit and to select a first processing element of a plurality of processing elements, the first processing unit having a plurality of second locality information from a previous wave that matches the plurality of first locality information to execute the received wave.

Paging hierarchies for extended page tables and extended page attributes

Granted: April 9, 2024
Patent Number: 11954026
A processing system includes a processor core for processing instructions and a memory that stores a page table set including an extended page table having an extended page table entry storing extended page table attributes associated with a physical memory page. The system receives a virtual address and translates the virtual address to a physical address for the physical memory page. One or more extended page attributes associated with the physical memory page are retrieved from the…

Cross-chiplet performance data streaming

Granted: April 2, 2024
Patent Number: 11947476
Methods and systems are disclosed for cross-chiplet performance data streaming. Techniques disclosed include accumulating, by a subservient chiplet, event data associated with an event indicative of a performance aspect of the subservient chiplet; sending, by the subservient chiplet, the event data over a chiplet bus to a master chiplet; and adding, by the master chiplet, the received event data to an event record, the event record containing previously received, from the subservient…

Throttling hull shaders based on tessellation factors in a graphics pipeline

Granted: April 2, 2024
Patent Number: 11948251
A processing system includes hull shader circuitry that launches thread groups including one or more primitives. The hull shader circuitry also generates tessellation factors that indicate subdivisions of the primitives. The processing system also includes throttling circuitry that estimates a primitive launch time interval for the domain shader based on the tessellation factors and selectively throttles launching of the thread groups from the hull shader circuitry based on the primitive…

Redundancy method and apparatus for shader column repair

Granted: April 2, 2024
Patent Number: 11948223
Methods and systems are described. A system includes a redundant shader pipe array that performs rendering calculations on data provided thereto and a shader pipe array that includes a plurality of shader pipes, each of which performs rendering calculations on data provided thereto. The system also includes a circuit that identifies a defective shader pipe of the plurality of shader pipes in the shader pipe array. In response to identifying the defective shader pipe, the circuit…

Machine learning inference engine scalability

Granted: April 2, 2024
Patent Number: 11948073
Systems, apparatuses, and methods for adaptively mapping a machine learning model to a multi-core inference accelerator engine are disclosed. A computing system includes a multi-core inference accelerator engine with multiple inference cores coupled to a memory subsystem. The system also includes a control unit which determines how to adaptively map a machine learning model to the multi-core inference accelerator engine. In one implementation, the control unit selects a mapping scheme…

Gang scheduling for low-latency task synchronization

Granted: April 2, 2024
Patent Number: 11948000
Systems, apparatuses, and methods for performing command buffer gang submission are disclosed. A system includes at least first and second processors and a memory. The first processor (e.g., CPU) generates a command buffer and stores the command buffer in the memory. A mechanism is implemented where a granularity of work provided to the second processor (e.g., GPU) is increased which, in turn, increases the opportunities for parallel work. In gang submission mode, the user-mode driver…

Method and apparatus for training memory

Granted: April 2, 2024
Patent Number: 11947833
A method and apparatus for training data in a computer system includes reading data stored in a first memory address in a memory and writing it to a buffer. Training data is generated for transmission to the first memory address. The data is transmitted to the first memory address. Information relating to the training data is read from the first memory address and the stored data is read from the buffer and written to the memory area where the training data was transmitted.

Enabling accelerated processing units to perform dataflow execution

Granted: April 2, 2024
Patent Number: 11947487
Methods and systems are disclosed for performing dataflow execution by an accelerated processing unit (APU). Techniques disclosed include decoding information from one or more dataflow instructions. The decoded information is associated with dataflow execution of a computational task. Techniques disclosed further include configuring, based on the decoded information, dataflow circuitry, and, then, executing the dataflow execution of the computational task using the dataflow circuitry.

Duplicated registers in chiplet processing units

Granted: April 2, 2024
Patent Number: 11947473
Systems, apparatuses, and methods for implementing duplicated registers for access by initiators across multiple semiconductor dies are disclosed. A system includes multiple initiators on multiple semiconductor dies of a chiplet processor. One of the semiconductor dies is the master die, and this master die has copies of registers which can be accessed by the multiple initiators on the multiple semiconductor dies. When a given initiator on a given secondary die generates a register…

Weak cache line invalidation requests for speculatively executing instructions

Granted: April 2, 2024
Patent Number: 11947456
Techniques for invalidating cache lines are provided. The techniques include issuing, to a first level of a memory hierarchy, a weak exclusive read request for a speculatively executing store instruction; determining whether to invalidate one or more cache lines associated with the store instruction in one or more memories; and issuing the weak invalidation request to additional levels of the memory hierarchy.

Suppressing cache line modification

Granted: April 2, 2024
Patent Number: 11947455
Disclosed is a system and method for use in a cache for suppressing modification of cache line. The system and method includes a processor and a memory operating cooperatively with a cache controller. The memory includes a coherence directory stored within a cache created to track at least one cache line in the cache via the cache controller. The processor instructs a cache controller to store a first data in a cache line in the cache. The cache controller tags the cache line based on…

Separate clocking for components of a graphics processing unit

Granted: April 2, 2024
Patent Number: 11947380
Systems and methods related to controlling clock signals for clocking shader engines modules (SEs) and non-shader-engine modules (nSEs) of a graphics processing unit (GPU) are provided. One or more dividers receive a clock signal CLK and output a clock signal CLKA to the SEs and output a clock signal CLKB to the nSEs. The frequencies of CLKA and CLKB are independently selected based on sets of performance counter data monitored at the SEs and nSEs, respectively. The clock signal…

Droop detection and control of digital frequency-locked loop

Granted: March 26, 2024
Patent Number: 11942953
A apparatus includes a reference signal generator, a droop detection circuit, a digital frequency-locked loop (DFLL), and a DFLL control circuit. The reference signal generator that receives a digital value and produces a pulse-density modulated signal based on the digital value. The droop detection circuit converts the pulse-density modulated signal to an analog signal, compares the analog signal to a monitored supply voltage, and responsive to detecting a droop of the monitored supply…

Dynamic dispatch for workgroup distribution

Granted: March 26, 2024
Patent Number: 11941723
Systems, methods, and techniques dynamically utilize load balancing for workgroup assignments between a group of shader engines by a command processor of a graphics processing unit (GPU). Based on one or more commands received for execution, a plurality of workgroups is generated for assignment to a plurality of shader engines for processing, each shader engine including a respective quantity of active compute units. Each workgroup of the plurality of workgroups is dynamically assigned…

Probe filter retention based low power state

Granted: March 26, 2024
Patent Number: 11940858
A data fabric routes requests between the plurality of requestors and the plurality of responders. The data fabric includes a crossbar router, a coherent slave controller coupled to the crossbar router, and a probe filter coupled to the coherent slave controller and tracking the state of cached lines of memory. Power state control circuitry operates, responsive to detecting any of a plurality of designated conditions, to cause the probe filter to enter a retention low power state in…

Data fabric clock switching

Granted: March 19, 2024
Patent Number: 11934251
A memory controller couples to a data fabric clock domain, and to a physical layer interface circuit PHY clock domain. A first interface circuit adapts transfers between the data fabric clock domain (FCLK) and the memory controllers clock domain, and a second interface circuit couples the memory controller to the PHY clock domain. A power controller responds to a power state change request by sending commands to the second interface circuit to change parameters of a memory system and to…

Data compression support for accelerated processor

Granted: March 19, 2024
Patent Number: 11935153
Data processing methods and devices are provided. A processing device comprises memory and a processor. The memory, which comprises a cache, is configured to store portions of data. The processor is configured to issue a store instruction to store one of the portions of data, provide identifying information associated with the one portion of data, compress the one portion of data; and store the compressed one portion of data across multiple lines of the cache using the identifying…

Communication engine for hybrid interconnect technologies

Granted: March 19, 2024
Patent Number: 11934331
Systems, apparatuses, and methods for dynamically selecting between wired and wireless interconnects for sending packets are disclosed. A system includes at least a hybrid communication engine and a plurality of interconnects for connecting to various end-points. The communication engine dynamically discovers and utilizes the best interconnect technology available in between given end-points. The communication engine dynamically chooses the physical interconnect that is best suited at…