AMD Patent Grants

Enhanced method for a useful blockchain consensus

Granted: April 9, 2024
Patent Number: 11956368
An approach is provided for implementing a useful proof-of-work consensus algorithm. A proposed block is received. A combined hash value is generated based on the proposed block and a nonce value. The combined hash value is divided into a plurality of hash value pieces that each correspond to a work packet of a plurality of work packets. One or more requests are transmitted for the plurality of work packets that correspond to the plurality of hash value pieces. In response to receiving…

Semiconductor chip having stepped conductive pillars

Granted: April 9, 2024
Patent Number: 11955447
In an implementation, a semiconductor chip includes a device layer, an interconnect layer fabricated on the device layer, the interconnect layer including a conductive pad, and a conductive pillar coupled to the conductive pad. The conductive pillar includes at least a first portion having a first width and a second portion having a second width, the first portion being disposed between the second portion and the conductive pad, wherein the first width of the first portion is greater…

Variable width bounding volume hierarchy nodes

Granted: April 9, 2024
Patent Number: 11954788
A technique for performing ray tracing operations is provided. The technique includes processing small bounding box nodes in a box intersection test circuit to generate intersection test results for the small bounding box nodes; and processing large bounding box nodes in the box intersection test circuit to generate intersection test results for the large bounding box nodes.

Hybrid render with preferred primitive batch binning and sorting

Granted: April 9, 2024
Patent Number: 11954782
A method, system, and non-transitory computer readable storage medium for rasterizing primitives are disclosed. The method, system, and non-transitory computer readable storage medium includes: generating a primitive batch from a sequence of one or more primitives, wherein the primitive batch includes primitives sorted into one or more row groups based on which row of a plurality of rows each primitive intersects; and processing each row group, the processing for each row group…

Method and apparatus for implementing a rasterizer in GPU operations

Granted: April 9, 2024
Patent Number: 11954757
An apparatus, such as a graphical processing unit (GPU), includes one or more processors configured to determine a plurality of first locality information of a received wave at a processing unit and to select a first processing element of a plurality of processing elements, the first processing unit having a plurality of second locality information from a previous wave that matches the plurality of first locality information to execute the received wave.

Method and apparatus for training memory

Granted: April 2, 2024
Patent Number: 11947833
A method and apparatus for training data in a computer system includes reading data stored in a first memory address in a memory and writing it to a buffer. Training data is generated for transmission to the first memory address. The data is transmitted to the first memory address. Information relating to the training data is read from the first memory address and the stored data is read from the buffer and written to the memory area where the training data was transmitted.

Throttling hull shaders based on tessellation factors in a graphics pipeline

Granted: April 2, 2024
Patent Number: 11948251
A processing system includes hull shader circuitry that launches thread groups including one or more primitives. The hull shader circuitry also generates tessellation factors that indicate subdivisions of the primitives. The processing system also includes throttling circuitry that estimates a primitive launch time interval for the domain shader based on the tessellation factors and selectively throttles launching of the thread groups from the hull shader circuitry based on the primitive…

Redundancy method and apparatus for shader column repair

Granted: April 2, 2024
Patent Number: 11948223
Methods and systems are described. A system includes a redundant shader pipe array that performs rendering calculations on data provided thereto and a shader pipe array that includes a plurality of shader pipes, each of which performs rendering calculations on data provided thereto. The system also includes a circuit that identifies a defective shader pipe of the plurality of shader pipes in the shader pipe array. In response to identifying the defective shader pipe, the circuit…

Machine learning inference engine scalability

Granted: April 2, 2024
Patent Number: 11948073
Systems, apparatuses, and methods for adaptively mapping a machine learning model to a multi-core inference accelerator engine are disclosed. A computing system includes a multi-core inference accelerator engine with multiple inference cores coupled to a memory subsystem. The system also includes a control unit which determines how to adaptively map a machine learning model to the multi-core inference accelerator engine. In one implementation, the control unit selects a mapping scheme…

Gang scheduling for low-latency task synchronization

Granted: April 2, 2024
Patent Number: 11948000
Systems, apparatuses, and methods for performing command buffer gang submission are disclosed. A system includes at least first and second processors and a memory. The first processor (e.g., CPU) generates a command buffer and stores the command buffer in the memory. A mechanism is implemented where a granularity of work provided to the second processor (e.g., GPU) is increased which, in turn, increases the opportunities for parallel work. In gang submission mode, the user-mode driver…

Enabling accelerated processing units to perform dataflow execution

Granted: April 2, 2024
Patent Number: 11947487
Methods and systems are disclosed for performing dataflow execution by an accelerated processing unit (APU). Techniques disclosed include decoding information from one or more dataflow instructions. The decoded information is associated with dataflow execution of a computational task. Techniques disclosed further include configuring, based on the decoded information, dataflow circuitry, and, then, executing the dataflow execution of the computational task using the dataflow circuitry.

Cross-chiplet performance data streaming

Granted: April 2, 2024
Patent Number: 11947476
Methods and systems are disclosed for cross-chiplet performance data streaming. Techniques disclosed include accumulating, by a subservient chiplet, event data associated with an event indicative of a performance aspect of the subservient chiplet; sending, by the subservient chiplet, the event data over a chiplet bus to a master chiplet; and adding, by the master chiplet, the received event data to an event record, the event record containing previously received, from the subservient…

Duplicated registers in chiplet processing units

Granted: April 2, 2024
Patent Number: 11947473
Systems, apparatuses, and methods for implementing duplicated registers for access by initiators across multiple semiconductor dies are disclosed. A system includes multiple initiators on multiple semiconductor dies of a chiplet processor. One of the semiconductor dies is the master die, and this master die has copies of registers which can be accessed by the multiple initiators on the multiple semiconductor dies. When a given initiator on a given secondary die generates a register…

Weak cache line invalidation requests for speculatively executing instructions

Granted: April 2, 2024
Patent Number: 11947456
Techniques for invalidating cache lines are provided. The techniques include issuing, to a first level of a memory hierarchy, a weak exclusive read request for a speculatively executing store instruction; determining whether to invalidate one or more cache lines associated with the store instruction in one or more memories; and issuing the weak invalidation request to additional levels of the memory hierarchy.

Suppressing cache line modification

Granted: April 2, 2024
Patent Number: 11947455
Disclosed is a system and method for use in a cache for suppressing modification of cache line. The system and method includes a processor and a memory operating cooperatively with a cache controller. The memory includes a coherence directory stored within a cache created to track at least one cache line in the cache via the cache controller. The processor instructs a cache controller to store a first data in a cache line in the cache. The cache controller tags the cache line based on…

Separate clocking for components of a graphics processing unit

Granted: April 2, 2024
Patent Number: 11947380
Systems and methods related to controlling clock signals for clocking shader engines modules (SEs) and non-shader-engine modules (nSEs) of a graphics processing unit (GPU) are provided. One or more dividers receive a clock signal CLK and output a clock signal CLKA to the SEs and output a clock signal CLKB to the nSEs. The frequencies of CLKA and CLKB are independently selected based on sets of performance counter data monitored at the SEs and nSEs, respectively. The clock signal…

Droop detection and control of digital frequency-locked loop

Granted: March 26, 2024
Patent Number: 11942953
A apparatus includes a reference signal generator, a droop detection circuit, a digital frequency-locked loop (DFLL), and a DFLL control circuit. The reference signal generator that receives a digital value and produces a pulse-density modulated signal based on the digital value. The droop detection circuit converts the pulse-density modulated signal to an analog signal, compares the analog signal to a monitored supply voltage, and responsive to detecting a droop of the monitored supply…

Dynamic dispatch for workgroup distribution

Granted: March 26, 2024
Patent Number: 11941723
Systems, methods, and techniques dynamically utilize load balancing for workgroup assignments between a group of shader engines by a command processor of a graphics processing unit (GPU). Based on one or more commands received for execution, a plurality of workgroups is generated for assignment to a plurality of shader engines for processing, each shader engine including a respective quantity of active compute units. Each workgroup of the plurality of workgroups is dynamically assigned…

Probe filter retention based low power state

Granted: March 26, 2024
Patent Number: 11940858
A data fabric routes requests between the plurality of requestors and the plurality of responders. The data fabric includes a crossbar router, a coherent slave controller coupled to the crossbar router, and a probe filter coupled to the coherent slave controller and tracking the state of cached lines of memory. Power state control circuitry operates, responsive to detecting any of a plurality of designated conditions, to cause the probe filter to enter a retention low power state in…

Process isolation for a processor-in-memory (“PIM”) device

Granted: March 19, 2024
Patent Number: 11934698
Process isolation for a PIM device through exclusive locking includes receiving, from a process, a call requesting ownership of a PIM device. The request includes one or more PIM configuration parameters. The exclusive locking technique also includes granting the process ownership of the PIM device responsive to determining that ownership is available. The PIM device is configured according to the PIM configuration parameters.