AMD Patent Grants

Asynchronous submission of commands

Granted: April 25, 2017
Patent Number: 9632848
A system and method for allocating commands in processing is disclosed. The system and method includes an application running on a computer system that provides commands to be executed on one of a plurality of processors capable of executing the commands, the commands provided through an application programming interface, a device driver that buffers the streamed commands and converts the streamed commands into a format used by a GPU, and an operating system that builds a command buffer…

Integrated differential clock gater

Granted: April 18, 2017
Patent Number: 9625938
A technique implements differential digital logic circuits with a differential clock distribution network using standard cell differential clock gater circuits to reduce area, delay, power consumption in integrated circuits. An apparatus includes a first terminal configured to receive a clock signal, a second terminal configured to receive a complementary clock signal, and a third terminal configured to receive a clock control signal. The apparatus includes a latch circuit configured to…

Method and apparatus for floating point register caching

Granted: April 18, 2017
Patent Number: 9626190
The present invention provides a method and apparatus for floating-point register caching. One embodiment of the method includes mapping a first set of architected registers defined by a first instruction set to a memory outside of a plurality of physical registers. The plurality of physical registers are configured to map to the first set, a second set of architected registers defined by a second construction set, and a set of rename registers. This embodiment of the method also…

Data error correction device and methods thereof

Granted: April 18, 2017
Patent Number: 9626243
A method and device for error detection includes performing error detection for each data word received in a burst access to a memory. When no error is detected, the data words are written to a cache and indicated as valid data. In response to detecting an error in a data word, the error is corrected and the corrected data written to the cache without indicating the data as valid. In addition, the location of the detected error, indicating the data symbol associated with the error, is…

Apparatus and method for hash table access

Granted: April 18, 2017
Patent Number: 9626428
A system and method for accessing a hash table are provided. A hash table includes buckets where each bucket includes multiple chains. When a single instruction multiple data (SIMD) processor receives a group of threads configured to execute a key look-up instruction that accesses an element in the hash table, the threads executing on the SIMD processor identify a bucket that stores a key in the key look-up instruction. Once identified, the threads in the group traverse the multiple…

Implicit texture map parameterization for GPU rendering

Granted: April 18, 2017
Patent Number: 9626789
Embodiments are described for a method for processing textures for a mesh comprising quadrilateral polygons for real-time rendering of an object or model in a graphics processing unit (GPU), comprising associating an independent texture map with each face of the mesh to produce a plurality of face textures, packing the plurality of face textures into a single texture atlas, wherein the atlas is divided into a plurality of blocks based on a resolution of the face textures, adding a border…

Hardware and runtime coordinated load balancing for parallel applications

Granted: April 11, 2017
Patent Number: 9619290
A method of balancing execution rates for a plurality of parallel program loops being executed concurrently by a processor may include estimating a completion time for each program loop of the plurality of program loops, determining a difference between the estimated completion time of a first program loop of the plurality of program loops and the estimated completion time of a second program loop of the plurality of program loops, and decreasing the difference by adjusting an execution…

SIMD processing unit with local data share and access to a global data share of a GPU

Granted: April 11, 2017
Patent Number: 9619428
A graphics processing unit is disclosed, the graphics processing unit having a processor having one or more SIMD processing units, and a local data share corresponding to one of the one or more SIMD processing units, the local data share comprising one or more low latency accessible memory regions for each group of threads assigned to one or more execution wavefronts, and a global data share comprising one or more low latency memory regions for each group of threads.

Propagation simulation buffer for clock domain crossing

Granted: April 11, 2017
Patent Number: 9621143
Techniques are disclosed relating to detecting and minimizing timing problems created by clock domain crossing (CDC) in integrated circuits. In various embodiments, one or more timing parameters are associated with a path that crosses between clock domains in an integrated circuit, where the one or more timing parameters specify a propagation delay for the path. In one embodiment, the timing parameters may be distributed to different design stages using a configuration file. In some…

Memory management in graphics and compute application programming interfaces

Granted: April 4, 2017
Patent Number: 9612884
Methods are provided for creating objects in a way that permits an API client to explicitly participate in memory management for an object created using the API. Methods for managing data object memory include requesting memory requirements for an object using an API and expressly allocating a memory location for the object based on the memory requirements. Methods are also provided for cloning objects such that a state of the object remains unchanged from the original object to the…

Scan flip-flop circuit with dedicated clocks

Granted: March 28, 2017
Patent Number: 9606177
In one form, a scan flip-flop includes a clock gating cell and a dedicated clock flip-flop. The clock gating cell provides an input clock input signal as a scan clock signal when a scan shift enable signal is active, and provides the input clock signal as a data clock signal when the scan shift enable signal is inactive. The dedicated clock flip-flop stores a data input signal and provides the data input signal, so stored, as a data output signal in response to transitions of the data…

Dependence-based replay suppression

Granted: March 28, 2017
Patent Number: 9606806
A method includes selecting for execution in a processor a load instruction having at least one dependent instruction. Responsive to selecting the load instruction, the at least one dependent instruction is selectively awakened based on a status of a store instruction associated with the load instruction to indicate that the at least one dependent instruction is eligible for execution. A processor includes an instruction pipeline having an execution unit to execute instructions, a…

Generalized control registers

Granted: March 28, 2017
Patent Number: 9606936
Methods, systems, and computer readable media generalize control registers in the context of memory address translations for I/O devices. A method includes maintaining a table including a plurality of concurrently available control register base pointers each associated with a corresponding input/output (I/O) device, associating each control register base pointer with a first translation from a guest virtual address (GVA) to a guest physical address (GPA) and a second translation from…

Scheduling of data migration

Granted: March 14, 2017
Patent Number: 9594521
In one form, scheduling data migration comprises determining whether the data is likely to be used by an input/output (I/O) device, the data being at a location remote to the I/O device; and scheduling the data for migration from the remote location to a location local to the I/O device in response to determining that the data is likely to be used by the I/O device.

Method and apparatus of adaptive application performance

Granted: March 14, 2017
Patent Number: 9594588
A method and apparatus of adaptive application performance includes a determination of at least one criteria for implementing adaptive application performance measures. Based upon the determination, adaptive application performance measures are implemented.

Media hardware resource allocation

Granted: March 14, 2017
Patent Number: 9594594
Apparatus, computer readable medium, and method of allocating media resources, the method including determining a media resources allocation table based on one or more media hardware resources and predetermined benchmarks of media hardware resources for performing media operations; in response to receiving a request for media resources from a first application, comparing the requested media resources with the media resources allocation table; and if the comparison indicates that the…

Efficient processor load balancing using predication flags

Granted: March 14, 2017
Patent Number: 9594595
A system and methods embodying some aspects of the present embodiments for efficient load balancing using predication flags are provided. The load balancing system includes a first processing unit, a second processing unit, and a shared queue. The first processing unit is in communication with a first queue. The second processing unit is in communication with a second queue. The first and second queues are each configured to hold a packet. The shared queue is configured to maintain a…

Voltage droop mitigation in 3D chip system

Granted: March 14, 2017
Patent Number: 9595508
The present invention relates to a multichip system and a method for scheduling threads in 3D stacked chip. The multichip system comprises a plurality of dies stacked vertically and electrically coupled together; each of the plurality of dies comprising one or more cores, each of the plurality of dies further comprising: at least one voltage violation sensing unit, the at least one voltage violation sensing unit being connected with the one or more cores of each die, the at least one…

Flexible page sizes for virtual memory

Granted: March 7, 2017
Patent Number: 9588902
A method for translating a virtual memory address into a physical memory address includes parsing the virtual memory address into a page directory entry offset, a page table entry offset, and an access offset. The page directory entry offset is combined with a virtual memory base address to locate a page directory entry in a page directory block, wherein the page directory entry includes a native page table size field and a page table block base address. The page table entry offset and…

Register file management for operations using a single physical register for both source and result

Granted: February 28, 2017
Patent Number: 9582286
A processor includes a physical register file having physical registers and an execution unit to perform an arithmetic operation to generate a result mapped to a physical register, wherein the processor delays a write of the result to the physical register file until the result is qualified as valid. A method includes mapping the same physical register both to store load data of a load-execute operation and to subsequently store a result of an arithmetic operation of the load-execute…