AMD Patent Applications

POST-DEPTH VISIBILITY COLLECTION WITH TWO LEVEL BINNING

Granted: October 6, 2022
Application Number: 20220319091
A method and apparatus of tile rendering of an image for a display in a computer system includes receiving the image in a graphics pipeline of the computer system, the image comprising one or more three dimensional (3D) objects. The image is divided into one or more tiles. A depth test is performed on the one or more tiles, and based upon the depth test, visibility information of the one or more tiles is binned.
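
As an illustration only, the flow in this abstract (divide into tiles, depth test, bin visibility) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the claimed method: the fragment format `(x, y, depth, primitive_id)`, the "smaller depth is nearer" rule, and the function name `bin_visible_primitives` are all hypothetical and do not come from the application.

```python
from collections import defaultdict


def bin_visible_primitives(fragments, width, height, tile_size):
    """Return {(tile_x, tile_y): set of primitive ids visible in that tile}.

    Assumption: fragments are (x, y, depth, primitive_id) tuples and a
    smaller depth value means the fragment is closer to the viewer.
    """
    nearest = {}  # (x, y) -> (depth, primitive_id) that currently passes the depth test

    # Depth test: keep only the nearest fragment at each pixel.
    for x, y, depth, prim in fragments:
        if not (0 <= x < width and 0 <= y < height):
            continue  # fragment falls outside the image
        if (x, y) not in nearest or depth < nearest[(x, y)][0]:
            nearest[(x, y)] = (depth, prim)

    # Binning after the depth test: record surviving primitives per tile.
    bins = defaultdict(set)
    for (x, y), (_, prim) in nearest.items():
        bins[(x // tile_size, y // tile_size)].add(prim)
    return dict(bins)
```

Because binning happens after the depth test in this sketch, a primitive that is fully hidden within a tile never appears in that tile's bin.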

MULTI-ACCELERATOR COMPUTE DISPATCH

Granted: October 6, 2022
Application Number: 20220319089
Techniques for executing computing work by a plurality of chiplets are provided. The techniques include assigning workgroups of a kernel dispatch packet to the chiplets; by each chiplet, executing the workgroups assigned to that chiplet; for each chiplet, upon completion of all workgroups assigned to that chiplet for the kernel dispatch packet, notifying the other chiplets of such completion; and upon completion of all workgroups of the kernel dispatch packet, notifying a client of such…
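
As an illustration only, the sequence of steps in this abstract can be simulated sequentially in Python. This is a rough sketch under stated assumptions: the round-robin assignment policy, the callback names `run` and `notify_client`, and the function name `dispatch` are hypothetical, and real chiplets would execute concurrently rather than in a loop.

```python
def dispatch(workgroups, num_chiplets, run, notify_client):
    """Simulate one kernel dispatch packet spread across several chiplets.

    Assumption: workgroups are assigned round-robin; the abstract does not
    say how assignment is done.
    """
    assigned = [workgroups[i::num_chiplets] for i in range(num_chiplets)]

    # finished[c] holds the chiplets that chiplet c knows have completed.
    finished = [set() for _ in range(num_chiplets)]

    for chiplet, groups in enumerate(assigned):
        # Each chiplet executes only the workgroups assigned to it.
        for workgroup in groups:
            run(chiplet, workgroup)
        # On completing its share, the chiplet records its own completion
        # and notifies every other chiplet.
        for observer in range(num_chiplets):
            finished[observer].add(chiplet)

    # Once every chiplet has reported, the whole packet is complete,
    # so the client is notified exactly once.
    if all(len(seen) == num_chiplets for seen in finished):
        notify_client()
    return assigned
```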

REAL TIME MACHINE LEARNING-BASED PRIVACY FILTER FOR REMOVING REFLECTIVE FEATURES FROM IMAGES AND VIDEO

Granted: October 6, 2022
Application Number: 20220318954
A method for removing reflections from images is disclosed. The method includes identifying one or more segments of an image, the one or more segments including a reflection; identifying one or more features of the one or more segments; removing the one or more features from the segments to generate one or more sanitized segments; and combining the one or more sanitized segments with the image to generate a sanitized image.

MEMORY CONTROLLER POWER STATES

Granted: October 6, 2022
Application Number: 20220318161
A memory controller includes a command queue and an arbiter operating in a first voltage domain, and a physical layer interface (PHY) operating in a second voltage domain. The memory controller includes isolation cells operable to isolate the PHY from the first voltage domain. A local power state controller, in response to a first power state command, provides configuration and state data for storage in an on-chip RAM memory, causes the memory controller to enter a powered-down state,…

WAVEFRONT SELECTION AND EXECUTION

Granted: October 6, 2022
Application Number: 20220318021
Techniques are provided for executing wavefronts. The techniques include at a first time for issuing instructions for execution, performing first identifying, including identifying that sufficient processing resources exist to execute a first set of instructions together within a processing lane; in response to the first identifying, executing the first set of instructions together; at a second time for issuing instructions for execution, performing second identifying, including…

SYNCHRONIZATION FREE CROSS PASS BINNING THROUGH SUBPASS INTERLEAVING

Granted: September 29, 2022
Application Number: 20220309729
A method of tiled rendering is provided which comprises dividing a frame to be rendered into a plurality of tiles, receiving commands to execute a plurality of subpasses of the tiles and interleaving execution of same subpasses of multiple tiles of the frame. Interleaving execution of same subpasses of multiple tiles comprises executing a previously ordered first subpass of a second tile between execution of the previously ordered first subpass of a first tile and execution of a…

DYNAMICALLY RECONFIGURABLE REGISTER FILE

Granted: September 29, 2022
Application Number: 20220309606
Techniques for managing register allocation are provided. The techniques include detecting a first request to allocate first registers for a first wavefront; first determining, based on allocation information, that allocating the first registers to the first wavefront would result in a condition in which a deadlock is possible; in response to the first determining, refraining from allocating the first registers to the first wavefront; detecting a second request to allocate second…

APPROXIMATION OF MATRICES FOR MATRIX MULTIPLY OPERATIONS

Granted: September 29, 2022
Application Number: 20220309126
A processing device is provided which comprises memory configured to store data and a processor configured to receive a portion of data of a first matrix comprising a first plurality of elements and receive a portion of data of a second matrix comprising a second plurality of elements. The processor is also configured to determine values for a third matrix by dropping a number of products from products of pairs of elements of the first and second matrices based on approximating the…

DATA COMPRESSOR FOR APPROXIMATION OF MATRICES FOR MATRIX MULTIPLY OPERATIONS

Granted: September 29, 2022
Application Number: 20220309125
A processing device is provided which comprises memory configured to store data and a processor. The processor comprises a plurality of MACs configured to perform matrix multiplication of elements of a first matrix and elements of a second matrix. The processor also comprises a plurality of logic devices configured to sum values of bits of product exponents values of the elements of the first matrix and second matrix and determine keep bit values for product exponents values to be kept…

OVERLAPPED CURVE MAPPING FOR HISTOGRAM-BASED LOCAL TONE AND LOCAL CONTRAST

Granted: September 15, 2022
Application Number: 20220292653
Methods and apparatuses are disclosed herein for performing tone mapping and/or contrast enhancement. In some examples, a block mapping curve is low-pass filtered with block mapping curves of surrounding blocks to form a smoothed block mapping curve. In some examples, overlapped curve mapping of block mapping curves, including smoothed block mapping curves, is performed, including weighting, based on a pixel location, block mapping curves of a group of blocks to generate an interpolated…

DATA CACHE REGION PREFETCHER

Granted: September 8, 2022
Application Number: 20220283955
A method, system, and processing system for pre-fetching data are disclosed. The method, system, and processing system include data cache region prefetch circuitry for detecting a first access by a first instruction at a first instruction address to a first memory portion, detecting a first non-sequential access pattern to a set of addresses in the first memory portion, and in response to a miss by a second instruction at the first instruction address, and in response to the…

METHODS AND DEVICES FOR TESTING MULTIPLE MEMORY CONFIGURATIONS

Granted: May 12, 2022
Application Number: 20220148669
Methods, devices, and systems for testing a number of combinations of memory in a computer system. A modular memory device is installed in a memory channel in communication with a processor. The modular memory device includes a number of memory storage devices. The number of memory storage devices include a number of pins. For each of a number of subsets of the number of memory storage devices, a subset of the number of memory storage devices is selected, each pin of a subset of the…

REDUCING BURN-IN FOR MONTE-CARLO SIMULATIONS VIA MACHINE LEARNING

Granted: May 12, 2022
Application Number: 20220147668
Techniques are disclosed for compressing data. The techniques include identifying, in data to be compressed, a first set of values, wherein the first set of values include a first number of two or more consecutive identical non-zero values; including, in compressed data, a first control value indicating the first number of non-zero values and a first data item corresponding to the consecutive identical non-zero values; identifying, in the data to be compressed, a second value having an…

ENHANCED DURABILITY FOR SYSTEMS ON CHIP (SOCS)

Granted: May 12, 2022
Application Number: 20220147455
A system-on-chip with runtime global push to persistence includes a data processor having a cache, an external memory interface, and a microsequencer. The external memory interface is coupled to the cache and is adapted to be coupled to an external memory. The cache provides data to the external memory interface for storage in the external memory. The microsequencer is coupled to the data processor. In response to a trigger signal, the microsequencer causes the cache to flush the data by…

SYNC POINT MECHANISM BETWEEN MASTER AND SLAVE NODES

Granted: May 12, 2022
Application Number: 20220147366
In a system with a master processor and slave processors, sync points are used in boot instructions. While executing the boot instructions, the slave processor determines whether the sync point is enabled. In response to determining the sync point is enabled, the slave processor pauses execution of the boot instructions, waits for commands from the master processor, receives commands from the master processor, executes the received commands until a release command is received, and then…

TOP PALETTE COLORS SELECTION USING SORTING FOR PALETTE MODE IN VIDEO ENCODING

Granted: May 5, 2022
Application Number: 20220141472
An encoding method is provided which includes receiving a plurality of images, obtaining values of elements in a portion of the images, sorting the elements according to different values of the elements, sorting the elements according to a number of occurrences of the different values and encoding the elements using a subset of the different values having corresponding numbers of occurrences that are higher than corresponding numbers of occurrences of other values. Examples also include…
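
As an illustration only, the two sorting steps in this abstract can be sketched in Python. This is a minimal sketch under stated assumptions: the flat list of integer element values, the `palette_size` parameter, and the function name `top_palette` are hypothetical, and the encoding step itself is not shown.

```python
from itertools import groupby


def top_palette(elements, palette_size):
    """Pick the most frequent values in a block as palette entries.

    Assumption: elements is a flat list of comparable values (for example,
    packed pixel colors) taken from one portion of an image.
    """
    # First sort: order by value so that equal values become adjacent.
    by_value = sorted(elements)

    # Count each run of equal values.
    counts = [(value, sum(1 for _ in run)) for value, run in groupby(by_value)]

    # Second sort: order by number of occurrences, most frequent first.
    # Python's sort is stable, so ties keep their ascending value order.
    counts.sort(key=lambda pair: pair[1], reverse=True)

    # Keep only the values whose counts are higher than the rest.
    return [value for value, _ in counts[:palette_size]]
```

For example, `top_palette([3, 1, 3, 2, 1, 3, 7], 2)` returns `[3, 1]`, since 3 occurs three times and 1 occurs twice.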

REFRESH MANAGEMENT FOR MEMORY

Granted: April 21, 2022
Application Number: 20220122652
A memory controller interfaces with a random access memory over a memory channel. A refresh control circuit monitors an activate counter which counts a rolling number of activate commands sent over the memory channel to a memory region of the memory. In response to the activate counter being above an intermediate management threshold value, the refresh control circuit only issues a refresh management (RFM) command if there is no REF command currently held at the refresh command circuit…

SYSTEM PERFORMANCE MANAGEMENT USING PRIORITIZED COMPUTE UNITS

Granted: April 14, 2022
Application Number: 20220114097
Methods, devices, and systems for managing performance of a processor having multiple compute units. An effective number of the multiple compute units may be determined to designate as having priority. On a condition that the effective number is nonzero, the effective number of the multiple compute units may each be designated as a priority compute unit. Priority compute units may have access to a shared cache whereas non-priority compute units may not. Workgroups may be preferentially…

TECHNIQUES FOR HANDLING CACHE COHERENCY TRAFFIC FOR CONTENDED SEMAPHORES

Granted: March 31, 2022
Application Number: 20220100662
The techniques described herein improve cache traffic performance in the context of contended lock instructions. More specifically, each core maintains a lock address contention table that stores addresses corresponding to contended lock instructions. The lock address contention table also includes a state value that indicates progress through a series of states meant to track whether a load by the core in a spin-loop associated with semaphore acquisition has obtained the semaphore in an…

PERSISTENT WEIGHTS IN TRAINING

Granted: March 31, 2022
Application Number: 20220101110
Techniques are disclosed for performing machine learning operations. The techniques include fetching weights for a first layer in a first format; performing matrix multiplication of the weights fetched in the first format with values provided by a prior layer in a forwards training pass; fetching the weights for the first layer in a second format different from the first format; and performing matrix multiplication for a backwards pass, the matrix multiplication including multiplication…