Inline suspension of an accelerated processing unit
Granted: August 6, 2024
Patent Number:
12056787
Methods and systems are disclosed for inline suspension of an accelerated processing unit (APU). Techniques include receiving a packet, including a mode of operation and commands to be executed by the APU; suspending execution of commands received in previous packets when the mode of operation is a suspension initiation mode; and executing, by the APU, the commands in the received packet. The execution of the suspended commands is restored when the mode of operation in a subsequently…
Hierarchical asymmetric core attribute detection
Granted: August 6, 2024
Patent Number:
12056522
Within a processing system, thread count asymmetries manifest when one or more cores of a processing device are disabled. To determine such thread count asymmetries, discovery operations are performed to determine thread count asymmetries for one or more hierarchy levels of a processing device based on a number of threads per enumerated instance within the hierarchy level. In response to the determining a thread count asymmetry, one thread identifier for each enumerated instance within…
Combined codec buffer management
Granted: July 30, 2024
Patent Number:
12052472
Techniques are provided herein for processing video data. The techniques include identifying one or more input factors including one or more of signal quality factors, video content complexity factors, and hardware buffering factors for one or more of a video encoding system and a video playback system; evaluating the one or more input factors to determine adjustments to apply to one or both of the video encoding system and the video playback system; and applying the determine…
Dynamic fine grain link control
Granted: July 30, 2024
Patent Number:
12052153
Systems, apparatuses, and methods for enabling localized control of link states in a computing system are disclosed. A computing system includes at least a host processor, a communication fabric, one or more devices, one or more links, and a local link controller to monitor the one or more links. In various implementations, the local link controller detects and controls states of a link without requiring communication with, or intervention by, the host processor. In various…
Systems and methods for distributed rendering using two-level binning
Granted: July 30, 2024
Patent Number:
12051154
Systems and methods for distributed rendering using two-level binning include processing primitives of a frame to be rendered at a first graphics processing unit (GPU) chiplet in a set of GPU chiplets to generate visibility information of primitives for each coarse bin and providing the visibility information to the other GPU chiplets in the set of GPU chiplets. Each coarse bin is then assigned to one of the GPU chiplets of the set of GPU chiplets and rendered at the assigned GPU chiplet…
Fully utilized hardware in a multi-tenancy graphics processing unit
Granted: July 30, 2024
Patent Number:
12051144
An apparatus such as a graphics processing unit (GPU) includes a set of shader engines and a set of front end (FE) circuits. Subsets of the set of FE circuits schedule geometry workloads for subsets of the set of shader engines based on a mapping. The apparatus also includes a set of physical paths that convey information from the set of FE circuits to a memory via the set of shader engines. Subsets of the set of physical paths are allocated to the subsets of the set of FE circuits and…
Array of pointers prefetching
Granted: July 30, 2024
Patent Number:
12050916
Array of pointers prefetching is described. In accordance with described techniques, a pointer target instruction is detected by identifying that a destination location of a load instruction is used in an address compute for a memory operation and the load instruction is included in a sequence of load instructions having addresses separated by a step size. An instruction for fetching data of a future load instruction is injected in an instruction stream of a processor. The data of the…
Data compression and decompression for processing in memory
Granted: July 30, 2024
Patent Number:
12050531
In accordance with the described techniques for data compression and decompression for processing in memory, a page address is received by a processing in memory component that maps to a first location in memory where data of a page is maintained. The data of the page is compressed by the processing in memory component. Further, compressed data of the page is written by the processing in memory component to a compressed block device responsive to the compressed data satisfying one or…
Enhanced low-priority arbitration
Granted: July 23, 2024
Patent Number:
12045182
A computing system may implement a low priority arbitration interrupt method that includes receiving a message signaled interrupt (MSI) message from an input output hub (I/O hub) transmitted over an interconnect fabric, selecting a processor to interrupt from a cluster of processors based on arbitration parameters, and communicating an interrupt service routine to the selected processor, wherein the I/O hub and the cluster of processors are located within a common domain.
Secure computer vision processing
Granted: July 23, 2024
Patent Number:
12045362
A computer vision processor in an image cluster defines a fenced memory region (FMR) that controls access to image data stored in a first portion of a trusted memory region (TMR). The computer vision processor receives FMR requests from an application implemented in a processing cluster. The FMR requests are to access the image data in the first portion of the TMR. The computer vision processor selectively allows the requesting application to access the image data. In some cases, the…
Hardware configuration selection using machine learning model
Granted: July 23, 2024
Patent Number:
12045169
Techniques for identifying a hardware configuration for operation are disclosed. The techniques include applying feature measurements to a trained model; obtaining output values from the trained model, the output values corresponding to different hardware configurations; and operating according to the output values, wherein the output values include one of a certainty score, a ranking, or a regression value.
Dual purpose millimeter wave frequency band transmitter
Granted: July 23, 2024
Patent Number:
12044774
Systems, apparatuses, and methods for implementing a dual-purpose millimeter-wave frequency band transmitter are disclosed. A system includes a dual-purpose transmitter sending a video stream over a wireless link to a receiver. In some embodiments, the video stream is generated as part of an augmented reality (AR) or virtual reality (VR) application. The transmitter operates in a first mode to scan and map an environment of the transmitter and receiver. The transmitter generates radio…
Adaptive batch reuse on deep memories
Granted: July 16, 2024
Patent Number:
12039450
A method of adaptive batch reuse includes prefetching, from a CPU to a GPU, a first plurality of mini-batches comprising a subset of a training dataset. The GPU trains the neural network for the current epoch by reusing, without discard, the first plurality of mini-batches in training the neural network for the current epoch based on a reuse count value. The GPU also runs a validation set to identify a validation error for the current epoch. If the validation error for the current epoch…
Loader and runtime operations for heterogeneous code objects
Granted: July 16, 2024
Patent Number:
12039344
Described herein are techniques for executing a heterogeneous code object executable. According to the techniques, a loader identifies a first memory appropriate for loading a first architecture-specific portion of the heterogeneous code object executable, wherein the first architecture specific portion includes instructions for a first architecture, identifies a second memory appropriate for loading a second architecture-specific portion of the heterogeneous code object executable,…
Processor with multiple fetch and decode pipelines
Granted: July 16, 2024
Patent Number:
12039337
A processor employs a plurality of fetch and decode pipelines by dividing an instruction stream into instruction blocks with identified boundaries. The processor includes a branch predictor that generates branch predictions. Each branch prediction corresponds to a branch instruction and includes a prediction that the corresponding branch is to be taken or not taken. In addition, each branch prediction identifies both an end of the current branch prediction window and the start of another…
Memory controller with a plurality of command sub-queues and corresponding arbiters
Granted: July 16, 2024
Patent Number:
12038856
A memory controller includes a memory channel controller that uses multiple groups of command queue and arbiter pairs. Each arbiter is coupled to a respective command queue to select memory access commands from each command queue according to predetermined criteria. Each arbiter selects from among the memory access requests in each command queue independently based on the predetermined criteria and sends selected memory access requests to a selector that serves as a second level arbiter…
A/D bit storage, processing, and modes
Granted: July 16, 2024
Patent Number:
12038847
A/D bit storage, processing, and mode management techniques through use of a dense A/D bit representation are described. In one example, a memory management unit employs an A/D bit representation generation module to generate the dense A/D bit representation. In an implementation, the A/D bit representation is stored adjacent to existing page table structures of the multilevel page table hierarchy. In another example, memory management unit supports use of modes as part of A/D bit…
User configurable hardware settings for overclocking
Granted: July 16, 2024
Patent Number:
12038779
User configurable hardware settings for overclocking is described. In accordance with the described techniques, user input to adjust hardware settings for operating a processing unit in an overclocking mode is received. The user input, for example, adjusts at least one of a voltage droop threshold or a frequency adjustment of the clock rate. A voltage droop is detected while operating the processing unit in the overclocking mode. Responsive to detecting the voltage droop, a clock rate of…
Partial sorting for coherency recovery
Granted: July 9, 2024
Patent Number:
12032967
Devices and methods for partial sorting for coherence recovery are provided. The partial sorting is efficiently executed by utilizing existing hardware along the memory path (e.g., memory local to the compute unit). The devices include an accelerated processing device which comprises memory and a processor. The processor is, for example, a compute unit of a GPU which comprises a plurality of SIMD units and is configured to determine, for data entries each comprising a plurality of bits,…
Dead surface invalidation
Granted: July 9, 2024
Patent Number:
12033239
Systems, apparatuses, and methods for performing dead surface invalidation are disclosed. An application sends draw call commands to a graphics processing unit (GPU) via a driver, with the draw call commands rendering to surfaces. After it is determined that a given surface will no longer be accessed by subsequent draw calls, the application sends a surface invalidation command for the given surface to a command processor of the GPU. After the command processor receives the surface…