Automatic mirrored ROM
Granted: May 14, 2024
Patent Number:
11984175
The disclosed method may include detecting, by a control circuit coupled to a first read only memory (ROM) device and a second ROM device, a failure of a first output signal from the first ROM device to a common output. The first ROM device is connected to the common output and the second ROM device is disconnected from the common output. The method also includes switching, by the control circuit in response to detecting the failure, the common output from the first ROM device to the…
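As a rough illustration of the failover idea in this abstract, the sketch below models two ROM images with only one connected to a common output at a time. The class name, the use of `None` as a failure signal, and the assumption that the output is switched to the second ROM device (the abstract is truncated at that point) are all hypothetical and not taken from the patent.

```python
class MirroredRom:
    """Toy model: two ROM images, one driving a common output at a time."""

    def __init__(self, primary, mirror):
        self.devices = [primary, mirror]
        self.active = 0  # index of the device currently connected to the common output

    def read(self, addr):
        value = self.devices[self.active][addr]
        if value is None:  # hypothetical failure signal: no valid output
            # Assumed behavior: switch the common output to the other device.
            self.active = 1 - self.active
            value = self.devices[self.active][addr]
        return value


rom = MirroredRom(primary=[0x12, None, 0x56], mirror=[0x12, 0x34, 0x56])
first = rom.read(0)   # served by the first device
second = rom.read(1)  # first device fails here, so the mirror takes over
```

After the second read, `rom.active` stays on the mirror, so later reads no longer touch the failed device.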
Auto generation and tuning tool for convolution kernels
Granted: May 14, 2024
Patent Number:
11983624
Systems, apparatuses, and methods for implementing an auto generation and tuning tool for convolution kernels are disclosed. A processor executes multiple tuning runs of a given layer of a neural network while using a different set of operating parameter values for each tuning run. The operating parameters can include one or more of input dataset fetch group size, output channel group size, and other parameters. The processor captures performance data for each tuning run and then after…
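The tuning loop described in this abstract can be sketched as a sweep over parameter combinations that records a timing for each run. This is a minimal sketch under stated assumptions: `run_layer` is a hypothetical stand-in for a real convolution layer, and picking the fastest run at the end is an assumption, since the abstract is truncated before it says what is done with the captured data.

```python
import itertools
import time


def run_layer(fetch_group_size, channel_group_size):
    """Hypothetical stand-in for one layer; does dummy work only."""
    total = 0
    for i in range(0, 4096, fetch_group_size):
        for c in range(0, 64, channel_group_size):
            total += i ^ c
    return total


def tune(param_grid):
    """Run one tuning run per parameter combination and record its duration."""
    results = []
    for fetch, chan in itertools.product(param_grid["fetch"], param_grid["chan"]):
        start = time.perf_counter()
        run_layer(fetch, chan)
        elapsed = time.perf_counter() - start
        results.append({"fetch": fetch, "chan": chan, "seconds": elapsed})
    return results


results = tune({"fetch": [1, 2, 4], "chan": [8, 16]})
# Assumed selection rule: keep the parameter set with the shortest run time.
best = min(results, key=lambda r: r["seconds"])
```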
Method for matrix data broadcast in parallel processing
Granted: May 14, 2024
Patent Number:
11983560
Systems, apparatuses, and methods for efficient parallel execution of multiple work units in a processor by reducing a number of memory accesses are disclosed. A computing system includes a processor core with a parallel data architecture. One or more of a software application and firmware implement matrix operations and support the broadcast of shared data to multiple compute units of the processor core. The application creates thread groups by matching compute kernels of the…
Method and apparatus of data compression
Granted: May 7, 2024
Patent Number:
11978234
A method and apparatus for processing color data includes storing fragment pointer and color data together in a color buffer. A delta color compression (DCC) key indicating the color data to fetch for processing is stored, and the fragment pointer and color data are fetched based upon the read DCC key for decompression.
Gang scheduling with an onboard graphics processing unit and user-based queues
Granted: May 7, 2024
Patent Number:
11977933
A processing unit such as a graphics processing unit (GPU) includes a set of queues that stores command buffers prior to execution in a corresponding plurality of pipelines. The processing unit also implements a kernel mode driver that allocates a first subset of the set of queues to a first application in response to receiving registration requests from the first application. The processing unit further includes a scheduler that schedules command buffers in the first subset of the set…
Stateful microcode branching
Granted: May 7, 2024
Patent Number:
11977890
Stateful microbranch instructions, including: generating, based on an instruction, a first one or more microinstructions including a stateful microbranch instruction, wherein the stateful microbranch instruction includes: an address of a next instruction after the instruction; a branch target address; one or more microcode attributes; and executing the first one or more microinstructions.
Approach for enabling concurrent execution of host memory commands and near-memory processing commands
Granted: May 7, 2024
Patent Number:
11977782
An approach allows concurrent execution of near-memory processing commands, referred to herein as “PIM commands,” and host memory commands. A memory controller determines and issues a plurality of register-only PIM commands that do not reference memory with host memory commands to allow concurrent execution of the register-only PIM commands and the host memory commands. The approach allows concurrent execution of register-only PIM commands and host memory commands without…
Real time profile switching for memory overclocking
Granted: May 7, 2024
Patent Number:
11977757
Profile switching for memory overclocking is described. In accordance with the described techniques, a memory is operated according to a first memory profile. During operation of the memory according to the first memory profile, a request is received to operate the memory according to a second memory profile. Responsive to the request, operation of the memory is switched to operate according to the second memory profile without rebooting. In one or more implementations, at least one of…
End user sensitivity profiling for efficiency and performance management
Granted: April 30, 2024
Patent Number:
11972271
A processing device is provided which comprises memory and a processor in communication with the memory. The processor is configured to acquire information indicating a sensory perception of a user, determine settings for one or more parameters used to control operation of the device based on that information, and control the operation of the device by tuning the one or more parameters according to the determined settings.
Hybrid binning
Granted: April 30, 2024
Patent Number:
11972518
A processing device and a method of tiled rendering of an image for display are provided. The processing device includes memory and a processor. The processor is configured to receive the image comprising one or more three dimensional (3D) objects, divide the image into tiles, execute coarse level tiling for the tiles of the image and execute fine level tiling for the tiles of the image. The processing device also includes same fixed function hardware used to execute the coarse level…
Hardware device for enforcing atomicity for memory operations
Granted: April 30, 2024
Patent Number:
11972261
A system includes a hardware compare and swap (CAS) module communicatively coupled to a bus, the CAS module to perform an atomic operation in response to a first request from a first request agent for the atomic operation to be performed on a data value that is shared among a plurality of request agents and obtain a first result value. The atomic operation includes initiating a CAS command via the bus. The CAS module performs the atomic operation in response to a second request from a…
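For readers unfamiliar with compare and swap, the sketch below shows the general CAS semantics and the retry loop that builds an atomic update on top of it. It is a software model only: `CasCell` and `atomic_add` are hypothetical names, and the lock merely stands in for the atomicity that the patent attributes to a hardware module on a bus.

```python
import threading


class CasCell:
    """Shared value with a compare-and-swap primitive."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()  # stands in for hardware atomicity

    def load(self):
        with self._lock:
            return self._value

    def compare_and_swap(self, expected, new):
        """Store `new` only if the current value equals `expected`."""
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False


def atomic_add(cell, delta):
    """Retry until the CAS succeeds, so no concurrent update is lost."""
    while True:
        old = cell.load()
        if cell.compare_and_swap(old, old + delta):
            return old + delta


counter = CasCell(0)


def worker():
    for _ in range(1000):
        atomic_add(counter, 1)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
final = counter.load()
```

Each worker plays the role of one request agent; because a failed CAS is retried with a fresh value, the four agents together always reach the full count.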
Methods and apparatus for synchronizing data transfers across clock domains using heads-up indications
Granted: April 23, 2024
Patent Number:
11967960
Methods and apparatus for synchronizing data transfers across clock domains using heads-up indications. An integrated circuit includes a first-in first-out buffer (FIFO); a memory controller configured to operate in a first clock domain and coupled to the FIFO, the first clock domain associated with a first clock signal; a data fabric configured to operate in a second clock domain and coupled to the FIFO, the second clock domain associated with a second clock signal, a second…
Gaming super resolution
Granted: April 23, 2024
Patent Number:
11967043
A processing device is provided which includes memory and a processor. The processor is configured to receive an input image having a first resolution, generate at least one linear down-sampled version of the input image via a linear upscaling network, generate at least one non-linear down-sampled version of the input image via a non-linear upscaling network, extract a first feature map from the at least one linear down-sampled version of the input image, and extract a second feature map…
Selecting between basic and global persistent flush modes
Granted: April 23, 2024
Patent Number:
11966339
Selecting between basic and global persistent flush modes is described. In accordance with the described techniques, a system includes a data fabric in electronic communication with at least one cache and a controller configured to select between operating in a global persistent flush mode and a basic persistent flush mode based on an available flushing latency of the system, control the at least one cache to flush dirty data to the data fabric in response to a flush event trigger while…
Near-memory determination of registers
Granted: April 23, 2024
Patent Number:
11966328
A memory module includes register selection logic to select alternate local source and/or destination registers to process PIM commands. The register selection logic uses an address-based register selection approach to select an alternate local source and/or destination register based upon address data specified by a PIM command and a split address maintained by a memory module. The register selection logic may alternatively use a register data-based approach to select an alternate local…
Devices, systems, and methods for detecting and mitigating silent data corruptions via adaptive voltage-frequency scaling
Granted: April 23, 2024
Patent Number:
11966283
An exemplary computing device includes a plurality of circuits and/or a plurality of in-situ monitors configured to generate outputs that indicate one or more operating conditions of the circuits. The computing device also includes a system management unit configured to detect a potentially faulty voltage-to-frequency ratio implemented by one of the circuits based at least in part on one or more of the outputs. The system management unit is also configured to modify the potentially…
Performance management during power supply voltage droop
Granted: April 16, 2024
Patent Number:
11960340
A method for controlling a data processing system includes detecting a droop in a power supply voltage of a functional circuit of the data processing system greater than a programmable droop threshold. An operation of the data processing system is throttled according to a programmable step size, a programmable assertion time, and a programmable de-assertion time in response to detecting the droop.
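One possible reading of the four programmable values in this abstract is sketched below as a small tick-based simulation. The interpretation is an assumption, not the patented method: the throttle level rises by `step` while the droop exceeds the threshold, falls by `step` once it clears, and each change is held for the assertion or de-assertion time. The function name, the `max_level` cap, and the voltage figures are all hypothetical.

```python
def throttle_trace(voltages, nominal, droop_threshold, step,
                   assert_ticks, deassert_ticks, max_level=4):
    """Return the throttle level chosen at each tick (0 means unthrottled)."""
    level = 0
    hold = 0  # ticks remaining before the level may change again
    trace = []
    for v in voltages:
        drooping = (nominal - v) > droop_threshold
        if hold > 0:
            hold -= 1
        elif drooping and level < max_level:
            level = min(max_level, level + step)   # throttle harder
            hold = assert_ticks - 1
        elif not drooping and level > 0:
            level = max(0, level - step)           # release throttling gradually
            hold = deassert_ticks - 1
        trace.append(level)
    return trace


trace = throttle_trace(
    voltages=[1.00, 0.90, 0.90, 0.90, 1.00, 1.00, 1.00, 1.00],
    nominal=1.00, droop_threshold=0.05, step=1,
    assert_ticks=2, deassert_ticks=2,
)
# trace == [0, 1, 1, 2, 2, 1, 1, 0]
```

With these inputs the level ramps up in two steps during the droop and ramps back down in two steps after the supply recovers.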
Skew matching in a die-to-die interface
Granted: April 16, 2024
Patent Number:
11960435
A semiconductor package for skew matching in a die-to-die interface, including: a first die; a second die aligned with the first die such that each connection point of a first plurality of connection points of the first die is substantially equidistant to a corresponding connection point of a second plurality of connection points of the second die; and a plurality of connection paths of a substantially same length, wherein each connection path of the plurality of connection paths couples…
Method and apparatus for reducing the latency of long latency memory requests
Granted: April 16, 2024
Patent Number:
11960404
Systems, apparatuses, and methods for efficiently processing memory requests are disclosed. A computing system includes at least one processing unit coupled to a memory. Circuitry in the processing unit determines that a memory request has become a long-latency request upon detecting that a translation lookaside buffer (TLB) miss, a branch misprediction, a memory dependence misprediction, or a precise exception has occurred. The circuitry marks the memory request as a long-latency request such as…
Relaxed invalidation for cache coherence
Granted: April 16, 2024
Patent Number:
11960399
Methods, systems, and devices maintain state information in a shadow tag memory for a plurality of cachelines in each of a plurality of private caches, with each of the private caches being associated with a corresponding one of multiple processing cores. One or more cache probes are generated based on a write operation associated with one or more cachelines of the plurality of cachelines, such that each of the cache probes is associated with cachelines of a particular private cache of…