Pattern-based cache block compression
Granted: June 4, 2024
Patent Number:
12001237
Systems, methods, and devices for performing pattern-based cache block compression and decompression. An uncompressed cache block is input to the compressor. Byte values are identified within the uncompressed cache block. A cache block pattern is searched for in a set of cache block patterns based on the byte values. A compressed cache block is output based on the byte values and the cache block pattern. A compressed cache block is input to the decompressor. A cache block pattern is…
Compensation for clock frequency modulation
Granted: May 28, 2024
Patent Number:
11996848
The disclosed computer-implemented method includes providing, by a reference clock circuit, a clock signal for a clock-triggered element triggered by the clock signal and modulating, by a frequency modulation circuit, a frequency of the clock signal. The method also includes inserting, by a phase compensation circuit, a phase compensation offset to the modulated clock signal in a manner that compensates for a phase error produced by modulating the frequency of the clock signal. Various…
Adaptable allocation of SRAM based on power
Granted: May 28, 2024
Patent Number:
11996166
A technique for processing computer instructions is provided. The technique includes obtaining information for an instruction state memory entry for an instruction; identifying, for the instruction state memory entry, a slot in an instruction state memory having selectably powered rows and blocks, based on clustering criteria; and placing the instruction state memory entry into the identified slot.
DMA engines configured to perform first portion data transfer commands with a first DMA engine and second portion data transfer commands with second DMA engine
Granted: May 28, 2024
Patent Number:
11995351
A method for hardware management of DMA transfer commands includes accessing, by a first DMA engine, a DMA transfer command and determining a first portion of a data transfer requested by the DMA transfer command. Transfer of a first portion of the data transfer by the first DMA engine is initiated based at least in part on the DMA transfer command. Similarly, a second portion of the data transfer by a second DMA engine is initiated based at least in part on the DMA transfer command.…
Sparse matrix-vector multiplication
Granted: May 28, 2024
Patent Number:
11995149
A processing system includes a first set and a second set of general-purpose registers (GPRs) and memory access circuitry that fetches nonzero values of a sparse matrix into consecutive slots in the first set. The memory access circuitry also fetches values of an expanded matrix into consecutive slots in the second set of GPRs. The expanded matrix is formed based on values of a vector and locations of the nonzero values in the sparse matrix. The processing system also includes a set of…
Memory controller with hybrid DRAM/persistent memory channel arbitration
Granted: May 28, 2024
Patent Number:
11995008
A memory controller includes a command queue having an input for receiving memory access commands for a memory channel, and a number of entries for holding a predetermined number of memory access commands, and an arbiter that selects memory commands from the command queue for dispatch to one of a persistent memory and a DRAM memory coupled to the memory channel. The arbiter includes a first-tier sub-arbiter circuit coupled to the command queue for selecting candidate commands from among…
Systems and methods for generating remedy recommendations for power and performance issues within semiconductor software and hardware
Granted: May 28, 2024
Patent Number:
11994939
The disclosed computer-implemented method for generating remedy recommendations for power and performance issues within semiconductor software and hardware. For example, the disclosed systems and methods can apply a rule-based model to telemetry data to generate rule-based root-cause outputs as well as telemetry-based unknown outputs. The disclosed systems and methods can further apply a root-cause machine learning model to the telemetry-based unknown outputs to analyze deep and complex…
Centralized interrupt handling for chiplet processing units
Granted: May 21, 2024
Patent Number:
11989144
Systems, apparatuses, and methods for implementing a centralized interrupt controller to aggregate interrupts generated across multiple semiconductor dies are disclosed. A system includes multiple interrupt sources on multiple semiconductor dies. A centralized interrupt controller on one of the semiconductor dies receives and aggregates interrupts from the multiple interrupt sources on the multiple semiconductor dies. This facilitates a single transmission point for forwarding the…
Swizzle mode detection
Granted: May 21, 2024
Patent Number:
11989918
Systems, apparatuses, and methods for converting pixel data to a custom swizzle mode are disclosed. A graphics engine receives data in a pre-defined swizzle mode. The graphics engine determines a custom swizzle mode for the data that has directionality aligned to the data itself to further optimize deltas that are used for compressing the data. The graphics engine groups incoming data into group of two neighboring pixels in both the horizontal and vertical directions. The graphics engine…
Dynamically configurable overprovisioned microprocessor
Granted: May 21, 2024
Patent Number:
11989591
A dynamically configurable overprovisioned microprocessor optimally supports a variety of different compute application workloads and with the capability to tradeoff among compute performance, energy consumption, and clock frequency on a per-compute application basis, using general-purpose microprocessor designs. In some embodiments, the overprovisioned microprocessor comprises a physical compute resource and a dynamic configuration logic configured to: detect an activation-warranting…
Multi-chiplet clock delay compensation
Granted: May 21, 2024
Patent Number:
11989050
Methods and systems are disclosed for clock delay compensation in a multiple chiplet system. Techniques disclosed include distributing, by a clock generator, a clock signal across distribution trees of respective chiplets; measuring phases, by phase detectors, where each phase measurement is associated with a chiplet of the chiplets and is indicative of a propagation speed of the clock signal through the distribution tree of the chiplet. Then, for each chiplet, techniques are further…
Automatic mirrored ROM
Granted: May 14, 2024
Patent Number:
11984175
The disclosed method may include detecting, by a control circuit coupled to a first read only memory (ROM) device and a second ROM device, a failure of a first output signal from the first ROM device to a common output. The first ROM device is connected to the common output and the second ROM device is disconnected from the common output. The method also includes switching, by the control circuit in response to detecting the failure, the common output from the first ROM device to the…
Auto generation and tuning tool for convolution kernels
Granted: May 14, 2024
Patent Number:
11983624
Systems, apparatuses, and methods for implementing an auto generation and tuning tool for convolution kernels are disclosed. A processor executes multiple tuning runs of a given layer of a neural network while using a different set of operating parameter values for each tuning run. The operating parameters can include one or more of input dataset fetch group size, output channel group size, and other parameters. The processor captures performance data for each tuning run and then after…
Method for matrix data broadcast in parallel processing
Granted: May 14, 2024
Patent Number:
11983560
Systems, apparatuses, and methods for efficient parallel execution of multiple work units in a processor by reducing a number of memory accesses are disclosed. A computing system includes a processor core with a parallel data architecture. One or more of a software application and firmware implement matrix operations and support the broadcast of shared data to multiple compute units of the processor core. The application creates thread groups by matching compute kernels of the…
Method and apparatus of data compression
Granted: May 7, 2024
Patent Number:
11978234
A method and apparatus for processing color data includes storing fragment pointer and color data together in a color buffer. A delta color compression (DCC) key indicating the color data to fetch for processing is stored, and the fragment pointer and color data is fetched based upon the read DCC key for decompression.
Gang scheduling with an onboard graphics processing unit and user-based queues
Granted: May 7, 2024
Patent Number:
11977933
A processing unit such as a graphics processing unit (GPU) includes a set of queues that stores command buffers prior to execution in a corresponding plurality of pipelines. The processing unit also implements a kernel mode driver that allocates a first subset of the set of queues to a first application in response to receiving registration requests from the first application. The processing unit further includes a scheduler that schedules command buffers in the first subset of the set…
Stateful microcode branching
Granted: May 7, 2024
Patent Number:
11977890
Stateful microbranch instructions, including: generating, based on an instruction, a first one or more microinstructions including a stateful microbranch instruction, wherein the stateful microbranch instruction includes: an address of a next instruction after the instruction; a branch target address; one or more microcode attributes; and executing the first one or more microinstructions.
Approach for enabling concurrent execution of host memory commands and near-memory processing commands
Granted: May 7, 2024
Patent Number:
11977782
An approach allows concurrent execution of near-memory processing commands, referred to herein as “PIM commands,” and host memory commands. A memory controller determines and issues a plurality of register-only PIM commands that do not reference memory with host memory commands to allow concurrent execution of the register-only PIM commands and the host memory commands. The approach allows concurrent execution of register-only PIM commands and host memory commands without…
Real time profile switching for memory overclocking
Granted: May 7, 2024
Patent Number:
11977757
Profile switching for memory overclocking is described. In accordance with the described techniques, a memory is operated according to a first memory profile. During operation of the memory according to the first memory profile, a request is received to operate the memory according to a second memory profile. Responsive to the request, operation of the memory is switched to operate according to the second memory profile without rebooting. In one or more implementations, at least one of…
End user sensitivity profiling for efficiency and performance management
Granted: April 30, 2024
Patent Number:
11972271
A processing device is provided which comprises memory and a processor, in communication with the memory. The processor is configured to acquire information indicating a sensory perception of a user, determine settings for one or more parameters used to control operation of the device based on the information indicating the sensory perception of the user and control the operation of the device by tuning the one or more parameters according to the determined settings.