Optimizing partial writes to compressed blocks
Granted: December 17, 2024
Patent Number:
12169876
A processor for optimizing partial writes to compressed blocks is configured to identify that a write request targets less than an entirety of a compressed block of pixel data, identify, based on a compression key, a compressed segment of the compressed block of pixel data that includes a target of the write request, and decompress, responsive to the write request, only the identified compressed segment of the compressed block of pixel data.
Dynamic precision scaling at epoch granularity in neural networks
Granted: December 17, 2024
Patent Number:
12169782
A processor determines losses of samples within an input volume that is provided to a neural network during a first epoch, groups the samples into subsets based on losses, and assigns the subsets to operands in the neural network that represent the samples at different precisions. Each subset is associated with a different precision. The processor then processes the subsets in the neural network at the different precisions during the first epoch. In some cases, the samples in the subsets…
Graphics pipeline optimizations
Granted: December 17, 2024
Patent Number:
12169703
Systems, apparatuses, and methods for implementing graphics pipeline optimizations are disclosed. A user interface (UI) is generated to allow a user to analyze shaders and determine resource utilization on any of multiple different target graphic devices. The UI allows the user to manipulate the state associated with the target graphics device for a given graphics pipeline. After being edited by the user, the state of the graphics pipeline is converted into a textual representation and…
Reducing system power consumption when capturing data from a USB device
Granted: December 17, 2024
Patent Number:
12169430
Systems and methods are disclosed for reducing power consumed by capturing data from an I/O device. Techniques disclosed include receiving descriptors, by a controller of an I/O host of a system, including information associated with respective data chunks to be captured from an I/O device buffer of the I/O device. Techniques disclosed further include capturing, based on the descriptors, the data chunks. The capturing comprises pulling the data chunks from the I/O device buffer at a…
Compression metadata assisted computation
Granted: December 10, 2024
Patent Number:
12164924
A method includes, in response to receiving an instruction to perform a first operation on first data stored in a memory device, obtaining first compression metadata from the memory device based on an address for the first data, and reducing a number of operations in a set of operations based on the first operation and one or more matching addresses, the one or more matching addresses corresponding to second compression metadata matching the first compression metadata.
Multicast in the probe channel
Granted: December 10, 2024
Patent Number:
12167102
Systems, apparatuses, and methods for processing multi-cast messages are disclosed. A system includes at least one or more processing units, one or more memory controllers, and a communication fabric coupled to the processing unit(s) and the memory controller(s). The communication fabric includes a plurality of crossbars which connect various agents within the system. When a multi-cast message is received by a crossbar, the crossbar extracts a message type indicator and a recipient type…
3D semiconductor package with die-mounted voltage regulator
Granted: December 10, 2024
Patent Number:
12165981
A semiconductor package includes a package substrate having a first surface and an opposing second surface, and further includes an integrated circuit (IC) die disposed at the second surface and having a third surface facing the second surface and an opposing fourth surface. The IC die has a first region comprising one or more metal layers and circuit components for one or more functions of the IC die and a second region offset from the first region in a direction parallel with the third…
SRAM power savings and write assist
Granted: December 10, 2024
Patent Number:
12165700
A technique reduces power consumption of a bit cell in a memory and provides write assistance to the bit cell. When the bit cell is active, a power-saving write-assist circuit coupled to the bit cell is selectively sized according to a type of memory access. When the bit cell is inactive, the virtual power supply node floats to a predetermined voltage between a first voltage on a first power supply node coupled to the bit cell and a second voltage on a second power supply node coupled to…
Multi-accelerator compute dispatch
Granted: December 10, 2024
Patent Number:
12165252
Techniques for executing computing work by a plurality of chiplets are provided. The techniques include assigning workgroups of a kernel dispatch packet to the chiplets; by each chiplet, executing the workgroups assigned to that chiplet; for each chiplet, upon completion of all workgroups assigned to that chiplet for the kernel dispatch packet, notifying the other chiplets of such completion; and upon completion of all workgroups of the kernel dispatch packet, notifying a client of such…
Direct-connected machine learning accelerator
Granted: December 10, 2024
Patent Number:
12165016
Techniques are disclosed for communicating between a machine learning accelerator and one or more processing cores. The techniques include obtaining data at the machine learning accelerator via an input/output die; processing the data at the machine learning accelerator to generate machine learning processing results; and exporting the machine learning processing results via the input/output die, wherein the input/output die is coupled to one or more processor chiplets via one or more…
Accelerating predicated instruction execution in vector processors
Granted: December 10, 2024
Patent Number:
12164923
Methods and systems are disclosed for processing a vector by a vector processor. Techniques disclosed include receiving predicated instructions by a scheduler, each of which is associated with an opcode, a vector of elements, and a predicate. The techniques further include executing the predicated instructions. Executing a predicated instruction includes compressing, based on an index derived from a predicate of the instruction, elements in a vector of the instruction, where the elements…
Buffer display data in a chiplet architecture
Granted: December 10, 2024
Patent Number:
12164365
An apparatus and method for efficiently managing power consumption among multiple, replicated functional blocks of an integrated circuit. An integrated circuit includes multiple, replicated functional blocks that use separate power domains. Data of a given type is stored in an interleaved manner among at least two of the multiple functional blocks. In one implementation, a prior static allocation determines that only a subset of the functional blocks store the data of the given type. In…
Frequency/state based power management thresholds
Granted: December 10, 2024
Patent Number:
12164353
A system and method for determining power-performance state transition thresholds in a computing system. A processor comprises several functional blocks and a power manager. Each of the functional blocks produces data corresponding to an activity level associated with the respective functional block. The power manager determines activity levels of the functional blocks and compares the activity level of a given functional block to a threshold to determine if a power-performance state…
Low power single phase logic gate latch for clock-gating
Granted: December 3, 2024
Patent Number:
12160238
Systems, apparatuses, and methods for implementing a low-power single-phase logic gate latch for clock-gating are disclosed. A latch circuit includes shared clocked transistors without including clock inverters. The shared clocked transistors include a P-type clocked transistor and an N-type clocked transistor, with the clock input coupled to the gate of the P-type clocked transistor and to the gate of the N-type clocked transistor. The P-type clocked transistor is coupled between first…
Common circuitry for triangle intersection and instance transformation for ray tracing
Granted: December 3, 2024
Patent Number:
12159341
A technique for performing ray tracing operations is provided. The technique includes traversing through a bounding volume hierarchy to an instance node; performing an instance node transform using common circuitry; traversing to a leaf node of the bounding volume hierarchy; and performing an intersection test for the leaf node using the common circuitry.
Boot firmware corruption detection and mitigation
Granted: December 3, 2024
Patent Number:
12158956
An apparatus and method for providing access to reliable boot firmware. In various implementations, a computing system includes an integrated circuit with a security processor. Prior to performing any steps of a bootup operation using one of multiple copies of boot firmware, the security processor determines whether multiple signatures exist where the signatures are based on the multiple copies of boot firmware. Each of the multiple copies of boot firmware is a copy of a particular…
Region based split-directory scheme to adapt to large cache sizes
Granted: December 3, 2024
Patent Number:
12158845
Systems, apparatuses, and methods for maintaining region-based cache directories split between node and memory are disclosed. The system with multiple processing nodes includes cache directories split between the nodes and memory to help manage cache coherency among the nodes' cache subsystems. In order to reduce the number of entries in the cache directories, the cache directories track coherency on a region basis rather than on a cache line basis, wherein a region includes multiple…
Data co-location using address hashing for high-performance processing in memory
Granted: December 3, 2024
Patent Number:
12158842
A processing system allocates memory to co-locate input and output operands for operations for processing in memory (PIM) execution in the same PIM-local memory while exploiting row-buffer locality and complying with conventional memory abstraction. The processing system identifies as “super rows” virtual rows that span all the banks of a memory device. Each super row has a different bank-interleaving pattern, referred to as a “color”. A group of contiguous super rows that has…
Full dynamic post-package repair
Granted: December 3, 2024
Patent Number:
12158827
A memory controller includes a command queue, an arbiter, and a controller. The controller is responsive to a repair signal for migrating data from a failing region of a memory to a buffer, generating at least one command to perform a post-package repair operation of the failing region, and migrating the data from the buffer to a substitute region of the memory. The controller migrates the data to and from the buffer by providing migration read requests and migration write requests,…
Processor-guided execution of offloaded instructions using fixed function operations
Granted: November 26, 2024
Patent Number:
12153926
Processor-guided execution of offloaded instructions using fixed function operations is disclosed. Instructions designated for remote execution by a target device are received by a processor. Each instruction includes, as an operand, a target register in the target device. The target register may be an architected virtual register. For each of the plurality of instructions, the processor transmits an offload request in the order that the instructions are received. The offload request…