Platform first error handling
Granted: July 13, 2021
Patent Number:
11061753
Systems, apparatuses, and methods for implementing a hardware enforcement mechanism to enable platform-specific firmware visibility into an error state ahead of the operating system are disclosed. A system includes at least one or more processor cores, control logic, a plurality of registers, platform-specific firmware, and an operating system (OS). The control logic allows the platform-specific firmware to decide if and when the error state is visible to the OS. In some cases, the…
Setting durations for which data is stored in a non-volatile memory based on data types
Granted: July 13, 2021
Patent Number:
11061583
An electronic device includes a non-volatile memory and a controller. The controller receives data to be written to the non-volatile memory and determines a type of the data. Based on the type of the data, the controller selects a given duration of the data from among multiple durations of the data in the non-volatile memory. The controller sets values of one or more parameters for writing the data to the non-volatile memory based on the given duration. The controller writes the data to…
Memory object tagged memory monitoring method and system
Granted: July 13, 2021
Patent Number:
11061572
Described are a method and processing apparatus to tag and track objects related to memory allocation calls. An application or software adds a tag to a memory allocation call to enable object level tracking. An entry is made into an object tracking table, which stores the tag and a variety of statistics related to the object and associated memory devices. The object statistics may be queried by the application to tune power/performance characteristics either by the application making…
Fine-grained speed binning in an accelerated processing device
Granted: July 13, 2021
Patent Number:
11061429
A technique for fine-granularity speed binning for a processing device is provided. The processing device includes a plurality of clock domains, each of which may be clocked with independent clock signals. The clock frequency at which a particular clock domain may operate is determined based on the longest propagation delay between clocked elements in that particular clock domain. The processing device includes measurement circuits for each clock domain that measure such propagation…
Power efficiency optimization in throughput-based workloads
Granted: July 6, 2021
Patent Number:
11054883
A power management algorithm framework proposes: 1) a Quality-of-Service (QoS) metric for throughput-based workloads; 2) heuristics to differentiate between throughput and latency sensitive workloads; and 3) an algorithm that combines the heuristic and QoS metric to determine target frequency for minimizing idle time and improving power efficiency without any performance degradation. A management algorithm framework enables optimizing power efficiency in server-class throughput-based…
Shader merge for reduced divergence
Granted: July 6, 2021
Patent Number:
11055895
Described herein are techniques for reducing control flow divergence. The method includes identifying two or more shader programs having commonalities, generating a merged shader program that implements functionality of the identified two or more shader programs, wherein the functionality implemented includes a first execution option for a first shader program of the two or more shader programs and a second execution option for a second shader program of the two or more shader programs,…
Method and apparatus for directing application requests for rendering
Granted: July 6, 2021
Patent Number:
11055806
A method and system for directing image rendering, implemented in a computer system including a plurality of processors includes determining one or more processors in the system on which to execute one or more commands. A graphics processing unit (GPU) control application program interface (API) determines one or more processors in the system on which to execute one or more commands. A signal is transmitted to each of the one or more processors indicating which of the one or more…
Fast thread wake-up through early lock release
Granted: July 6, 2021
Patent Number:
11055150
A thread holding a lock notifies a sleeping thread that is waiting on the lock that the lock holding thread is “about” to release the lock. In response to the notification, the waiting thread is woken up. While the waiting thread is woken up, the lock holding thread completes other operations prior to actually releasing the lock and then releases the lock. The notification to the waiting thread hides latency associated with waking up the waiting thread by allowing operations that…
Branch target buffer with early return prediction
Granted: July 6, 2021
Patent Number:
11055098
A processor includes a branch target buffer (BTB) having a plurality of entries whereby each entry corresponds to an associated instruction pointer value that is predicted to be a branch instruction. Each BTB entry stores a predicted branch target address for the branch instruction, and further stores information indicating whether the next branch in the block of instructions associated with the predicted branch target address is predicted to be a return instruction. In response to the…
System-wide low power management
Granted: July 6, 2021
Patent Number:
11054887
Systems, apparatuses, and methods for performing efficient power management for a multi-node computing system are disclosed. A computing system includes multiple nodes. When power down negotiation is distributed, negotiation for system-wide power down occurs within a lower level of a node hierarchy prior to negotiation for power down occurring at a higher level of the node hierarchy. When power down negotiation is centralized, a given node combines a state of its clients with indications…
Circuit board with phase change material
Granted: June 29, 2021
Patent Number:
11049794
Various circuit board embodiments are disclosed. In one aspect, an apparatus is provided that includes a circuit board and a first phase change material pocket positioned on or in the circuit board and contacting a surface of the circuit board.
Tracking stores and loads by bypassing load store units
Granted: June 29, 2021
Patent Number:
11048506
A system and method for tracking stores and loads to reduce load latency when forming the same memory address by bypassing a load store unit within an execution unit is disclosed. Store-load pairs which have a strong history of store-to-load forwarding are identified. Once identified, the load is memory renamed to the register stored by the store. The memory dependency predictor may also be used to detect loads that are dependent on a store but cannot be renamed. In such a configuration,…
Real time on-chip texture decompression using shader processors
Granted: June 22, 2021
Patent Number:
11043010
A processing unit, method, and medium for decompressing or generating textures within a graphics processing unit (GPU). The textures are compressed with a variable-rate compression scheme such as JPEG. The compressed textures are retrieved from system memory and transferred to local cache memory on the GPU without first being decompressed. A table is utilized by the cache to locate individual blocks within the compressed texture. A decompressing shader processor receives compressed…
Providing interrupts from an input-output memory management unit to guest operating systems
Granted: June 22, 2021
Patent Number:
11042495
An electronic device includes a processor that executes a guest operating system; a memory having a guest portion that is reserved for storing data and information to be accessed by the guest operating system; and an input-output memory management unit (IOMMU). The IOMMU performs operations for signaling an interrupt to the guest operating system. For these operations, the IOMMU acquires, from an entry in an interrupt remapping table associated with the guest operating system, a location…
Targeted per-line operations for remote scope promotion
Granted: June 22, 2021
Patent Number:
11042484
A processing system includes one or more first caches and one or more first lock tables associated with the one or more first caches. The processing system also includes one or more processing units that each include a plurality of compute units for concurrently executing work-groups of work items, a plurality of second caches associated with the plurality of compute units and configured in a hierarchy with the one or more first caches, and a plurality of second lock tables associated…
Energy efficient adaptive data encoding method and circuit
Granted: June 15, 2021
Patent Number:
11038526
Various energy efficient data encoding schemes and computing devices are disclosed. In one aspect, a method of transmitting data from a transmitter to a receiver connected by plural wires is provided. The method includes sending from the transmitter on at least one but not all of the wires a first wave form that has first and second signal transitions. The receiver receives the first waveform and measures a first duration between the first and second signal transitions using a locally…
Pixelation optimized delta color compression
Granted: June 15, 2021
Patent Number:
11037357
A technique for compressing an original image is disclosed. According to the technique, an original image is obtained and a delta-encoded image is generated based on the original image. Next, a segregated image is generated based on the delta-encoded image and then the segregated image is compressed to produce a compressed image. The segregated image is generated because the segregated image may be compressed more efficiently than the original image and the delta image.
Light-weight memory expansion in a coherent memory system
Granted: June 15, 2021
Patent Number:
11036658
Systems, methods, and port controller designs employ a light-weight memory protocol. A light-weight memory protocol controller is selectively coupled to a Cache Coherent Interconnect for Accelerators (CCIX) port. Over an on-chip interconnect fabric, the light-weight protocol controller receives memory access requests from a processor and, in response, transmits associated memory access requests to an external memory through the CCIX port using only a proper subset of CCIX protocol memory…
Store-to-load forwarding
Granted: June 15, 2021
Patent Number:
11036505
An arithmetic unit performs store-to-load forwarding based on predicted dependencies between store instructions and load instructions. In some embodiments, the arithmetic unit maintains a table of store instructions that are awaiting movement to a load/store unit of the instruction pipeline. In response to receiving a load instruction that is predicted to be dependent on a store instruction stored at the table, the arithmetic unit causes the data associated with the store instruction to…
Virtual space memory bandwidth reduction
Granted: June 8, 2021
Patent Number:
11030095
A processing system includes a central processing unit (CPU) and a graphics processing unit (GPU) that has a plurality of compute units. The GPU receives an image from the CPU and determines a total result area in a virtual-matrix-multiplication space of a virtual matrix-multiplication output matrix based on convolutional parameters associated with the image in an image space. The GPU partitions the total result area of the virtual matrix-multiplication output matrix into a plurality of…