Home agent based cache transfer acceleration scheme
Granted: September 15, 2020
Patent Number: 10776282
Systems, apparatuses, and methods for implementing a speculative probe mechanism are disclosed. A system includes at least multiple processing nodes, a probe filter, and a coherent slave. The coherent slave includes an early probe cache to cache recent lookups to the probe filter. The early probe cache includes entries for regions of memory, wherein a region includes a plurality of cache lines. The coherent slave performs parallel lookups to the probe filter and the early probe cache…
Using return address predictor to speed up control stack return address verification
Granted: September 8, 2020
Patent Number: 10768937
Overhead associated with verifying function return addresses to protect against security exploits is reduced by taking advantage of branch prediction mechanisms for predicting return addresses. More specifically, returning from a function includes popping a return address from a data stack. Well-known security exploits overwrite the return address on the data stack to hijack control flow. In some processors, a separate data structure referred to as a control stack is used to verify the…
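The baseline check that this abstract starts from can be sketched as follows. This is a minimal illustration, assuming that verification means comparing the address popped from the data stack against the copy kept on the control stack; the abstract is truncated before it says so explicitly, and the predictor-based speedup named in the title is not modeled. The names `Machine`, `call`, `ret`, and `ReturnAddressMismatch` are hypothetical.

```python
class ReturnAddressMismatch(Exception):
    """Raised when the data stack and the control stack disagree (hypothetical name)."""


class Machine:
    """Toy model of a call/return mechanism with a separate control stack."""

    def __init__(self):
        self.data_stack = []     # ordinary stack; return addresses live here with other data
        self.control_stack = []  # holds only return addresses, kept for verification

    def call(self, return_address):
        # On a call, the return address is recorded on both stacks.
        self.data_stack.append(return_address)
        self.control_stack.append(return_address)

    def ret(self):
        # On a return, pop both copies and compare them (assumed verification step).
        popped = self.data_stack.pop()
        expected = self.control_stack.pop()
        if popped != expected:
            raise ReturnAddressMismatch(
                f"data stack has {popped:#x}, control stack has {expected:#x}"
            )
        return popped
```

In this model, overwriting the top of `data_stack` between `call` and `ret` causes `ret` to raise instead of returning the tampered address.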
Probe placement for laser probing system
Granted: September 8, 2020
Patent Number: 10768225
A control system for placing an optic probe includes a receiver circuit that receives reflected light produced from the optic probe and provides a laser probe (LP) waveform of the reflected light in response to an activation of a trigger signal. A combinational logic analysis (CLA) processor provides a CLA waveform in response to simulating an optical response at a target location on a surface of a cell of a device under test to a test pattern. A test controller receives the CLA waveform…
Tiling format for convolutional neural networks
Granted: September 1, 2020
Patent Number: 10762392
Systems, apparatuses, and methods for converting data to a tiling format when implementing convolutional neural networks are disclosed. A system includes at least a memory, a cache, a processor, and a plurality of compute units. The memory stores a first buffer and a second buffer in a linear format, where the first buffer stores convolutional filter data and the second buffer stores image data. The processor converts the first and second buffers from the linear format to third and…
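As a rough illustration of converting a buffer from a linear format to a tiling format, the sketch below regroups a row-major buffer so that the elements of each tile are contiguous. The tile shape, the row-major order inside a tile, and the function name `linear_to_tiled` are assumptions made for the example; the abstract does not specify the actual layout used.

```python
def linear_to_tiled(buf, height, width, tile_h, tile_w):
    """Reorder a row-major (linear) buffer so each tile_h x tile_w tile is contiguous.

    Assumed layout: tiles are visited left to right, top to bottom, and the
    elements inside a tile are stored row by row.
    """
    if len(buf) != height * width:
        raise ValueError("buffer length does not match height * width")
    if height % tile_h or width % tile_w:
        raise ValueError("dimensions must be multiples of the tile size")

    tiled = []
    for tile_row in range(0, height, tile_h):        # top edge of each tile
        for tile_col in range(0, width, tile_w):     # left edge of each tile
            for r in range(tile_row, tile_row + tile_h):
                start = r * width + tile_col         # linear index of this tile row
                tiled.extend(buf[start:start + tile_w])
    return tiled
```

For a 4x4 buffer holding 0 through 15 and 2x2 tiles, the first tile in the result is `[0, 1, 4, 5]`, that is, the top-left 2x2 block of the original image.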
Shared loads at compute units of a processor
Granted: September 1, 2020
Patent Number: 10761992
A processor reduces bus bandwidth consumption by employing a shared load scheme, whereby each shared load retrieves data for multiple compute units (CUs) of a processor. Each CU in a specified group monitors a bus for load accesses directed to a cache shared by the multiple CUs. In response to identifying a load access on the bus, a CU determines if the load access is a shared load access for its share group. In response to identifying a shared load access for its share group, the CU…
Redirecting data to improve page locality in a scalable data fabric
Granted: September 1, 2020
Patent Number: 10761986
A data processing system includes a host processor, a local memory coupled to the host processor, a plurality of remote memory media, and a scalable data fabric coupled to the host processor and to the plurality of remote memory media. The scalable data fabric includes a filter for storing information indicating a location of data that is stored by the data processing system. The host processor includes a hardware sequencer coupled to the filter for selectively moving data stored by the…
Method and apparatus for integration of non-volatile memory
Granted: September 1, 2020
Patent Number: 10761736
Described herein is a method and system for directly accessing and transferring data between a first memory architecture and a second memory architecture associated with a graphics processing unit (GPU) by treating the first memory architecture, the second memory architecture and system memory as a single physical memory, where the first memory architecture is a non-volatile memory (NVM) and the second memory architecture is a local memory. Upon accessing a virtual address (VA) range by…
Sinusoidal shaped capacitor architecture in oxide
Granted: August 25, 2020
Patent Number: 10756164
A system and method for fabricating metal insulator metal capacitors while managing semiconductor processing yield and increasing capacitance per area are described. A semiconductor device fabrication process places an oxide layer on top of a metal layer. A photoresist layer is formed on top of the oxide layer and etched with repeating spacing. One of a variety of lithography techniques is used to alter the distance between the spacings. The process etches trenches into areas of the…
Metastability insertion using the X-state
Granted: August 25, 2020
Patent Number: 10755010
An indeterminate state representative of a metastable state is inserted into an output signal of a circuit representation responsive to the circuit representation receiving a metastable triggering event during a register transfer language (RTL) simulation. The simulation view of the RTL has been modified to ensure that the indeterminate state is propagated through the circuit representations regardless of the input on which the indeterminate state appears. The indeterminate state is…
Channel training using a replica lane
Granted: August 18, 2020
Patent Number: 10749756
Systems, apparatuses, and methods for utilizing training sequences on a replica lane are described. A transmitter is coupled to a receiver via a communication channel with a plurality of lanes. One of the lanes is a replica lane used for tracking the drift in the optimal sampling point due to temperature variations, power supply variations, or other factors. While data is sent on the data lanes, test patterns are sent on the replica lane to determine if the optimal sampling point for the…
Pseudo differential receiving mechanism for single-ended signaling
Granted: August 18, 2020
Patent Number: 10749552
Systems, apparatuses, and methods for performing efficient data transfer in a computing system are disclosed. A computing system includes multiple transmitters sending single-ended data signals to multiple receivers. A termination voltage is generated and sent to the multiple receivers. The termination voltage is coupled to each of signal termination circuitry and signal sampling circuitry within each of the multiple receivers. Any change in the termination voltage affects the…
Compressing tags in software and hardware semi-sorted caches
Granted: August 18, 2020
Patent Number: 10749545
A data storage system performs partial compression and decompression of a set of memory items. The memory items each include a data block and a tag with a prefix making up at least part of the tag. The memory items are ordered based on the prefixes. A code word is created containing compressed information representing values of the prefixes for the set of memory items. The code word and block data for each of the memory items are stored in a memory. The code word is decompressed to…
Shift of circuit periphery layout to leverage optimal use of available metal tracks in periphery logic
Granted: August 18, 2020
Patent Number: 10747931
Systems, apparatuses, and methods for efficiently floor planning a semiconductor chip are disclosed. Within either the processor or the memory of a computing system, each of a first block and a neighboring second block has a same height. A first metal track plan for the first block is unaligned with respect to a second metal track plan for the second block. An offset for moving each track of the second metal track plan to align with a track of the first metal track plan is determined where the…
Shader pipelines and hierarchical shader resources
Granted: August 18, 2020
Patent Number: 10747553
Shader resources may be specified for input to a shader using a hierarchical data structure which may be referred to as a descriptor set. The descriptor set may be bound to a bind point of the shader and may contain slots with pointers to memory containing shader resources. The shader may reference a particular slot of the descriptor set using an offset, and may change shader resources by referencing a different slot of the descriptor set or by binding or rebinding a new descriptor set.…
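The descriptor-set arrangement described above can be modeled with a small sketch. It assumes memory is a plain dictionary from addresses to resources and that each slot stores one such address; `DescriptorSet`, `Shader`, `bind`, and `fetch` are hypothetical names chosen for illustration, not an actual graphics API.

```python
class DescriptorSet:
    """A list of slots, each holding a pointer (address) into memory."""

    def __init__(self, slot_addresses):
        self.slots = list(slot_addresses)


class Shader:
    """Toy shader with a single bind point for a descriptor set."""

    def __init__(self, memory):
        self.memory = memory      # assumed model: dict mapping address -> resource
        self.bind_point = None    # currently bound descriptor set, if any

    def bind(self, descriptor_set):
        # Binding or rebinding swaps every slot the shader can see at once.
        self.bind_point = descriptor_set

    def fetch(self, offset):
        # Reference a slot by offset, then follow its pointer to the resource.
        if self.bind_point is None:
            raise RuntimeError("no descriptor set bound")
        address = self.bind_point.slots[offset]
        return self.memory[address]
```

With this model, a shader changes resources either by calling `fetch` with a different offset or by calling `bind` with a different `DescriptorSet`, which mirrors the two options the abstract lists.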
Dynamic interrupt rate control in computing system
Granted: August 18, 2020
Patent Number: 10747298
Systems, apparatuses, and methods for intentionally delaying servicing of interrupts in a computing system are disclosed. A computing system includes a processor that services interrupts generated by components of the computing system. An interrupt controller detects a received interrupt and intentionally delays servicing of the interrupt depending on various conditions. If the interrupt is a first type of interrupt and the processor is in a first power state, servicing of the interrupt…
Buffer management for plug-in architectures in computation graph structures
Granted: August 11, 2020
Patent Number: 10742834
A computer vision processing device is provided which comprises memory configured to store data and a processor. The processor is configured to store captured image data in a first buffer and acquire access to the captured image data in the first buffer when the captured image data is available for processing. The processor is also configured to execute a first group of operations in a processing pipeline, each of which processes the captured image data accessed from the first buffer and…
Network packet templating for GPU-initiated communication
Granted: August 11, 2020
Patent Number: 10740163
Systems, apparatuses, and methods for performing network packet templating for graphics processing unit (GPU)-initiated communication are disclosed. A central processing unit (CPU) creates a network packet according to a template and populates a first subset of fields of the network packet with static data. Next, the CPU stores the network packet in a memory. A GPU initiates execution of a kernel and detects a network communication request within the kernel and prior to the kernel…
Conditional construct splitting for latency hiding
Granted: August 11, 2020
Patent Number: 10740074
A method and system for compiler optimization includes analyzing a representation of source code to identify an original conditional construct having both a high-latency instruction and one or more instructions dependent on the high-latency instruction in a branch of the conditional construct. A set of one or more instructions following the conditional construct in the representation of source code and independent of the high-latency instruction is selected. An optimized representation…
Expandable buffer for memory transactions
Granted: August 11, 2020
Patent Number: 10740029
A processing system employs an expandable memory buffer that supports enlarging the memory buffer when the processing system generates a large number of long latency memory transactions. The hybrid structure of the memory buffer allows a memory controller of the processing system to store a larger number of memory transactions while still maintaining adequate transaction throughput and also ensuring a relatively small buffer footprint and power consumption. Further, the hybrid structure…
Selectively performing ahead branch prediction based on types of branch instructions
Granted: August 4, 2020
Patent Number: 10732979
A set of entries in a branch prediction structure for a set of second blocks are accessed based on a first address of a first block. The set of second blocks correspond to outcomes of one or more first branch instructions in the first block. Speculative prediction of outcomes of second branch instructions in the second blocks is initiated based on the entries in the branch prediction structure. State associated with the speculative prediction is selectively flushed based on types of the…