AMD Patent Applications

Processing Element-Centric All-to-All Communication

Published: July 4, 2024
Application Number: 20240220336
In accordance with described techniques for PE-centric all-to-all communication, a distributed computing system includes processing elements, such as graphics processing units, distributed in clusters. An all-to-all communication procedure is performed by the processing elements that are each configured to generate data packets in parallel for all-to-all data communication between the clusters. The all-to-all communication procedure includes a first stage of intra-cluster parallel data…
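The two-stage structure described in the abstract can be illustrated with a minimal behavioral sketch. This is not AMD's implementation; it assumes, purely for illustration, a square layout where the number of clusters equals the cluster size, so that the PE with local index k can serve as its cluster's aggregator for remote cluster k.

```python
import math

def hierarchical_all_to_all(data):
    """data[s][d] = payload PE s sends to PE d; returns recv with recv[d][s]."""
    n = len(data)
    c = math.isqrt(n)               # assume c clusters of c PEs each
    assert c * c == n, "sketch assumes clusters == cluster size"
    cluster = lambda pe: pe // c

    # Stage 1: intra-cluster parallel exchange. Within each cluster, the PE
    # with local index k aggregates every payload the cluster holds for
    # remote cluster k.
    staged = [[] for _ in range(n)]
    for s in range(n):
        for d in range(n):
            aggregator = cluster(s) * c + cluster(d)
            staged[aggregator].append((s, d, data[s][d]))

    # Stage 2: inter-cluster exchange. Each aggregator ships one combined
    # batch to its peer cluster, which scatters payloads to the final PEs.
    recv = [[None] * n for _ in range(n)]
    for p in range(n):
        for s, d, payload in staged[p]:
            recv[d][s] = payload
    return recv
```

The point of the staging is that stage 2 moves one large batch per cluster pair instead of many small per-PE messages.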

MULTI-PHASE CLOCK GATING WITH PHASE SELECTION

Published: July 4, 2024
Application Number: 20240223192
A multi-phase clock gating circuit receives a plurality of respective phased clock signals, and a one-hot stop phase select signal indicating a first selected phase for which gating of the phased clock signals is to be started. Responsive to a clock control signal indicating the phased clock signals are to be gated, the clock signals are gated beginning at the first selected phase, in order of phase, including looping from a last phase to a first phase.
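The ordering behavior in the abstract, decode a one-hot select, then gate phases in order with wraparound, can be sketched as a few lines of Python. The function name and four-phase example are illustrative, not taken from the application.

```python
def gating_order(num_phases, one_hot_select):
    """Order in which phase clocks are gated, starting at the selected
    phase and wrapping from the last phase back to phase 0."""
    start = one_hot_select.index(1)   # decode the one-hot stop phase select
    return [(start + i) % num_phases for i in range(num_phases)]
```

For a four-phase clock with phase 2 selected, gating proceeds 2, 3, 0, 1.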

TESTING PARITY AND ECC LOGIC USING MBIST

Published: July 4, 2024
Application Number: 20240221854
A processing device used for MBIST is provided which comprises a data storage structure configured to store data, data protection circuitry configured to add at least one protection bit to corresponding portions of the data written to the data storage structure, data protection checking circuitry configured to identify one or more errors made by the data protection circuitry, and an MBIST controller configured to receive the corresponding portions of data written to the data storage…
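The protection-and-check split can be illustrated with a single even-parity bit, the simplest form of the protection the abstract mentions. This is a generic parity sketch, not the patented MBIST circuitry; the helper names are hypothetical.

```python
def add_parity(word_bits):
    """Protection circuitry: append an even-parity bit to a data word."""
    return word_bits + [sum(word_bits) % 2]

def parity_ok(protected):
    """Checking circuitry: an even 1-count means no single-bit error."""
    return sum(protected) % 2 == 0
```

An MBIST-style test would write a word, flip one bit, and confirm the checker reports the error.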

WRONG WAY READ-BEFORE WRITE SOLUTIONS IN SRAM

Published: July 4, 2024
Application Number: 20240221805
A static random-access memory (SRAM) circuit includes an SRAM bitcell coupled to a word line, a bit line and a complementary bit line. A precharge circuit is coupled to the bit line and the complementary bit line and includes a precharge input. A first keeper transistor is coupled to the bit line and a second keeper transistor is coupled to the complementary bit line. A write driver circuit includes a select input receiving a select signal, a write data input, and a write data complement…

TECHNIQUE FOR TESTING RAY FOR INTERSECTION WITH ORIENTED BOUNDING BOXES

Published: July 4, 2024
Application Number: 20240221284
A technique for performing ray tracing operations is provided. The technique includes determining error bounds for a rotation operation for a ray; selecting a technique for determining whether the ray intersects a bounding box based on the error bounds; and determining whether the ray hits the bounding box based on the selected technique.
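The selection step can be sketched as follows, assuming the ray has already been rotated into the box's local frame: when the rotation's error bound is tight, use a plain slab test; otherwise fall back to a conservative test that pads the box by the bound. The threshold, padding scheme, and function names are illustrative assumptions, not the application's actual method.

```python
import math

def slab_hit(origin, inv_dir, lo, hi):
    """Standard slab test against an axis-aligned box in the ray's frame."""
    t0, t1 = 0.0, math.inf
    for o, inv, a, b in zip(origin, inv_dir, lo, hi):
        ta, tb = (a - o) * inv, (b - o) * inv
        t0, t1 = max(t0, min(ta, tb)), min(t1, max(ta, tb))
    return t0 <= t1

def hit_obb(origin, direction, lo, hi, err_bound, threshold=1e-6):
    """origin/direction assumed pre-rotated into the box's local frame."""
    inv = tuple(1.0 / d for d in direction)   # assumes no zero components
    if err_bound <= threshold:
        return slab_hit(origin, inv, lo, hi)  # fast path: trust the rotation
    # Conservative path: grow the box by the error bound so rounding in the
    # rotation can never cause a false miss.
    return slab_hit(origin, inv,
                    tuple(a - err_bound for a in lo),
                    tuple(b + err_bound for b in hi))
```

Padding by the error bound trades a few false hits (resolved by later, exact tests) for the guarantee that no true intersection is culled.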

BACKSIDE INTERFACE FOR CHIPLET ARCHITECTURE MIXING

Published: July 4, 2024
Application Number: 20240220438
The disclosed semiconductor package includes a first chiplet area for receiving a first chiplet, a second chiplet area for receiving a second chiplet, and a host die coupled to the first and second chiplet areas. The semiconductor package also includes an interconnect directly coupling the first chiplet area to the second chiplet area. Various other methods, systems, and computer-readable media are also disclosed.

TIERED MEMORY CACHING

Published: July 4, 2024
Application Number: 20240220415
The disclosed computer-implemented method includes locating, from a processor storage, a partial tag corresponding to a memory request for a line stored in a memory having a tiered memory cache and in response to a partial tag hit for the memory request, locating, from a partition of the tiered memory cache indicated by the partial tag, a full tag for the line. The method also includes fetching, in response to a full tag hit, the requested line from the partition of the tiered memory…
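The two-step lookup can be sketched with dict-based stand-ins for the hardware tables: a small partial tag kept in processor storage picks the partition, and the full tag stored in that partition confirms the hit before the line is fetched. The 8-bit partial tag and class layout are illustrative assumptions.

```python
class TieredCache:
    def __init__(self, num_partitions):
        self.partial_tags = {}                        # partial tag -> partition
        self.partitions = [dict() for _ in range(num_partitions)]

    def fill(self, addr, line, partition):
        self.partial_tags[addr & 0xFF] = partition    # small tag near the CPU
        self.partitions[partition][addr] = line       # full tag in the tier

    def lookup(self, addr):
        part = self.partial_tags.get(addr & 0xFF)
        if part is None:
            return None                               # partial-tag miss
        return self.partitions[part].get(addr)        # full-tag check in tier
```

The partial tag is cheap enough to keep close to the processor, so most misses are filtered without touching the (slower) tiered cache at all.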

UNIFIED FLEXIBLE CACHE

Published: July 4, 2024
Application Number: 20240220409
The disclosed computer-implemented method includes partitioning a cache structure into a plurality of cache partitions designated by a plurality of cache types, forwarding a memory request to a cache partition corresponding to a target cache type of the memory request, and performing, using the cache partition, the memory request. Various other methods, systems, and computer-readable media are also disclosed.
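The partition-and-route idea can be shown in a few lines: split one physical structure across labeled partitions, then serve each request from the partition whose type matches. The type names and even split are illustrative assumptions.

```python
def build_partitions(ways, types):
    """Split a cache's ways evenly across the given cache types."""
    per = len(ways) // len(types)
    return {t: ways[i * per:(i + 1) * per] for i, t in enumerate(types)}

def route(request_type, partitions):
    """Forward a memory request to the partition for its target type."""
    return partitions[request_type]
```

An uneven or dynamically resized split would follow the same routing path; only `build_partitions` would change.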

SYSTEMS AND METHODS FOR HOSTING AN INTERLEAVE ACROSS ASYMMETRICALLY POPULATED MEMORY CHANNELS ACROSS TWO OR MORE DIFFERENT MEMORY TYPES

Published: July 4, 2024
Application Number: 20240220405
The disclosed computing device can include at least one memory of a particular type having a plurality of memory channels, and at least one memory of at least one other type having a plurality of links. The computing device can also include remapping circuitry configured to homogenously interleave the plurality of memory channels with the at least one memory of the at least one other type. Various other methods, systems, and computer-readable media are also disclosed.
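A homogeneous interleave over a mixed channel list can be sketched as a simple remap: round-robin fixed-size granules across the combined set of channels and links, regardless of which memory type backs each one. The granule size and channel names are illustrative.

```python
def interleave_map(addr, channels, granule=4096):
    """Map a flat physical address onto a homogeneous interleave across a
    combined list of channels/links of possibly different memory types."""
    block = addr // granule
    channel = channels[block % len(channels)]            # round-robin target
    local = (block // len(channels)) * granule + addr % granule
    return channel, local
```

Because the remap treats every entry in the list identically, the interleave stays uniform even when the two memory types contribute different channel counts.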

FULL DYNAMIC POST-PACKAGE REPAIR

Published: July 4, 2024
Application Number: 20240220379
A memory controller includes a command queue, an arbiter, and a controller. The controller is responsive to a repair signal for migrating data from a failing region of a memory to a buffer, generating at least one command to perform a post-package repair operation of the failing region, and migrating the data from the buffer to a substitute region of the memory. The controller migrates the data to and from the buffer by providing migration read requests and migration write requests,…
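The migrate-repair-migrate sequence can be illustrated with a dict as the memory model: migration reads fill a buffer, the repair retargets the failing rows to spares, and migration writes restore the data. The row-keyed model and names are illustrative, not the controller's actual mechanism.

```python
def post_package_repair(memory, failing_rows, spare_rows):
    buffer = [memory[r] for r in failing_rows]     # migration read requests
    remap = dict(zip(failing_rows, spare_rows))    # PPR: swap in spare rows
    for row, data in zip(failing_rows, buffer):
        memory[remap[row]] = data                  # migration write requests
    return remap
```

Staging through the buffer is what makes the repair "full dynamic": no data in the failing region is lost while the repair operation runs.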

SYSTEMS AND METHODS FOR SHARING MEMORY ACROSS CLUSTERS OF DIRECTLY CONNECTED NODES

Published: July 4, 2024
Application Number: 20240220320
An exemplary system comprises a cluster of nodes that are communicatively coupled to one another via at least one direct link and collectively include a plurality of memory devices. The exemplary system also comprises at least one system memory manager communicatively coupled to the cluster of nodes. In one example, the system memory manager is configured to allocate a plurality of sharable memory pools across the memory devices. Various other systems, methods, and computer-readable…

PIM Search Stop Control

Published: July 4, 2024
Application Number: 20240220251
In accordance with described techniques for processing-in-memory (PIM) search stop control, a computing system or computing device includes a memory system that includes a stop condition check component, which receives an instruction that includes a programmed check value. The stop condition check component compares the programmed check value to outputs of a PIM component, and the stop condition check component initiates a stop instruction to stop the PIM component from processing…
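The stop-condition check can be sketched as comparing a stream of PIM outputs against the programmed check value and stopping on the first match. The generator-based stream is a software stand-in for the hardware component.

```python
def run_until_match(pim_outputs, check_value):
    """Return how many PIM outputs were processed before the stop fired,
    or None if the search ran to completion without a match."""
    for steps, out in enumerate(pim_outputs, start=1):
        if out == check_value:
            return steps          # stop instruction issued here
    return None
```

Stopping in the memory system avoids streaming the remaining (now useless) search results back to the host.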

ACCELERATING RELAXED REMOTE ATOMICS ON MULTIPLE WRITER OPERATIONS

Published: June 27, 2024
Application Number: 20240211134
A memory controller includes an arbiter, a vector arithmetic logic unit (VALU), a read buffer and a write buffer both coupled to the VALU, and an atomic memory operation scheduler. The VALU performs scattered atomic memory operations on arrays of data elements responsive to selected memory access commands. The atomic memory operation scheduler is for scheduling atomic memory operations at the VALU; identifying a plurality of scattered atomic memory operations with commutative and…
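The payoff of identifying commutative operations can be shown with atomic adds: pending adds to the same element can be combined in a buffer and applied as one update, which is valid precisely because addition is commutative and associative. The buffering scheme below is an assumption for illustration.

```python
from collections import defaultdict

def coalesce_atomic_adds(ops):
    """ops: iterable of (address, value) relaxed atomic-add requests.
    Returns one combined update per address."""
    pending = defaultdict(int)
    for addr, val in ops:
        pending[addr] += val      # combine instead of issuing per-op
    return dict(pending)
```

For non-commutative atomics (e.g., exchange), this combining would change observable results, hence the abstract's restriction to commutative operations.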

Managing a Cache Using Per Memory Region Reuse Distance Estimation

Published: June 27, 2024
Application Number: 20240211407
A memory request issue counter (MRIC) is maintained that is incremented for every memory access a central processing unit core makes. A region reuse distance table is also maintained that includes multiple entries each of which stores the region reuse distance for a corresponding region. When a memory access request for a physical address is received, a reuse distance for the physical address is calculated. This reuse distance is the difference between the current MRIC value and a…
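The counter-and-table mechanism can be sketched directly: the MRIC ticks on every access, each region's table entry holds the MRIC value at its last access, and the reuse distance is the difference between the two. The 4 KiB region size is an illustrative assumption.

```python
REGION_SHIFT = 12  # 4 KiB regions (assumed)

def access(state, addr):
    """state = {'mric': 0, 'table': {}}; returns the reuse distance for
    this access, or None on the region's first access."""
    region = addr >> REGION_SHIFT
    state['mric'] += 1                       # ticks on every memory access
    last = state['table'].get(region)
    distance = None if last is None else state['mric'] - last
    state['table'][region] = state['mric']   # remember this access
    return distance
```

A cache policy can then favor retaining lines from regions whose estimated reuse distance fits within the cache's capacity.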

DISTRIBUTED CACHING POLICY FOR LARGE-SCALE DEEP LEARNING TRAINING DATA PRE-PROCESSING

Published: June 27, 2024
Application Number: 20240211399
A distributed cache network used for machine learning is provided which comprises a network fabric having file systems which store data and a plurality of processing devices, each comprising cache memory and a processor configured to execute a training of a machine learning model and selectively cache portions of the data based on a frequency with which the data is accessed by the processor. Each processing device stores metadata identifying portions of data which are cached in the cache…
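The selective, frequency-driven policy can be sketched per node: count accesses to each training-data shard, cache a shard once it crosses a hotness threshold, and record the cached location in shared metadata so other nodes can find it. The threshold and structures are illustrative assumptions.

```python
def update_cache(counts, cache, metadata, node, shard, threshold=3):
    """Record one access by `node` to `shard`; cache hot shards locally."""
    counts[shard] = counts.get(shard, 0) + 1
    if counts[shard] >= threshold and shard not in cache:
        cache.add(shard)                              # cache the hot shard
        metadata.setdefault(shard, set()).add(node)   # advertise its location
```

The metadata lets a node fetch a shard from a peer's cache instead of the (slower) network file system.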

Leveraging Processing in Memory Registers as Victim Buffers

Published: June 27, 2024
Application Number: 20240211393
In accordance with the described techniques for leveraging processing in memory registers as victim buffers, a computing device includes a memory, a processing in memory component having registers for data storage, and a memory controller having a victim address table that includes at least one address of a row of the memory that is stored in the registers. The memory controller receives a request to access the row of the memory and accesses data of the row from the registers based on…
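The victim-table path can be sketched with dicts standing in for the hardware tables: a request for a row listed in the victim address table is served from the PIM registers instead of reopening the DRAM row.

```python
def read_row(row, victim_table, pim_registers, dram):
    """Serve a row read from PIM registers on a victim-table hit."""
    if row in victim_table:                    # victim address table hit
        return pim_registers[victim_table[row]]
    return dram[row]                           # normal DRAM row access
```

The benefit mirrors a victim cache: a recently evicted row's data is still reachable without paying a full row activate.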

DEVICES, SYSTEMS, AND METHODS FOR INJECTING FABRICATED ERRORS INTO MACHINE CHECK ARCHITECTURES

Published: June 27, 2024
Application Number: 20240211362
An exemplary system includes and/or represents an agent and a machine check architecture. In one example, the machine check architecture includes and/or represents at least one circuit configured to report errors via at least one reporting register. In this example, the machine check architecture also includes and/or represents at least one error-injection register configured to cause the circuit to inject at least one fabricated error report into the reporting register in response to a…

PERFORMANCE OF BANK REFRESH

Published: June 27, 2024
Application Number: 20240211173
A memory controller includes an arbiter. The arbiter is configured to elevate a priority of memory access requests that generate row activate commands in response to receiving a same-bank refresh request, and to send a same-bank refresh command in response to receiving the same-bank refresh request.
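The arbiter behavior can be sketched on a toy command queue: when a same-bank refresh request arrives, pending requests that would generate row activates get their priority elevated (so their banks' rows open and drain first), and the same-bank refresh command is then queued. The queue model and field names are illustrative.

```python
def on_same_bank_refresh(queue):
    """queue: list of dicts with 'needs_activate' and 'priority' fields."""
    for req in queue:
        if req['needs_activate']:
            req['priority'] += 1   # elevate activate-generating requests
    # Send the same-bank refresh command after the elevation.
    queue.append({'cmd': 'REFsb', 'needs_activate': False, 'priority': 0})
    return queue
```

Elevating the activates first reduces the chance that the refresh stalls behind requests that would otherwise conflict with the refreshed banks.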

System Memory Training with Chipset Attached Memory

Published: June 27, 2024
Application Number: 20240211160
System memory training with chipset attached memory is described. In accordance with the described techniques, a request is received to train a system memory of a device. Responsive to the request, contents of the system memory are transferred to a chipset attached memory. The device is operated using the contents from the chipset attached memory. While the device is being operated using the contents from the chipset attached memory, the system memory is dynamically trained. After the…

Extended Training for Memory

Published: June 27, 2024
Application Number: 20240211142
Extended training for memory is described. In accordance with the described techniques, a training request to train a memory with extended training is received. The extended training corresponds to a longer amount of time than a default training. The extended training of the memory is performed using a set of target memory settings. In one or more implementations, the extended training is performed during a boot up phase of the computing device.