AMD Patent Applications

MINIMIZING LATENCY FROM PERIPHERAL DEVICES TO COMPUTE ENGINES

Granted: April 13, 2017
Application Number: 20170102886
Methods, systems, and computer program products are provided for minimizing latency in an implementation where a peripheral device is used as a capture device and a compute device such as a GPU processes the captured data in a computing environment. In embodiments, a peripheral device and GPU are tightly integrated and communicate at a hardware/firmware level. Peripheral device firmware can determine and store compute instructions specifically for the GPU in a command queue. The compute…
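
As a loose illustration of the firmware-level command path described above, the sketch below models a shared command queue in Python; the ComputeCommand fields and the CommandQueue class are hypothetical stand-ins, not structures from the application.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional

@dataclass
class ComputeCommand:
    """Hypothetical descriptor the peripheral's firmware builds for the GPU."""
    kernel: str        # compute kernel the GPU should launch on the captured data
    buffer_addr: int   # address of the captured data in shared memory
    buffer_len: int    # length of the captured data in bytes

class CommandQueue:
    """Toy stand-in for a command queue visible to both firmware and GPU."""
    def __init__(self) -> None:
        self._queue = deque()

    def push(self, cmd: ComputeCommand) -> None:
        # Firmware side: enqueue compute work directly, without a CPU round trip.
        self._queue.append(cmd)

    def pop(self) -> Optional[ComputeCommand]:
        # GPU side: drain the queue and dispatch each command.
        return self._queue.popleft() if self._queue else None

queue = CommandQueue()
queue.push(ComputeCommand(kernel="denoise", buffer_addr=0x80000000, buffer_len=4096))
print(queue.pop())
```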

METHOD AND APPARATUS FOR WORKLOAD PLACEMENT ON HETEROGENEOUS SYSTEMS

Granted: April 13, 2017
Application Number: 20170102971
The methods and apparatus can assign processing core workloads to processing cores from a heterogeneous instruction set architecture (ISA) pool of available processing cores based on processing core metric results. For example, the method and apparatus can obtain processing core metric results for one or more processing cores, such as processing cores within general purpose processors, from a heterogeneous ISA pool of available processing cores. The method and apparatus can also obtain…
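
A minimal sketch of metric-driven placement follows, assuming a weighted score over two illustrative metrics; the Core fields, weights, and scoring rule are hypothetical and not taken from the application.

```python
from dataclasses import dataclass

@dataclass
class Core:
    core_id: int
    isa: str                 # e.g. "x86" or "ARM" -- a heterogeneous ISA pool
    perf_counter: float      # illustrative metric result (higher is better)
    power_draw: float        # illustrative metric result (lower is better)

def place_workload(cores, weight_perf=1.0, weight_power=0.5):
    """Pick the core whose metric results give the best weighted score."""
    def score(core):
        return weight_perf * core.perf_counter - weight_power * core.power_draw
    return max(cores, key=score)

pool = [Core(0, "x86", perf_counter=9.0, power_draw=4.0),
        Core(1, "ARM", perf_counter=6.0, power_draw=1.0)]
print(place_workload(pool).core_id)
```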

MULTI-PROTOCOL HEADER GENERATION SYSTEM

Granted: March 23, 2017
Application Number: 20170085472
A communication device includes a data source that generates data for transmission over a bus, and a data encoder that receives and encodes outgoing data. An encoder system receives outgoing data from a data source and stores the outgoing data in a first queue. An encoder encodes outgoing data with a header type that is based upon a header type indication from a controller, and stores the encoded data, which may be a packet or a data word with at least one layered header, in a second queue…
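
The two-queue encode path might be sketched as follows; the header byte values and the Encoder methods are hypothetical placeholders for the controller-selected header types described above.

```python
from collections import deque

HEADER_FORMATS = {            # hypothetical headers keyed by the controller's indication
    "packet": b"\xA5\x01",
    "data_word": b"\x5A",
}

class Encoder:
    def __init__(self):
        self.outgoing = deque()   # first queue: raw data from the data source
        self.encoded = deque()    # second queue: data with layered header(s) attached

    def submit(self, payload: bytes) -> None:
        self.outgoing.append(payload)

    def encode(self, header_type: str) -> None:
        """Prepend the header selected by the controller's indication."""
        payload = self.outgoing.popleft()
        self.encoded.append(HEADER_FORMATS[header_type] + payload)

enc = Encoder()
enc.submit(b"\x10\x20\x30")
enc.encode("packet")
print(enc.encoded[0].hex())
```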

PREEMPTIVE CONTEXT SWITCHING OF PROCESSES ON AN ACCELERATED PROCESSING DEVICE (APD) BASED ON TIME QUANTA

Granted: March 16, 2017
Application Number: 20170076421
Methods and apparatus are described. A method includes an accelerated processing device running a process. When a maximum time interval during which the process is permitted to run expires before the process completes, the accelerated processing device receives an operating-system-initiated instruction to stop running the process. The accelerated processing device stops the process from running in response to the received operating-system-initiated instruction.
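
A rough model of time-quantum preemption is sketched below; the check is done cooperatively inside the running work loop here, whereas the application describes an operating-system-initiated stop instruction delivered to the APD, and the quantum value is arbitrary.

```python
import time

TIME_QUANTUM_S = 0.05   # hypothetical maximum interval the process may run

def run_with_quantum(work_items):
    """Run work items until done or until the time quantum expires.

    Returns the leftover items so the caller (standing in for the OS)
    can context-switch and resume the process later.
    """
    start = time.monotonic()
    for i, item in enumerate(work_items):
        if time.monotonic() - start > TIME_QUANTUM_S:
            return work_items[i:]         # preempted: report remaining work
        item()                            # execute one unit of work
    return []                             # ran to completion within the quantum

leftover = run_with_quantum([lambda: time.sleep(0.01)] * 20)
print(f"{len(leftover)} work items preempted")
```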

GRAPHICS LIBRARY EXTENSIONS

Granted: March 2, 2017
Application Number: 20170061670
Methods for enabling graphics features in processors are described herein. Methods are provided to enable trinary built-in functions in the shader, allow separation of the graphics processor's address space from the requirement that all textures must be physically backed, enable use of a sparse buffer allocated in virtual memory, allow a reference value used for stencil test to be generated and exported from a fragment shader, provide support for use specific operations in the stencil…

PRIORITY-BASED COMMAND EXECUTION

Granted: February 23, 2017
Application Number: 20170053377
A method of processing commands is provided. The method includes holding commands in queues and executing the commands in an order based on their respective priority. Commands having the same priority are held in the same queue.
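
The per-priority queue arrangement could look roughly like this sketch, where a higher integer stands for a higher priority and commands are plain callables; none of these names come from the application.

```python
from collections import deque

class PriorityCommandQueues:
    """One FIFO queue per priority; execute from the highest priority that has work."""
    def __init__(self):
        self.queues = {}              # priority -> deque of commands

    def submit(self, priority: int, command) -> None:
        self.queues.setdefault(priority, deque()).append(command)

    def run_all(self) -> None:
        for priority in sorted(self.queues, reverse=True):   # higher value runs first
            q = self.queues[priority]
            while q:
                q.popleft()()          # commands with equal priority run in FIFO order

pcq = PriorityCommandQueues()
pcq.submit(1, lambda: print("low-priority command"))
pcq.submit(5, lambda: print("high-priority command"))
pcq.run_all()
```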

MEDIA SYSTEM HAVING THREE DIMENSIONAL NAVIGATION VIA DYNAMIC CAROUSEL

Granted: February 16, 2017
Application Number: 20170046042
A system and method are set forth which combine an ability to view a motion video with an ability to simultaneously access computer programs. In certain embodiments, the media system provides access to movies, music and photos in a visually appealing three dimensional environment. In certain embodiments, the media system presents a three dimensional navigation tool (such as a three dimensional wheel) on which thumbnails are presented. A required resource value corresponding to system…

DISTRIBUTED GATHER/SCATTER OPERATIONS ACROSS A NETWORK OF MEMORY NODES

Granted: February 16, 2017
Application Number: 20170048320
Devices, methods, and systems for distributed gather and scatter operations in a network of memory nodes. A responding memory node includes a memory; a communications interface having circuitry configured to communicate with at least one other memory node; and a controller. The controller includes circuitry configured to receive a request message from a requesting node via the communications interface. The request message indicates a gather or scatter operation, and instructs the…
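
A toy model of the request/response exchange is sketched below, assuming a dictionary stands in for a node's local memory and the RequestMessage fields are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class RequestMessage:
    op: str              # "gather" or "scatter"
    addresses: list      # addresses at the responding node to read or write
    values: list = None  # payload carried by scatter requests

class MemoryNode:
    """Toy responding memory node with a flat dictionary as its local memory."""
    def __init__(self):
        self.memory = {}

    def handle(self, msg: RequestMessage):
        if msg.op == "gather":
            return [self.memory.get(addr, 0) for addr in msg.addresses]
        if msg.op == "scatter":
            for addr, val in zip(msg.addresses, msg.values):
                self.memory[addr] = val
            return None
        raise ValueError(msg.op)

node = MemoryNode()
node.handle(RequestMessage("scatter", [0x10, 0x20], [7, 9]))
print(node.handle(RequestMessage("gather", [0x10, 0x20])))
```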

COMMUNICATION DEVICE WITH SELECTIVE ENCODING

Granted: February 2, 2017
Application Number: 20170031853
A communication device includes a data source that generates data for transmission over a bus, and further includes a data encoder coupled to receive and encode outgoing data. The encoder further includes a coupling toggle rate (CTR) calculator configured to calculate a CTR for the outgoing data, a threshold calculator configured to determine an expected value of the CTR as a threshold value, a comparator configured to compare the calculated CTR to the threshold value, wherein the…
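
The sketch below uses one simplified definition of a coupling toggle rate (adjacent bus lines switching in opposite directions between consecutive words) and a bit-inversion encoding as a placeholder; the application does not specify these particular choices.

```python
def coupling_toggle_rate(prev: int, curr: int, width: int = 32) -> float:
    """Simplified CTR: fraction of adjacent line pairs that switch in opposite
    directions between two consecutive bus words (a rough crosstalk proxy)."""
    toggles = [(prev >> i & 1) ^ (curr >> i & 1) for i in range(width)]
    direction = [(curr >> i & 1) - (prev >> i & 1) for i in range(width)]
    opposing = sum(
        1 for i in range(width - 1)
        if toggles[i] and toggles[i + 1] and direction[i] != direction[i + 1]
    )
    return opposing / (width - 1)

def encode_if_beneficial(prev, curr, threshold=0.25, width=32):
    """Invert the word (one simple encoding) only when the CTR exceeds the threshold."""
    if coupling_toggle_rate(prev, curr, width) > threshold:
        return (~curr) & ((1 << width) - 1), True    # encoded word + flag
    return curr, False

print(encode_if_beneficial(0x55555555, 0xAAAAAAAA))
```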

SPLIT STORAGE OF ANTI-ALIASED SAMPLES

Granted: January 19, 2017
Application Number: 20170018053
Embodiments of the present invention are directed to improving the performance of anti-aliased image rendering. One embodiment is a method of rendering a pixel from an anti-aliased image. The method includes: storing a first set and a second set of samples from a plurality of anti-aliased samples of the pixel respectively in a first memory and a second memory; and rendering a determined number of said samples from one of only the first set or the first and second sets. Corresponding…
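
A minimal sketch of the split-storage idea follows, assuming a simple average stands in for the resolve step and one color channel per sample.

```python
def split_samples(samples, first_count):
    """Split a pixel's anti-aliased samples between two memories."""
    return samples[:first_count], samples[first_count:]   # first memory, second memory

def resolve(mem1, mem2, use_both: bool):
    """Render the pixel from only the first set, or from the first and second sets."""
    used = mem1 + mem2 if use_both else mem1
    return sum(used) / len(used)

m1, m2 = split_samples([0.9, 0.8, 0.2, 0.1], first_count=2)   # 4x samples, one channel
print(resolve(m1, m2, use_both=False))   # fast path: first memory only
print(resolve(m1, m2, use_both=True))    # full-quality path: both memories
```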

SYSTEM PERFORMANCE MANAGEMENT USING PRIORITIZED COMPUTE UNITS

Granted: January 5, 2017
Application Number: 20170004080
Methods, devices, and systems for managing performance of a processor having multiple compute units. An effective number of the multiple compute units may be determined to designate as having priority. On a condition that the effective number is nonzero, the effective number of the multiple compute units may each be designated as a priority compute unit. Priority compute units may have access to a shared cache whereas non-priority compute units may not. Workgroups may be preferentially…
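
A rough sketch of the designation logic, assuming the first `effective_num` compute units are the priority ones and using placeholder dispatch and cache-access checks:

```python
def designate_priority_units(num_cus: int, effective_num: int):
    """Mark the first `effective_num` compute units as priority units."""
    return {cu: (cu < effective_num) for cu in range(num_cus)}

def may_use_shared_cache(cu: int, priority_map) -> bool:
    # Only priority compute units are allowed to access the shared cache.
    return priority_map[cu]

def pick_cu_for_workgroup(priority_map):
    """Preferentially dispatch a workgroup to a priority compute unit."""
    priority_cus = [cu for cu, is_prio in priority_map.items() if is_prio]
    return priority_cus[0] if priority_cus else 0

cus = designate_priority_units(num_cus=8, effective_num=2)
print(pick_cu_for_workgroup(cus), may_use_shared_cache(5, cus))
```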

METHOD AND APPARATUS FOR REGULATING PROCESSING CORE LOAD IMBALANCE

Granted: December 29, 2016
Application Number: 20160378565
Briefly, methods and apparatus to rebalance workloads among processing cores utilizing a hybrid work donation and work stealing technique are disclosed that reduce workload imbalances within processing devices such as, for example, GPUs. In one example, the methods and apparatus allow for workload distribution between a first processing core and a second processing core by providing queue elements from one or more workgroup queues associated with workgroups executing on the first…
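
The hybrid rebalancing might be sketched like this, with a deque per core and an arbitrary imbalance threshold; the batch sizes and the donate/steal split are illustrative only.

```python
from collections import deque

class CoreQueues:
    def __init__(self, items=()):
        self.queue = deque(items)       # queue elements for workgroups on this core

def donate(src: CoreQueues, dst: CoreQueues, batch: int) -> None:
    """Work donation: the busy core pushes elements to the idle core's queue."""
    for _ in range(min(batch, len(src.queue))):
        dst.queue.append(src.queue.popleft())

def steal(thief: CoreQueues, victim: CoreQueues, batch: int) -> None:
    """Work stealing: the idle core pulls elements from the busy core's queue tail."""
    for _ in range(min(batch, len(victim.queue))):
        thief.queue.append(victim.queue.pop())

core0, core1 = CoreQueues(range(10)), CoreQueues()
if len(core0.queue) - len(core1.queue) > 4:     # simple imbalance test
    donate(core0, core1, batch=3)
steal(core1, core0, batch=2)
print(len(core0.queue), len(core1.queue))
```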

METHOD AND APPARATUS FOR PERFORMING A SEARCH OPERATION ON HETEROGENEOUS COMPUTING SYSTEMS

Granted: December 29, 2016
Application Number: 20160378791
A method and apparatus for performing a top-down Breadth-First Search (BFS) includes performing a first determination whether to convert to a bottom-up BFS. A second determination is performed whether to convert to the bottom-up BFS, based upon the first determination being positive. The bottom-up BFS is performed, based upon the first determination and the second determination being positive. A third determination is made whether to convert from the bottom-up BFS to the top-down BFS,…
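
A compact sketch of a direction-optimizing BFS is shown below; a single frontier-size heuristic stands in for the application's multi-stage determinations, and the alpha parameter is an assumed tuning knob.

```python
def direction_optimizing_bfs(adj, source, alpha=4.0):
    """BFS that starts top-down and switches to bottom-up when the frontier
    touches many edges, then switches back as the frontier shrinks."""
    visited = {source}
    frontier = {source}
    parents = {source: None}
    while frontier:
        frontier_edges = sum(len(adj[v]) for v in frontier)
        unvisited = set(adj) - visited
        nxt = set()
        if frontier_edges > len(unvisited) / alpha:
            # Bottom-up step: each unvisited vertex scans for a frontier neighbor.
            for v in unvisited:
                for u in adj[v]:
                    if u in frontier:
                        parents[v] = u
                        nxt.add(v)
                        break
        else:
            # Top-down step: expand each frontier vertex's edges.
            for u in frontier:
                for v in adj[u]:
                    if v not in visited:
                        parents[v] = u
                        nxt.add(v)
        visited |= nxt
        frontier = nxt
    return parents

graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
print(direction_optimizing_bfs(graph, 0))
```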

HETEROGENEOUS ENQUEUING AND DEQUEUING MECHANISM FOR TASK SCHEDULING

Granted: December 22, 2016
Application Number: 20160371116
Methods, systems and computer-readable mediums for task scheduling on an accelerated processing device (APD) are provided. In an embodiment, a method comprises: enqueuing one or more tasks in a memory storage module based on the APD using a software-based enqueuing module; and dequeuing the one or more tasks from the memory storage module using a hardware-based command processor, wherein the command processor forwards the one or more tasks to the shader core.
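
The software-enqueue / hardware-dequeue split could be modeled roughly as below, with a plain deque standing in for the memory storage module and a callable standing in for the shader core; the task-descriptor fields are invented for illustration.

```python
from collections import deque

task_queue = deque()                      # stands in for the APD-visible memory storage module

def software_enqueue(task) -> None:
    """User-mode software places a task descriptor into the shared queue."""
    task_queue.append(task)

def command_processor_dequeue(shader_core) -> None:
    """Stand-in for the hardware command processor: pops tasks and forwards
    them to the shader core without a round trip through the OS."""
    while task_queue:
        shader_core(task_queue.popleft())

software_enqueue({"kernel": "vec_add", "grid": (1024,)})
software_enqueue({"kernel": "reduce", "grid": (256,)})
command_processor_dequeue(lambda task: print("dispatch", task["kernel"]))
```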

MEMORY HEAPS IN A MEMORY MODEL FOR A UNIFIED COMPUTING SYSTEM

Granted: December 22, 2016
Application Number: 20160371197
A method and system for allocating memory to a memory operation executed by a processor in a computer arrangement having a first processor configured for unified operation with a second processor. The method includes receiving a memory operation from a processor and mapping the memory operation to one of a plurality of memory heaps. The mapping produces a mapping result. The method also includes providing the mapping result to the processor.
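
A minimal sketch of mapping a memory operation to a heap and returning the mapping result; the heap names and the mapping rules are hypothetical, not the application's actual heap set.

```python
# Hypothetical heap identifiers for illustration only.
HEAPS = ["local", "coherent_system", "device_visible_host"]

def map_operation_to_heap(op: dict) -> str:
    """Map a memory operation to one of several memory heaps and return the result."""
    if op.get("device") == "gpu" and not op.get("cpu_visible", False):
        return "local"
    if op.get("coherent", False):
        return "coherent_system"
    return "device_visible_host"

print(map_operation_to_heap({"device": "gpu", "cpu_visible": False}))
print(map_operation_to_heap({"device": "cpu", "coherent": True}))
```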

HYBRID RENDER WITH PREFERRED PRIMITIVE BATCH BINNING AND SORTING

Granted: December 22, 2016
Application Number: 20160371873
A system, method and a computer program product are provided for hybrid rendering with deferred primitive batch binning. A primitive batch is generated from a sequence of primitives. Initial bin intercepts are identified for primitives in the primitive batch. A bin for processing is identified. The bin corresponds to a region of a screen space. Pixels of the primitives intercepting the identified bin are processed. Next bin intercepts are identified while the primitives intercepting the…
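
The initial bin-intercept step might be approximated with a conservative bounding-box test, as in the sketch below; the bin size and the triangle data are arbitrary.

```python
BIN_SIZE = 64   # hypothetical bin dimensions in pixels

def bin_intercepts(triangle, bin_size=BIN_SIZE):
    """Return the set of (bx, by) screen-space bins that the triangle's
    bounding box overlaps -- a conservative bin-intercept test."""
    xs = [v[0] for v in triangle]
    ys = [v[1] for v in triangle]
    x0, x1 = int(min(xs)) // bin_size, int(max(xs)) // bin_size
    y0, y1 = int(min(ys)) // bin_size, int(max(ys)) // bin_size
    return {(bx, by) for bx in range(x0, x1 + 1) for by in range(y0, y1 + 1)}

batch = [[(10, 10), (100, 20), (40, 90)],       # a small primitive batch
         [(130, 130), (140, 150), (135, 170)]]
for tri in batch:
    print(bin_intercepts(tri))
```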

MANAGING COHERENT MEMORY BETWEEN AN ACCELERATED PROCESSING DEVICE AND A CENTRAL PROCESSING UNIT

Granted: December 15, 2016
Application Number: 20160364334
Existing multiprocessor computing systems often have insufficient memory coherency and, consequently, are unable to efficiently utilize separate memory systems. Specifically, a CPU cannot effectively write to a block of memory and then have a GPU access that memory unless there is explicit synchronization. In addition, because the GPU is forced to statically split memory locations between itself and the CPU, existing multiprocessor computing systems are unable to efficiently utilize the…

PER-BLOCK SORT FOR PERFORMANCE ENHANCEMENT OF PARALLEL PROCESSORS

Granted: December 8, 2016
Application Number: 20160357580
A method of enhancing performance of an application executing in a parallel processor and a system for executing the method are disclosed. A block size for input to the application is determined. Input is partitioned into blocks having the block size. Input within each block is sorted. The application is executed with the sorted input.
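
A minimal sketch of the per-block sort itself (choosing the block size is out of scope here):

```python
def per_block_sort(data, block_size):
    """Partition the input into blocks of `block_size` and sort within each block,
    leaving the blocks themselves in their original order."""
    out = []
    for start in range(0, len(data), block_size):
        out.extend(sorted(data[start:start + block_size]))
    return out

values = [9, 3, 7, 1, 8, 2, 6, 4]
print(per_block_sort(values, block_size=4))   # [1, 3, 7, 9, 2, 4, 6, 8]
```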

SCAN FLIP-FLOP CIRCUIT WITH DEDICATED CLOCKS

Granted: November 24, 2016
Application Number: 20160341793
In one form, a scan flip-flop includes a clock gating cell and a dedicated clock flip-flop. The clock gating cell provides an input clock signal as a scan clock signal when a scan shift enable signal is active, and provides the input clock signal as a data clock signal when the scan shift enable signal is inactive. The dedicated clock flip-flop stores a data input signal and provides the data input signal, so stored, as a data output signal in response to transitions of the data…

DROOP DETECTION FOR LOW-DROPOUT REGULATOR

Granted: November 24, 2016
Application Number: 20160342166
A processor system includes first and second regulators for regulating an adjusted supply voltage. In one embodiment, the regulator system comprises a digital low-dropout (DLDO) control system comprising first and second regulators that generate a plurality of control signals to regulate an adjusted power supply voltage and that generate a charge when a droop level falls below a droop threshold value. The first regulator implements a first control loop and the second regulator implements…