AMD Patent Applications

SYSTEM PERFORMANCE MANAGEMENT USING PRIORITIZED COMPUTE UNITS

Granted: January 5, 2017
Application Number: 20170004080
Methods, devices, and systems for managing performance of a processor having multiple compute units. An effective number of the multiple compute units may be determined to designate as having priority. On a condition that the effective number is nonzero, the effective number of the multiple compute units may each be designated as a priority compute unit. Priority compute units may have access to a shared cache whereas non-priority compute units may not. Workgroups may be preferentially…

METHOD AND APPARATUS FOR REGULATING PROCESSING CORE LOAD IMBALANCE

Granted: December 29, 2016
Application Number: 20160378565
Briefly, methods and apparatus to rebalance workloads among processing cores utilizing a hybrid work donation and work stealing technique are disclosed that improve workload imbalances within processing devices such as, for example, GPUs. In one example, the methods and apparatus allow for workload distribution between a first processing core and a second processing core by providing queue elements from one or more workgroup queues associated with workgroups executing on the first…

METHOD AND APPARATUS FOR PERFORMING A SEARCH OPERATION ON HETEROGENEOUS COMPUTING SYSTEMS

Granted: December 29, 2016
Application Number: 20160378791
A method and apparatus for performing a top-down Breadth-First Search (BFS) includes performing a first determination whether to convert to a bottom-up BFS. A second determination is performed whether to convert to the bottom-up BFS, based upon the first determination being positive. The bottom-up BFS is performed, based upon the first determination and the second determination being positive. A third determination is made whether to convert from the bottom-up BFS to the top-down BFS,…

HETEROGENEOUS ENQUEUING AND DEQUEUING MECHANISM FOR TASK SCHEDULING

Granted: December 22, 2016
Application Number: 20160371116
Methods, systems and computer-readable mediums for task scheduling on an accelerated processing device (APD) are provided. In an embodiment, a method comprises: enqueuing one or more tasks in a memory storage module based on the APD using a software-based enqueuing module; and dequeuing the one or more tasks from the memory storage module using a hardware-based command processor, wherein the command processor forwards the one or more tasks to the shader core.

MEMORY HEAPS IN A MEMORY MODEL FOR A UNIFIED COMPUTING SYSTEM

Granted: December 22, 2016
Application Number: 20160371197
A method and system for allocating memory to a memory operation executed by a processor in a computer arrangement having a first processor configured for unified operation with a second processor. The method includes receiving a memory operation from a processor and mapping the memory operation to one of a plurality of memory heaps. The mapping produces a mapping result. The method also includes providing the mapping result to the processor.

HYBRID RENDER WITH PREFERRED PRIMITIVE BATCH BINNING AND SORTING

Granted: December 22, 2016
Application Number: 20160371873
A system, method and a computer program product are provided for hybrid rendering with deferred primitive batch binning. A primitive batch is generated from a sequence of primitives. Initial bin intercepts are identified for primitives in the primitive batch. A bin for processing is identified. The bin corresponds to a region of a screen space. Pixels of the primitives intercepting the identified bin are processed. Next bin intercepts are identified while the primitives intercepting the…

MANAGING COHERENT MEMORY BETWEEN AN ACCELERATED PROCESSING DEVICE AND A CENTRAL PROCESSING UNIT

Granted: December 15, 2016
Application Number: 20160364334
Existing multiprocessor computing systems often have insufficient memory coherency and, consequently, are unable to efficiently utilize separate memory systems. Specifically, a CPU cannot effectively write to a block of memory and then have a GPU access that memory unless there is explicit synchronization. In addition, because the GPU is forced to statically split memory locations between itself and the CPU, existing multiprocessor computing systems are unable to efficiently utilize the…

PER-BLOCK SORT FOR PERFORMANCE ENHANCEMENT OF PARALLEL PROCESSORS

Granted: December 8, 2016
Application Number: 20160357580
A method of enhancing performance of an application executing in a parallel processor and a system for executing the method are disclosed. A block size for input to the application is determined. Input is partitioned into blocks having the block size. Input within each block is sorted. The application is executed with the sorted input.
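
The steps in this abstract can be illustrated with a minimal Python sketch. The abstract does not say how the block size is determined, so it is taken here as a caller-supplied parameter; the names `per_block_sort` and `run_with_sorted_input` are hypothetical and not drawn from the application.

```python
def per_block_sort(data, block_size):
    """Partition data into consecutive blocks and sort within each block.

    block_size is an assumed input; the abstract only says it "is determined".
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    result = []
    for start in range(0, len(data), block_size):
        # Sort only inside the block; the order of the blocks is unchanged.
        result.extend(sorted(data[start:start + block_size]))
    return result


def run_with_sorted_input(app, data, block_size):
    """Execute the application (any callable) on the per-block sorted input."""
    return app(per_block_sort(data, block_size))
```

For example, `per_block_sort([5, 3, 8, 1, 9, 2, 7], 3)` sorts the blocks `[5, 3, 8]`, `[1, 9, 2]` and `[7]` independently, so values never cross a block boundary.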

SCAN FLIP-FLOP CIRCUIT WITH DEDICATED CLOCKS

Granted: November 24, 2016
Application Number: 20160341793
In one form, a scan flip-flop includes a clock gating cell and a dedicated clock flip-flop. The clock gating cell provides an input clock signal as a scan clock signal when a scan shift enable signal is active, and provides the input clock signal as a data clock signal when the scan shift enable signal is inactive. The dedicated clock flip-flop stores a data input signal and provides the data input signal, so stored, as a data output signal in response to transitions of the data…

DROOP DETECTION FOR LOW-DROPOUT REGULATOR

Granted: November 24, 2016
Application Number: 20160342166
A processor system includes first and second regulators for regulating an adjusted supply voltage. In one embodiment, the regulator system comprises a digital low-dropout (DLDO) control system comprising first and second regulators that generate a plurality of control signals to regulate an adjusted power supply voltage and that generate a charge when a droop level falls below a droop threshold value. The first regulator implements a first control loop and the second regulator implements…

DROOP DETECTION AND REGULATION FOR PROCESSOR TILES

Granted: November 24, 2016
Application Number: 20160342185
A processor system includes first and second regulators for regulating an adjusted supply voltage. The first and second regulators generate a plurality of control signals to regulate an adjusted power supply voltage and generate a charge when a droop level falls below a droop threshold value by implementing first and second control loops. A supply adjustment block with the two regulators and control loops is provided for each processor core, allowing different cores to have…

INFRASTRUCTURE TO SUPPORT ACCELERATOR COMPUTATION MODELS FOR ACTIVE STORAGE

Granted: November 17, 2016
Application Number: 20160335064
A method, a system, and a non-transitory computer readable medium for generating application code to be executed on an active storage device are presented. The parts of an application that can be executed on the active storage device are determined. The parts of the application that will not be executed on the active storage device are converted into code to be executed on a host device. The parts of the application that will be executed on the active storage device are converted into…

SYSTEM AND METHOD FOR DETERMINING CONCURRENCY FACTORS FOR DISPATCH SIZE OF PARALLEL PROCESSOR KERNELS

Granted: November 17, 2016
Application Number: 20160335143
Disclosed is a method of determining concurrency factors for an application running on a parallel processor. Also disclosed is a system for implementing the method. In an embodiment, the method includes running at least a portion of the kernel as sequences of mini-kernels, each mini-kernel including a number of concurrently executing workgroups. The number of concurrently executing workgroups is defined as a concurrency factor of the mini-kernel. A performance measure is determined for…

POWER REDUCTION IN BUS INTERCONNECTS

Granted: October 6, 2016
Application Number: 20160291678
In one form, power consumed in transmitting data over a bus interconnect is reduced. The power is reduced by configuring a buffer that is used to store data to be transmitted over the bus interconnect as a two-dimensional (2D) buffer array having a plurality of rows and columns. The data stored in the 2D buffer array is then analyzed to determine a mode of transmitting the data that uses a least amount of power. The determined mode is used to transmit the data over the bus interconnect.
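
One way to picture the mode selection is the Python sketch below. It rests on assumptions the abstract does not state: the candidate modes are row-major and column-major readout of the 2D buffer, and power is approximated by the number of bit toggles between consecutive words on the bus. All function names are hypothetical.

```python
def toggles(words):
    """Count bit flips between consecutive words (assumed proxy for bus power)."""
    return sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))


def row_major(buf):
    """Read the 2D buffer one row at a time."""
    return [word for row in buf for word in row]


def column_major(buf):
    """Read the 2D buffer one column at a time."""
    return [buf[r][c] for c in range(len(buf[0])) for r in range(len(buf))]


def pick_mode(buf):
    """Return (mode name, word order) for the candidate with the fewest toggles."""
    candidates = {"row_major": row_major(buf), "column_major": column_major(buf)}
    best = min(candidates, key=lambda name: toggles(candidates[name]))
    return best, candidates[best]
```

With `buf = [[0b0000, 0b1111], [0b0000, 0b1111]]`, row-major order costs 12 toggles and column-major order costs 4, so `pick_mode` selects `"column_major"`.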

REDUNDANCY METHOD AND APPARATUS FOR SHADER COLUMN REPAIR

Granted: September 8, 2016
Application Number: 20160260192
Methods, systems and non-transitory computer readable media are described. A system includes a shader pipe array, a redundant shader pipe array, a sequencer and a redundant shader switch. The shader pipe array includes multiple shader pipes, each of which performs rendering calculations on data provided thereto. The redundant shader pipe array also performs rendering calculations on data provided thereto. The sequencer identifies at least one defective shader pipe in the shader pipe…

PROVIDING ASYNCHRONOUS DISPLAY SHADER FUNCTIONALITY ON A SHARED SHADER CORE

Granted: September 8, 2016
Application Number: 20160260246
A method, a non-transitory computer readable medium, and a processor for performing display shading for computer graphics are presented. Frame data is received by a display shader, the frame data including at least a portion of a rendered frame. Parameters for modifying the frame data are received by the display shader. The parameters are applied to the frame data by the display shader to create a modified frame. The modified frame is displayed on a display device.

CONTENT-ADAPTIVE B-PICTURE PATTERN VIDEO ENCODING

Granted: September 8, 2016
Application Number: 20160261869
A method of video encoding is disclosed which is content adaptive. The encoding method is automatically adjusted to optimize the encoding, the adjusting depending on the content of the pictures being encoded. A system for implementing the method and a non-transitory computer-readable storage medium for storing instructions of the method are also disclosed.

METHOD AND APPARATUS FOR DIRECTING APPLICATION REQUESTS FOR RENDERING

Granted: September 1, 2016
Application Number: 20160253774
A method and system for directing image rendering, implemented in a computer system including a plurality of processors includes determining one or more processors in the system on which to execute one or more commands. A graphics processing unit (GPU) control application program interface (API) determines one or more processors in the system on which to execute one or more commands. A signal is transmitted to each of the one or more processors indicating which of the one or more…

SCHEDULING OF DATA MIGRATION

Granted: August 25, 2016
Application Number: 20160246540
In one form, scheduling data migration comprises determining whether the data is likely to be used by an input/output (I/O) device, the data being at a location remote to the I/O device; and scheduling the data for migration from the remote location to a location local to the I/O device in response to determining that the data is likely to be used by the I/O device.
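
The two steps named in this abstract can be sketched in Python as follows. The abstract does not say how likelihood of use is determined, so the predictor is left as a caller-supplied function; `Page`, `schedule_migrations` and the `"remote"`/`"local"` labels are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Page:
    page_id: int
    location: str  # "remote" or "local", relative to the I/O device


def schedule_migrations(pages, likely_used_by_device):
    """Return ids of remote pages predicted to be used by the I/O device.

    likely_used_by_device is an assumed predicate supplied by the caller.
    """
    return [
        page.page_id
        for page in pages
        # Only data that is remote and predicted to be used is scheduled.
        if page.location == "remote" and likely_used_by_device(page)
    ]
```

Data already local to the device is never scheduled, even when the predictor flags it.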

FLIP-FLOP CIRCUIT WITH LATCH BYPASS

Granted: August 25, 2016
Application Number: 20160248405
In one form, a flip-flop comprises a master latch, a slave latch, and a multiplexer. The master latch has an input for receiving a data input signal, and an output, and operates in transparent and latching modes during respective first and second phases of a clock signal. The slave latch has an input coupled to the output of the master latch, and an output, and operates in the transparent and latching modes during the second and first phases of the clock signal, respectively. The…