Nvidia Patent Grants

Software methods in a GPU

Granted: August 15, 2017
Patent Number: 9734545
One embodiment of the present invention sets forth a technique for executing a software method within a graphics processing unit (GPU) that minimizes the number of clock cycles during which the graphics engine is idled. The function of the software method is performed by a firmware method that is executed by a processor within the GPU. The firmware method is executed to access and optionally update the state stored in the GPU. Unlike execution of a conventional software method, execution…

Split driver to control multiple graphics processors in a computer system

Granted: August 15, 2017
Patent Number: 9734546
A computer system includes an operating system having a kernel and configured to launch a plurality of computing processes. The system also includes a plurality of graphics processing units (GPUs), a front-end driver module, and a plurality of back-end driver modules. The GPUs are configured to execute instructions on behalf of the computing processes subject to a GPU service request. The front-end driver module is loaded into the kernel and configured to receive the GPU service request…

Caching of adaptively sized cache tiles in a unified L2 cache with surface compression

Granted: August 15, 2017
Patent Number: 9734548
One embodiment of the present invention includes techniques for adaptively sizing cache tiles in a graphics system. A device driver associated with a graphics system sets a cache tile size associated with a cache tile to a first size. The detects a change from a first render target configuration that includes a first set of render targets to a second render target configuration that includes a second set of render targets. The device driver sets the cache tile size to a second size based…

System and method for translating program functions for correct handling of local-scope variables and computing system incorporating the same

Granted: August 8, 2017
Patent Number: 9727338
A system and method of translating functions of a program. In one embodiment, the system includes: (1) a local-scope variable identifier operable to identify local-scope variables employed in the at least some of the functions as being either thread-shared local-scope variables or thread-private local-scope variables and (2) a function translator associated with the local-scope variable identifier and operable to translate the at least some of the functions to cause thread-shared memory…

Method and system for distributed shader optimization

Granted: August 8, 2017
Patent Number: 9727339
Embodiments of the present invention are operable to communicate a list of important shaders and their current best-known compilations to remote client devices over a communications network. Client devices are allowed to produce modified shader compilations by varying optimizations. If a client device produces a modified compilation that beats an important shader's current best-known compilation, embodiments of the present invention can communicate this new best-known shader compilation…

Techniques for render pass dependencies in an API

Granted: August 8, 2017
Patent Number: 9727392
Techniques for passing dependencies in an application programming interface API includes identifying a plurality of passes of execution commands. For each set of passes, wherein one pass is a destination pass and the other pass is a source pass to each other, one or more dependencies, of one or more dependency types, are determined between the execution commands of the destination pass and the source pass. Pass objects are then created for each identified pass, wherein each pass object…

Assymetric coherent caching for heterogeneous computing

Granted: August 8, 2017
Patent Number: 9727463
A method of caching data in the memory of electronic processor units including compiling, in a first processor configured to perform data-parallel computation, a set of asymmetric coherent caching rules. The set of rules configure the first processor to be: inoperable to cache, in a second level memory cache of the first electronic processor unit, data whose home location is in a final memory store of a second electronic processor unit; operable to cache, in the second level memory cache…

Efficient CPU mailbox read access to GPU memory

Granted: August 8, 2017
Patent Number: 9727521
Techniques are disclosed for peer-to-peer data transfers where a source device receives a request to read data words from a target device. The source device creates a first and second read command for reading a first portion and a second portion of a plurality of data words from the target device, respectively. The source device transmits the first read command to the target device, and, before a first read operation associated with the first read command is complete, transmits the…

System with a high power chip and a low power chip having low interconnect parasitics

Granted: August 8, 2017
Patent Number: 9728481
An IC system includes low-power chips, e.g., memory chips, located proximate one or more higher power chips, e.g., logic chips, without suffering the effects of overheating. The IC system may include a high-power chip disposed on a packaging substrate and a low-power chip embedded in the packaging substrate to form a stack. Because portions of the packaging substrate thermally insulate the low-power chip from the high-power chip, the low-power chip can be embedded in the IC system in…

System and method for early packet header verification

Granted: August 1, 2017
Patent Number: 9720768
A receiver, transmitter and method for early packet header verification are provided. In one embodiment, the method includes: (1) receiving a payload flit of a preceding packet and a header flit of a current packet; and (2) using a Cyclic Redundancy Check (CRC) in the header flit to verify the payload flit of the preceding packet and the header flit of the current packet.

Adaptive multilevel binning to improve hierarchical caching

Granted: August 1, 2017
Patent Number: 9720842
A device driver calculates a tile size for a plurality of cache memories in a cache hierarchy. The device driver calculates a storage capacity of a first cache memory. The device driver calculates a first tile size based on the storage capacity of the first cache memory and one or more additional characteristics. The device driver calculates a storage capacity of a second cache memory. The device driver calculates a second tile size based on the storage capacity of the second cache…

Technique for performing memory access operations via texture hardware

Granted: August 1, 2017
Patent Number: 9720858
A texture processing pipeline can be configured to service memory access requests that represent texture data access operations or generic data access operations. When the texture processing pipeline receives a memory access request that represents a texture data access operation, the texture processing pipeline may retrieve texture data based on texture coordinates. When the memory access request represents a generic data access operation, the texture pipeline extracts a virtual address…

System, method, and computer program product for a stereoscopic image lasso

Granted: August 1, 2017
Patent Number: 9721187
A system, method, and computer program product for providing a lasso selection tool for a stereoscopic image is disclosed. The method includes the steps of obtaining a lasso region of a stereoscopic image pair based on a path defined by a user using a lasso selection tool. An object in a first image of the stereoscopic image pair is identified, where the object is at least partially included within the lasso region and the object is identified in a second image of the stereoscopic image…

Fully parallel in-place construction of 3D acceleration structures and bounding volume hierarchies in a graphics processing unit

Granted: August 1, 2017
Patent Number: 9721320
A non-transitory computer-readable storage medium having computer-executable instructions for causing a computer system to perform a method for constructing bounding volume hierarchies from binary trees is disclosed. The method includes providing a binary tree including a plurality of leaf nodes and a plurality of internal nodes. Each of the plurality of internal nodes is uniquely associated with two child nodes, wherein each child node comprises either an internal node or leaf node. The…

System, method, and computer program product for discarding pixel samples

Granted: August 1, 2017
Patent Number: 9721381
A system, method, and computer program product are provided for discarding pixel samples. The method includes the steps of completing shading operations for a pixel set including one or more pixels to generate per-sample shaded attributes according to a shader program executed by a processing pipeline. Discard information for the pixel set is evaluated and one or more per-sample shaded attributes for at least one pixel in the pixel set are discarded based on the evaluated discard…

Method and system for generating an image including optically zoomed and digitally zoomed regions

Granted: August 1, 2017
Patent Number: 9723216
A method for generating images. The method includes capturing first image data representing a first scene taken optically at a first magnification index, wherein the first image data comprises a first region of an image. The method includes capturing second image data representing a second scene taken optically at a second magnification index that is less than the first magnification index, wherein the second image data comprises a second region of the image. The method includes…

Execution state analysis for assigning tasks to streaming multiprocessors

Granted: July 25, 2017
Patent Number: 9715413
One embodiment of the present invention sets forth a technique for selecting a first processor included in a plurality of processors to receive work related to a compute task. The technique involves analyzing state data of each processor in the plurality of processors to identify one or more processors that have already been assigned one compute task and are eligible to receive work related to the one compute task, receiving, from each of the one or more processors identified as…

Open solder mask and or dielectric to increase lid or ring thickness and contact area to improve package coplanarity

Granted: July 25, 2017
Patent Number: 9716051
A packaging substrate, a packaged semiconductor device, a computing device and methods for forming the same are provided. In one embodiment, a packaging substrate is provided that includes a packaging structure having a chip mounting surface and a bottom surface. The packaging structure has at a plurality of conductive paths formed between the chip mounting surface and the bottom surface. The conductive paths are configured to provide electrical connection between an integrated circuit…

System and method for allocating memory of differing properties to shared data objects

Granted: July 18, 2017
Patent Number: 9710275
A system and method for allocating shared memory of differing properties to shared data objects and a hybrid stack data structure. In one embodiment, the system includes: (1) a hybrid stack creator configured to create, in the shared memory, a hybrid stack data structure having a lower portion having a more favorable property and a higher portion having a less favorable property and (2) a data object allocator associated with the hybrid stack creator and configured to allocate storage…

Methods and apparatus for auto-throttling encapsulated compute tasks

Granted: July 18, 2017
Patent Number: 9710306
Systems and methods for auto-throttling encapsulated compute tasks. A device driver may configure a parallel processor to execute compute tasks in a number of discrete throttled modes. The device driver may also allocate memory to a plurality of different processing units in a non-throttled mode. The device driver may also allocate memory to a subset of the plurality of processing units in each of the throttling modes. Data structures defined for each task include a flag that instructs…