Nvidia Patent Grants

Generalized acceleration of matrix multiply accumulate operations

Granted: January 5, 2021
Patent Number: 10884734
A method, computer readable medium, and processor are disclosed for performing matrix multiply and accumulate (MMA) operations. The processor includes a datapath configured to execute the MMA operation to generate a plurality of elements of a result matrix at an output of the datapath. Each element of the result matrix is generated by calculating at least one dot product of corresponding pairs of vectors associated with matrix operands specified in an instruction for the MMA operation. A…
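The arithmetic behind an MMA operation, D = A × B + C, can be sketched in Python. In this minimal, hypothetical illustration, each result element is the dot product of a row vector of A and a column vector of B, added to an accumulator term. It ignores the tiling, mixed precision, and datapath details that the patent actually concerns, and the names `dot` and `mma` are illustrative only.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def mma(a, b, c):
    """Compute D = A x B + C one element at a time.

    Each D[i][j] is the dot product of row i of A with column j of B,
    accumulated onto C[i][j].
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    if any(len(row) != inner for row in a):
        raise ValueError("A's column count must match B's row count")
    # Gather each column of B once so it can be used as a vector.
    b_cols = [[b[k][j] for k in range(inner)] for j in range(cols)]
    return [[dot(a[i], b_cols[j]) + c[i][j] for j in range(cols)]
            for i in range(rows)]
```

For example, `mma([[1, 2], [3, 4]], [[5, 6], [7, 8]], [[1, 1], [1, 1]])` returns `[[20, 23], [44, 51]]`.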

Transfer of video signals using variable segmented lookup tables

Granted: December 29, 2020
Patent Number: 10880531
The disclosure is directed to transforming signals from one signal format to another signal format. For example, the format of a digital signal can change from storing video information in 12 bits of data to storing the video information in 32 bits of data. Other storage values and combinations can also be used. Since the number of bits available to store a portion of the video information can change when changing formats, a process is needed to translate or transform the video…
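A segmented lookup table allocates its entries unevenly. It uses narrow segments where the transfer curve changes quickly and wide segments where the curve is nearly linear. The Python sketch below assumes a hypothetical set of breakpoints over the 12-bit input range, hypothetical 32-bit output values at those breakpoints, and linear interpolation inside each segment. None of these values come from the patent.

```python
from bisect import bisect_right

# Hypothetical segment breakpoints over the 12-bit input range [0, 4095].
# Segments are narrow at the low end and wide at the high end.
BREAKPOINTS = [0, 64, 256, 1024, 4095]
# Hypothetical 32-bit output value at each breakpoint.
OUTPUTS = [0, 2**20, 2**24, 2**28, 2**32 - 1]

def lookup(code):
    """Map a 12-bit code to a 32-bit value by linear interpolation
    within the segment that contains it."""
    if not 0 <= code <= 4095:
        raise ValueError("input must be a 12-bit code")
    # Segment whose start is <= code, clamped so 4095 uses the last segment.
    seg = min(bisect_right(BREAKPOINTS, code) - 1, len(BREAKPOINTS) - 2)
    x0, x1 = BREAKPOINTS[seg], BREAKPOINTS[seg + 1]
    y0, y1 = OUTPUTS[seg], OUTPUTS[seg + 1]
    # Integer interpolation keeps the result an exact 32-bit integer.
    return y0 + (y1 - y0) * (code - x0) // (x1 - x0)
```

With variable-width segments, only five stored entries describe a curve that is steep at one end and flat at the other. A uniformly spaced table would need many more entries for similar accuracy in the steep region.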

Method and system for customizing optimal settings using end-user preferences

Granted: December 29, 2020
Patent Number: 10878770
Embodiments of the present invention provide a novel solution that uses subjective end-user input to generate optimal image quality settings for an application. Embodiments of the present invention enable end-users to rank and/or select various adjustable application parameter settings in a manner that allows them to specify which application parameters and/or settings are most desirable to them for a given application. Based on the feedback received from end-users, embodiments of the…

Techniques for pre-processing index buffers for a graphics processing pipeline

Granted: December 29, 2020
Patent Number: 10878611
In various embodiments, a deduplication application pre-processes index buffers for a graphics processing pipeline that generates rendered images via a shading program. In operation, the deduplication application causes execution threads to identify a set of unique vertices specified in an index buffer based on an instruction. The deduplication application then generates a vertex buffer and an indirect index buffer based on the set of unique vertices. The vertex buffer and the indirect…
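The deduplication step can be sketched sequentially in Python. In this hypothetical version, a single loop stands in for the parallel execution threads that the abstract describes. Keeping unique vertices in first-use order is an assumption, not a detail from the patent.

```python
def deduplicate(index_buffer, vertices):
    """Return (vertex_buffer, indirect_index_buffer).

    vertex_buffer holds each vertex referenced by index_buffer exactly once,
    in first-use order; indirect_index_buffer re-expresses every original
    index as a position in vertex_buffer.
    """
    slot_of = {}          # original vertex index -> slot in vertex_buffer
    vertex_buffer = []
    indirect = []
    for idx in index_buffer:
        if idx not in slot_of:
            slot_of[idx] = len(vertex_buffer)
            vertex_buffer.append(vertices[idx])
        indirect.append(slot_of[idx])
    return vertex_buffer, indirect
```

For the index buffer `[7, 3, 9, 3, 9, 2]`, the compact vertex buffer holds four vertices and the indirect index buffer is `[0, 1, 2, 1, 2, 3]`. Looking each entry up through the indirect buffer reproduces the original vertex sequence.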

Binding constants at runtime for improved resource utilization

Granted: December 29, 2020
Patent Number: 10877757
A just-in-time (JIT) compiler binds constants to specific memory locations at runtime. The JIT compiler parses program code derived from a multithreaded application and identifies an instruction that references a uniform constant. The JIT compiler then determines a chain of pointers that originates within a root table specified in the multithreaded application and terminates at the uniform constant. The JIT compiler generates additional instructions for traversing the chain of pointers…
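A chain of pointers from a root table to a uniform constant can be illustrated with a toy memory model. In this Python sketch, the instruction tuples, the register name `r0`, the dict-based memory, and the offsets are all hypothetical. They show how a compiler can emit a fixed sequence of dependent loads once, and how that sequence resolves to the constant at run time.

```python
def emit_traversal(root_table_addr, offsets):
    """Emit a hypothetical load sequence that follows a pointer chain.

    Each load reads mem[r0 + offset] into r0; the final load yields
    the uniform constant itself.
    """
    code = [("mov", "r0", root_table_addr)]
    for off in offsets:
        code.append(("load", "r0", "r0", off))  # r0 = mem[r0 + off]
    return code

def execute(code, memory):
    """Interpret the emitted instructions against a dict-based memory model."""
    regs = {}
    for op, dst, *args in code:
        if op == "mov":
            regs[dst] = args[0]
        elif op == "load":
            base, off = args
            regs[dst] = memory[regs[base] + off]
        else:
            raise ValueError(f"unknown op {op}")
    return regs["r0"]
```

For example, with `memory = {0x108: 0x200, 0x210: 0x300, 0x304: 42}`, executing the result of `emit_traversal(0x100, [8, 16, 4])` follows two pointers and then loads the constant `42`.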

Photorealistic image stylization using a neural network model

Granted: December 22, 2020
Patent Number: 10872399
Photorealistic image stylization concerns transferring the style of a reference photo to a content photo, with the constraint that the stylized photo should remain photorealistic. Examples of styles include seasons (summer, winter, etc.), weather (sunny, rainy, foggy, etc.), and lighting (daytime, nighttime, etc.). A photorealistic image stylization process includes a stylization step and a smoothing step. The stylization step transfers the style of the reference photo to the content photo. A…

Method and system for immersive virtual reality (VR) streaming with reduced audio latency

Granted: December 22, 2020
Patent Number: 10871939
A virtual reality (VR) audio rendering system and method use head-related transfer functions (HRTFs) to quickly apply new positional cues to pre-computed audio frames in response to changes in user position relative to sound systems. In a client-server VR system, when a user position change is detected, the client determines an appropriate HRTF based on the new position and convolves it with a set of audio frames that the server generated based on a prior position, resulting in modified…
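Applying an HRTF to a pre-rendered frame amounts to convolving the frame with a left-ear impulse response and a right-ear impulse response. The Python sketch below uses direct time-domain convolution and made-up two-tap impulse responses. A production renderer would more likely use FFT-based convolution with overlap-add across frames. The function names are illustrative only.

```python
def convolve(signal, kernel):
    """Full linear convolution; output length is len(signal) + len(kernel) - 1."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out

def spatialize(frame, hrir_left, hrir_right):
    """Apply a position-specific HRTF pair, given as time-domain impulse
    responses, to a mono audio frame that was rendered earlier."""
    return convolve(frame, hrir_left), convolve(frame, hrir_right)
```

For example, `spatialize([1.0, 0.0, -1.0], [0.9, 0.1], [0.0, 0.5])` produces a louder left channel and a quieter right channel that is delayed by one sample, which is a crude positional cue placing the source toward the listener's left.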

Hierarchical Jacobi methods and systems implementing a dense symmetric eigenvalue solver

Granted: December 15, 2020
Patent Number: 10867008
Embodiments of the present invention provide a hierarchical, multi-layer Jacobi method for implementing a dense symmetric eigenvalue solver using multiple processors. Each layer of the hierarchical method is configured to process problems of different sizes, and the division between the layers is defined according to the configuration of the underlying computer system, such as memory capacity and processing power, as well as the communication overhead between device and host. In general,…
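For reference, the single-level building block that a hierarchical Jacobi solver distributes across processors is the classical cyclic Jacobi eigenvalue iteration, sketched below in Python. Each rotation zeroes one off-diagonal entry, and sweeps repeat until the off-diagonal norm falls below a tolerance. This sketch is not the patented multi-layer scheme, and the tolerance and sweep limit are arbitrary choices.

```python
import math

def jacobi_eigenvalues(a, tol=1e-12, max_sweeps=50):
    """Eigenvalues of a real symmetric matrix via cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]  # work on a copy
    for _ in range(max_sweeps):
        off = math.sqrt(sum(a[i][j] ** 2 for i in range(n)
                            for j in range(n) if i != j))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Choose the rotation so that the rotated a[p][q] becomes zero.
                tau = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                if tau >= 0:
                    t = 1.0 / (tau + math.sqrt(1.0 + tau * tau))
                else:
                    t = -1.0 / (-tau + math.sqrt(1.0 + tau * tau))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = t * c
                # A <- A J  (update columns p and q)
                for k in range(n):
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                # A <- J^T A  (update rows p and q)
                for k in range(n):
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))
```

Each (p, q) rotation touches only two rows and two columns. This locality is what lets block variants of the method assign independent pairs to different processors.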

Query-specific behavioral modification of tree traversal

Granted: December 15, 2020
Patent Number: 10867429
In some examples, methods and systems are described for changing the traversal of an acceleration data structure in a highly dynamic, query-specific manner, with each query specifying test parameters, a test opcode, and a mapping of test results to actions. In an example ray tracing implementation, traversal of a bounding volume hierarchy by a ray is performed with the default behavior of the traversal being changed in accordance with results of a test performed using the test opcode and…
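A query-specific traversal rule can be modeled as a test opcode, a query parameter, and a table that maps each test outcome to an action. In the Python sketch below, the opcodes, the per-node integer value, and the two actions (`traverse` and `cull`) are hypothetical. Ray/box intersection is omitted entirely so that only the override mechanism is visible.

```python
from dataclasses import dataclass, field

# Hypothetical test opcodes: each compares a per-node value with a query parameter.
OPCODES = {
    "lt": lambda node_value, param: node_value < param,
    "eq": lambda node_value, param: node_value == param,
    "and_nonzero": lambda node_value, param: (node_value & param) != 0,
}

@dataclass
class Node:
    name: str
    value: int                      # per-node data that the test reads
    children: list = field(default_factory=list)

@dataclass
class Query:
    opcode: str                     # which test to run at each node
    param: int                      # test parameter supplied by the query
    actions: dict                   # test result (bool) -> "traverse" or "cull"

def traverse(root, query):
    """Depth-first traversal whose default behavior ("traverse") can be
    overridden per query through the opcode/param/action mapping."""
    visited, stack = [], [root]
    test = OPCODES[query.opcode]
    while stack:
        node = stack.pop()
        action = query.actions[test(node.value, query.param)]
        if action == "cull":
            continue                # skip this node and its whole subtree
        visited.append(node.name)
        stack.extend(reversed(node.children))
    return visited
```

Two queries can walk the same tree differently. A query that maps a failed mask test to `cull` prunes subtrees, while one that maps both outcomes to `traverse` visits every node.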

Generation of synthetic images for training a neural network model

Granted: December 15, 2020
Patent Number: 10867214
Training deep neural networks requires a large amount of labeled training data. Conventionally, labeled training data is generated by gathering real images and labeling them manually, which is very time-consuming. Instead of manually labeling a training dataset, a domain randomization technique is used to generate training data that is automatically labeled. The generated training data may be used to train neural networks for object detection and segmentation (labeling) tasks. In an…

Block-based lossless compression of geometric data

Granted: December 15, 2020
Patent Number: 10866990
An apparatus, computer readable medium, and method are disclosed for decompressing compressed geometric data stored in a lossless compression format. The compressed geometric data resides within a compression block sized according to a system cache line. An indirection technique maps a global identifier value in a linear identifier space to corresponding variable rate compressed data. The apparatus may include decompression circuitry within a graphics processing unit configured to…
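The indirection from a linear global ID to variable-rate compressed data can be illustrated with a simple lossless scheme. The Python sketch below substitutes delta coding with a zigzag mapping for the encoding that the patent actually uses. It assumes a 128-byte cache line and a hypothetical 40-bit block header, and it estimates bit counts rather than packing real bits. Because the delta width varies per block, each block holds a different number of values, and a table of first IDs maps a global ID to its block.

```python
from bisect import bisect_right

BLOCK_BITS = 128 * 8      # assumption: one 128-byte cache line per block
HEADER_BITS = 32 + 8      # assumption: 32-bit base value + 8-bit delta width

def zigzag(d):
    """Map signed deltas to unsigned so small magnitudes need few bits."""
    return 2 * d if d >= 0 else -2 * d - 1

def compress(values):
    """Greedily pack 32-bit values into cache-line-sized blocks.

    Each block is (base, width, deltas). The delta width varies by block,
    so the number of values per block (the rate) varies too.
    """
    blocks, i = [], 0
    while i < len(values):
        base, deltas, width = values[i], [], 1
        j = i + 1
        while j < len(values):
            w = max(width, zigzag(values[j] - values[j - 1]).bit_length())
            if HEADER_BITS + w * (len(deltas) + 1) > BLOCK_BITS:
                break               # the next delta would overflow the block
            deltas.append(values[j] - values[j - 1])
            width = w
            j += 1
        blocks.append((base, width, deltas))
        i = j
    # Indirection table: global ID of the first value in each block.
    first_ids, n = [], 0
    for _, _, deltas in blocks:
        first_ids.append(n)
        n += 1 + len(deltas)
    return blocks, first_ids

def fetch(global_id, blocks, first_ids):
    """Map a global ID to its block through the indirection table, then decode."""
    b = bisect_right(first_ids, global_id) - 1
    base, _, deltas = blocks[b]
    value = base
    for d in deltas[: global_id - first_ids[b]]:
        value += d
    return value
```

Slowly varying data compresses into a few wide-coverage blocks, and noisy data spreads across many small ones. Either way, the lookup cost is one search of the indirection table plus a decode confined to a single block.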

Uniform register file for improved resource utilization

Granted: December 15, 2020
Patent Number: 10866806
A compiler parses a multithreaded application into cohesive blocks of instructions. Cohesive blocks include instructions that do not diverge or converge. Each cohesive block is associated with one or more uniform registers. When a set of threads executes the instructions in a given cohesive block, each thread in the set may access the uniform register independently of the other threads in the set. Accordingly, the uniform register may store a single copy of data on behalf of all threads…

System-generated stable barycentric coordinates and direct plane equation access

Granted: December 8, 2020
Patent Number: 10861230
A graphics processing pipeline includes three architectural features that allow a fragment shader to efficiently calculate per-sample attribute values using barycentric coordinates and per-vertex attributes. The first feature is barycentric coordinate injection to provide barycentric coordinates to the fragment shader. The second feature is an attribute qualifier that allows an attribute of a graphics primitive to be processed without conventional fixed-function interpolation. The third…
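Once barycentric coordinates are available in the fragment shader, a per-sample attribute is the barycentric-weighted sum of the three per-vertex attributes. The Python sketch below computes the coordinates with standard edge functions and interpolates a color. It ignores perspective correction and the numerical-stability measures that the title refers to, and the function names are illustrative.

```python
def barycentric(p, v0, v1, v2):
    """Barycentric coordinates (b0, b1, b2) of 2D point p in triangle v0 v1 v2."""
    def edge(a, b, c):
        # Twice the signed area of triangle abc.
        return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    area = edge(v0, v1, v2)
    if area == 0:
        raise ValueError("degenerate triangle")
    b0 = edge(v1, v2, p) / area   # weight of v0: area opposite v0
    b1 = edge(v2, v0, p) / area   # weight of v1: area opposite v1
    b2 = 1.0 - b0 - b1            # weights sum to one
    return b0, b1, b2

def interpolate(bary, attrs):
    """Weighted sum of per-vertex attribute tuples using barycentric weights."""
    return tuple(sum(b * a[k] for b, a in zip(bary, attrs))
                 for k in range(len(attrs[0])))
```

For the triangle (0, 0), (4, 0), (0, 4) and the sample point (1, 1), the coordinates are (0.5, 0.25, 0.25). Interpolating pure red, green, and blue vertex colors gives (0.5, 0.25, 0.25).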

Sparse convolutional neural network accelerator

Granted: December 8, 2020
Patent Number: 10860922
A method, computer program product, and system perform computations using a sparse convolutional neural network accelerator. A first vector comprising only non-zero weight values and first associated positions of the non-zero weight values within a 3D space is received. A second vector comprising only non-zero input activation values and second associated positions of the non-zero input activation values within a 2D space is received. The non-zero weight values are multiplied with the…
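The multiply step over compressed vectors can be sketched as a Cartesian product. Every non-zero weight is multiplied by every non-zero activation, and their positions determine which output element receives the product. This Python sketch assumes weight coordinates of the form (output channel, filter row, filter column) and activation coordinates of the form (row, column), with stride 1 and no padding. The data layout is an assumption, not the accelerator's.

```python
from collections import defaultdict

def sparse_conv(weights, activations, out_h, out_w):
    """Cartesian-product sparse convolution (stride 1, no padding).

    weights:     list of (value, (k, r, s)) for non-zero weights only
    activations: list of (value, (x, y)) for non-zero input activations only
    Returns a dict mapping output position (k, i, j) -> accumulated value.
    """
    out = defaultdict(float)
    for w, (k, r, s) in weights:
        for a, (x, y) in activations:
            i, j = x - r, y - s          # output coordinate this product feeds
            if 0 <= i < out_h and 0 <= j < out_w:
                out[(k, i, j)] += w * a
    return dict(out)
```

Zero operands are never multiplied. The cost is that products land at scattered output coordinates, so an accumulator addressed by position is needed instead of a simple sliding window.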

Budget-aware method for detecting activity in video

Granted: December 8, 2020
Patent Number: 10860859
Detection of activity in video content, and more particularly detecting the start and end frames of an activity in a video along with a classification of that activity, is fundamental for video analytics, including categorizing, searching, indexing, segmentation, and retrieval of videos. Existing activity detection processes rely on a large set of features and classifiers that exhaustively run over every time step of a video at multiple temporal scales, or as a small improvement…

Efficient matrix data format applicable for artificial neural network

Granted: December 8, 2020
Patent Number: 10860293
Many computing systems process data organized in a matrix format. For example, artificial neural networks (ANNs) perform numerous computations on data organized into matrices using conventional matrix arithmetic operations. One commonly performed operation is the transpose. Additionally, many such systems need to process large numbers of matrices, large matrices, or both. For sparse matrices that hold few significant values and many values that can be ignored,…
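The abstract does not say how the proposed format is organized. For comparison only, the Python sketch below transposes a matrix stored in the widely used compressed sparse row (CSR) format. CSR is a standard technique and not necessarily the format that the patent describes. The transpose touches only the stored non-zeros: one pass counts entries per column, and a second pass scatters each entry into its transposed row.

```python
def csr_transpose(n_rows, n_cols, indptr, indices, data):
    """Transpose a CSR matrix, returning the CSR arrays of its transpose."""
    # Count non-zeros per column of the input (= per row of the output).
    counts = [0] * n_cols
    for c in indices:
        counts[c] += 1
    t_indptr = [0] * (n_cols + 1)
    for c in range(n_cols):
        t_indptr[c + 1] = t_indptr[c] + counts[c]
    t_indices = [0] * len(indices)
    t_data = [0] * len(data)
    next_slot = t_indptr[:-1]  # copy: next free position in each output row
    for r in range(n_rows):
        for p in range(indptr[r], indptr[r + 1]):
            c = indices[p]
            q = next_slot[c]
            t_indices[q] = r
            t_data[q] = data[p]
            next_slot[c] += 1
    return t_indptr, t_indices, t_data
```

For the 2 × 3 matrix `[[0, 5, 0], [7, 0, 9]]` (CSR arrays `[0, 1, 3]`, `[1, 0, 2]`, `[5, 7, 9]`), the result describes `[[0, 7], [5, 0], [0, 9]]`.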

Computing device with moving display

Granted: December 1, 2020
Patent Number: 10852775
In various examples, a portable computing device is provided that has a bottom shell and a top shell pivotally coupled to the bottom shell for movement between a closed position and at least one open position. A display fits within a perimeter rim of the top shell, the display being obscured from view when the top shell is in the closed position and being viewable when the top shell is in the at least one open position. A coupling linkage couples the display, the top shell, and the…

Rendering scenes using a combination of raytracing and rasterization

Granted: December 1, 2020
Patent Number: 10853994
The disclosure is directed to methods and processes of rendering a complex scene using a combination of raytracing and rasterization. The methods and processes can be implemented in a video driver or software library. A developer of an application can provide information to an application programming interface (API) call as if a conventional raytrace API is being called. The methods and processes can analyze the scene using a variety of parameters to determine a grouping of objects within…

Device profiling in GPU accelerators by using host-device coordination

Granted: December 1, 2020
Patent Number: 10853044
A system and method for compiling a program having a mixture of host code and device code to enable Profile Guided Optimization (PGO) for device code execution. An exemplary integrated compiler can compile source code programmed to be executed concurrently by a host processor (e.g., a CPU) and a co-processor (e.g., a GPU). The compilation can generate an instrumented executable code which includes: profile instrumentation counters for the device functions; and instructions for the host…

Voltage/frequency scaling for overcurrent protection with on-chip ADC

Granted: December 1, 2020
Patent Number: 10852811
An integrated circuit, such as a graphics processing unit (GPU), having an on-chip analog-to-digital converter (ADC) for use in overcurrent protection of the chip is described, where the overcurrent protection response times are substantially faster than those of techniques that use an external ADC. A system-on-chip (SoC) includes the integrated circuit and a multiplexer arranged externally to the chip having the ADC, where the multiplexer provides the ADC with a data stream of sampling…