Instant neural graphics primitives: lightning fast NeRF and more
Updated Oct 8, 2025 - CUDA
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.
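As a minimal sketch of the model described above, the host launches a grid of threads and each thread processes one array element. All names here (`add`, the array sizes) are illustrative, not from any of the repositories listed; compile with `nvcc`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread handles one element; the grid as a whole covers the array.
__global__ void add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *a, *b, *c;
    // Unified memory keeps the sketch short; explicit cudaMemcpy also works.
    cudaMallocManaged(&a, bytes);
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    // Launch enough 256-thread blocks to cover all n elements.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    add<<<blocks, threads>>>(a, b, c, n);
    cudaDeviceSynchronize();

    printf("c[0] = %f\n", c[0]);
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```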
LeetCUDA: modern CUDA learning notes with PyTorch for beginners; 200+ CUDA kernels, Tensor Cores, HGEMM, FA-2 MMA.
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves a 2-5x speedup over FlashAttention without losing end-to-end metrics across language, image, and video models.
cuGraph - RAPIDS Graph Analytics Library
GPU-accelerated t-SNE for CUDA with Python bindings
RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing high performance applications.
Fuse multiple depth frames into a TSDF voxel volume.
CUDA Kernel Benchmarking Library
Graphics Processing Units Molecular Dynamics
cuVS - a library for vector search and clustering on the GPU
GPU accelerated decision optimization
A static, suckless, single-batch, CUDA-only Qwen3-0.6B mini inference engine
Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores, via the WMMA API and MMA PTX instructions.
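To illustrate the WMMA path these HGEMM kernels build on, here is a hedged sketch (not code from the repository) in which one warp computes a single 16x16x16 tile D = A*B with fp16 inputs and fp32 accumulation. A real HGEMM tiles the full matrices and stages data through shared memory; this shows only the core Tensor Core call.

```cuda
#include <mma.h>
using namespace nvcuda;

// One warp computes one 16x16 output tile; A is row-major, B is col-major,
// and both leading dimensions are 16 for simplicity (illustrative layout).
__global__ void wmma_tile(const half* A, const half* B, float* D) {
    // Per-warp fragments: fp16 operands, fp32 accumulator.
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc;
    wmma::fill_fragment(acc, 0.0f);

    // Load the operand tiles from memory into the fragments.
    wmma::load_matrix_sync(a, A, 16);
    wmma::load_matrix_sync(b, B, 16);

    // One Tensor Core matrix-multiply-accumulate over the whole tile.
    wmma::mma_sync(acc, a, b, acc);

    // Write the fp32 result tile back.
    wmma::store_matrix_sync(D, acc, 16, wmma::mem_row_major);
}
```

The MMA PTX route mentioned in the description drops below this API to issue `mma.sync` instructions directly, trading portability for finer control over register and shared-memory layout.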
PopSift is an implementation of the SIFT algorithm in CUDA.
MegBA: A GPU-Based Distributed Library for Large-Scale Bundle Adjustment
State of the art sorting and segmented sorting, including OneSweep. Implemented in CUDA, D3D12, and Unity style compute shaders. Theoretically portable to all wave/warp/subgroup sizes.
Created by NVIDIA - Released June 23, 2007