Open3D: A Modern Library for 3D Data Processing
Updated Nov 23, 2025 · C++
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.
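The programming model described above is easiest to see in the canonical vector-add example: the work is split across thousands of GPU threads, each handling one array element. This is a minimal sketch assuming the CUDA toolkit and an NVIDIA GPU are available (compile with `nvcc vec_add.cu`):

```
#include <cstdio>
#include <cuda_runtime.h>

// Kernel: each GPU thread adds one pair of elements.
__global__ void vecAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) c[i] = a[i] + b[i];                  // guard against overrun
}

int main() {
    const int n = 1 << 20;               // 1M elements
    const size_t bytes = n * sizeof(float);
    float *a, *b, *c;
    // Unified memory is reachable from both host and device.
    cudaMallocManaged(&a, bytes);
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    // Launch enough 256-thread blocks to cover all n elements.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(a, b, c, n);
    cudaDeviceSynchronize();             // wait for the GPU to finish

    printf("c[0] = %f\n", c[0]);
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

The `<<<blocks, threads>>>` launch configuration is the core of the model: the same kernel function runs once per thread, and the index arithmetic maps each thread onto its slice of the data.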
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.
OneFlow is a deep learning framework designed to be user-friendly, scalable, and efficient.
CUDA Templates and Python DSLs for High-Performance Linear Algebra
A fast, scalable, high-performance gradient boosting on decision trees library, used for ranking, classification, regression, and other machine learning tasks in Python, R, Java, and C++. Supports computation on CPU and GPU.
Modular ZK (zero-knowledge) proof backend accelerated by GPU
ALIEN is a CUDA-powered artificial life simulation program.
cuML - RAPIDS Machine Learning Library
[ARCHIVED] The C++ parallel algorithms library. See https://github.com/NVIDIA/cccl
ArrayFire: a general-purpose GPU library.
Tengine is a lightweight, high-performance, modular inference engine for embedded devices
Lightning-fast C++/CUDA neural network framework
Optimized primitives for collective multi-GPU communication
HIP: C++ Heterogeneous-Compute Interface for Portability
Fast inference engine for Transformer models
FlashInfer: Kernel Library for LLM Serving
LightSeq: A High Performance Library for Sequence Processing and Generation
CUDA was created by NVIDIA and first released on June 23, 2007.