Open3D: A Modern Library for 3D Data Processing
Updated Nov 23, 2025 · C++
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.
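The programming model described above is easiest to see in the canonical vector-add example: the work is split across thousands of GPU threads, each handling one array element. This is a minimal sketch assuming the CUDA toolkit and an NVIDIA GPU are available (compile with `nvcc vec_add.cu`):

```
#include <cstdio>
#include <cuda_runtime.h>

// Kernel: each GPU thread adds one pair of elements.
__global__ void vecAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) c[i] = a[i] + b[i];                  // guard against overrun
}

int main() {
    const int n = 1 << 20;               // 1M elements
    const size_t bytes = n * sizeof(float);
    float *a, *b, *c;
    // Unified memory is reachable from both host and device.
    cudaMallocManaged(&a, bytes);
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    // Launch enough 256-thread blocks to cover all n elements.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(a, b, c, n);
    cudaDeviceSynchronize();             // wait for the GPU to finish

    printf("c[0] = %f\n", c[0]);
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

The `<<<blocks, threads>>>` launch configuration is the core of the model: the same kernel function runs once per thread, and the index arithmetic maps each thread onto its slice of the data.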
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.
OneFlow is a deep learning framework designed to be user-friendly, scalable, and efficient.
CUDA Templates and Python DSLs for High-Performance Linear Algebra
A fast, scalable, high-performance gradient boosting on decision trees library, used for ranking, classification, regression, and other machine learning tasks in Python, R, Java, and C++. Supports computation on CPU and GPU.
Modular ZK (zero-knowledge) proof backend accelerated by GPU
ALIEN is a CUDA-powered artificial life simulation program.
cuML - RAPIDS Machine Learning Library
[ARCHIVED] The C++ parallel algorithms library. See https://github.com/NVIDIA/cccl
ArrayFire: a general-purpose GPU library.
Tengine is a lightweight, high-performance, modular inference engine for embedded devices
Lightning-fast C++/CUDA neural network framework
Optimized primitives for collective multi-GPU communication
HIP: C++ Heterogeneous-Compute Interface for Portability
Fast inference engine for Transformer models
FlashInfer: Kernel Library for LLM Serving
LightSeq: A High Performance Library for Sequence Processing and Generation
CUDA was created by NVIDIA and first released on June 23, 2007.