Stars
Flash-Muon: An Efficient Implementation of Muon Optimizer
Frontier Models playing the board game Diplomacy.
Community-contributed instructions, prompts, and configurations to help you make the most of GitHub Copilot.
Simple, Elegant, Typed Argument Parsing with argparse
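A minimal sketch of the dataclass-to-argparse pattern this library implements, written here with only the standard library (the `TrainConfig` fields are hypothetical, not the library's API):

```python
import argparse
from dataclasses import dataclass, fields

@dataclass
class TrainConfig:
    lr: float = 3e-4   # learning rate (placeholder field)
    epochs: int = 10   # number of training passes (placeholder field)

# Build an argparse parser from the dataclass's typed fields.
parser = argparse.ArgumentParser()
for f in fields(TrainConfig):
    parser.add_argument(f"--{f.name}", type=f.type, default=f.default)

config = TrainConfig(**vars(parser.parse_args()))
print(config)  # e.g. TrainConfig(lr=0.0003, epochs=10)
```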
Typer, build great CLIs. Easy to code. Based on Python type hints.
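For reference, the canonical Typer hello-world: a typed Python function becomes a CLI, with `--help`, argument parsing, and defaults derived from the signature:

```python
import typer

def main(name: str, count: int = 1) -> None:
    """Greet NAME, COUNT times."""
    for _ in range(count):
        typer.echo(f"Hello {name}")

if __name__ == "__main__":
    typer.run(main)  # e.g. python hello.py Alice --count 3
```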
A fusion of a linear layer and a cross-entropy loss, written for PyTorch in Triton.
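The point of the fusion is memory: the final-layer logits of an LLM have shape `[N, vocab]` and can dwarf everything else. A plain-PyTorch reference showing the saving the Triton kernel makes automatic (function name and chunk size are illustrative, not the repo's API):

```python
import torch
import torch.nn.functional as F

def chunked_linear_cross_entropy(x, weight, targets, chunk=1024):
    """Compute cross_entropy(x @ weight.T, targets) without ever
    materializing the full [N, vocab] logits tensor, by processing
    the N rows in chunks. x: [N, d], weight: [vocab, d], targets: [N]."""
    losses = []
    for i in range(0, x.shape[0], chunk):
        logits = x[i:i + chunk] @ weight.T  # only [chunk, vocab] at a time
        losses.append(F.cross_entropy(logits, targets[i:i + chunk],
                                      reduction="sum"))
    return torch.stack(losses).sum() / x.shape[0]
```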
A collection of memory efficient attention operators implemented in the Triton language.
FlagGems is an operator library for large language models implemented in the Triton Language.
A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.
A collection of GPT system prompts and various prompt injection/leaking knowledge.
Automatically create Faiss k-NN indices with optimal similarity search parameters.
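What this library automates is the choice of index family and parameters from the data size and memory budget. For contrast, a hand-rolled exact Faiss index (dimensions and data are placeholders):

```python
import faiss
import numpy as np

d = 128
xb = np.random.rand(10_000, d).astype("float32")  # database vectors
xq = np.random.rand(5, d).astype("float32")       # query vectors

# A plain exact inner-product index; the library would instead pick
# among Flat/IVF/HNSW variants and tune their parameters for you.
index = faiss.IndexFlatIP(d)
index.add(xb)
scores, ids = index.search(xq, 10)  # top-10 neighbours per query
```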
Easily use and train state of the art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-of-use, backed by research.
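A sketch of the high-level flow as shown in the project README (the documents, index name, and query here are placeholders; the method names are taken from the README and may change across versions):

```python
from ragatouille import RAGPretrainedModel

# Load a pretrained ColBERT checkpoint, index a small collection,
# then run late-interaction retrieval over it.
RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")
RAG.index(collection=["First document ...", "Second document ..."],
          index_name="demo")
results = RAG.search(query="what does the first document say?", k=3)
```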
An experiment in using Tangent to autodiff Triton.
Simple, minimal implementation of the Mamba SSM in one file of PyTorch.
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
Welcome to the Llama Cookbook! This is your go-to guide for building with Llama: getting started with inference, fine-tuning, and RAG. We also show you how to solve end-to-end problems using Llama mode…
Machine Learning Engineering Open Book
Command-line sampling profiler for macOS, Linux, and Windows
An inference performance optimization framework for Hugging Face Diffusers on NVIDIA GPUs (https://wavespeed.ai/).
🛋 The AI and Generative Art platform for everyone
Multipack distributed sampler for fast padding-free training of LLMs
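The idea behind multipack is to batch by total token count rather than sequence count, so batches carry no padding. A sketch of one way to do the grouping (greedy first-fit over sequence lengths; illustrative, not the repo's exact algorithm):

```python
def pack_greedy(lengths, capacity):
    """Group sequence indices into 'packs' whose total token count
    fits the batch capacity, longest sequences first, so each pack
    can be concatenated and trained on without padding tokens."""
    packs, loads = [], []
    for i in sorted(range(len(lengths)), key=lambda j: -lengths[j]):
        for p, load in enumerate(loads):
            if load + lengths[i] <= capacity:
                packs[p].append(i)
                loads[p] += lengths[i]
                break
        else:  # no existing pack has room: open a new one
            packs.append([i])
            loads.append(lengths[i])
    return packs
```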
Accessible large language models via k-bit quantization for PyTorch.
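A minimal sketch of the drop-in 8-bit optimizer usage (the model and hyperparameters are placeholders; assumes a CUDA build of bitsandbytes):

```python
import torch
import bitsandbytes as bnb

model = torch.nn.Linear(1024, 1024).cuda()
# 8-bit Adam: optimizer states are stored quantized to 8 bits,
# cutting optimizer memory several-fold versus fp32 Adam.
optimizer = bnb.optim.Adam8bit(model.parameters(), lr=1e-3)

loss = model(torch.randn(8, 1024, device="cuda")).pow(2).mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```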
A guidance language for controlling large language models.