AI Frameworks Engineer @intel · SH (UTC +08:00)
- vllm-fork (Public)
  Forked from vllm-project/vllm: A high-throughput and memory-efficient inference and serving engine for LLMs

- llm-compressor-fork (Public)
  Forked from vllm-project/llm-compressor: Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

- vllm-gaudi (Public)
  Forked from vllm-project/vllm-gaudi: Community-maintained hardware plugin for vLLM on Intel Gaudi
  Python · Updated Nov 25, 2025

- gpucodes (Public)
  Forked from jalexine/gpucodes: Codes documenting my GPU learning journey
  Cuda · Updated Nov 19, 2025

- auto-round-fork (Public)
  Forked from intel/auto-round: SOTA weight-only quantization algorithm for LLMs
  Python · Apache License 2.0 · Updated Nov 4, 2025

- oneAPI-samples-fork (Public)
  Forked from oneapi-src/oneAPI-samples: Samples for Intel® oneAPI Toolkits
  C++ · MIT License · Updated Nov 1, 2025

- sglang-fork (Public)
  Forked from sgl-project/sglang: SGLang is a fast serving framework for large language models and vision language models.
  Python · Apache License 2.0 · Updated Oct 24, 2025

- native-sparse-attention-fork (Public)
  Forked from fla-org/native-sparse-attention: 🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention"
  Python · MIT License · Updated Oct 2, 2025

- compressed-tensors-fork (Public)
  Forked from vllm-project/compressed-tensors: A safetensors extension to efficiently store sparse quantized tensors on disk

- triton-fork (Public)
  Forked from triton-lang/triton: Development repository for the Triton language and compiler
  MLIR · MIT License · Updated Sep 19, 2025

- lm-eval-fork (Public)
  Forked from EleutherAI/lm-evaluation-harness: A framework for few-shot evaluation of language models.
  Python · MIT License · Updated Sep 4, 2025

- LongCat-Flash-Chat (Public)
  Forked from meituan-longcat/LongCat-Flash-Chat
  MIT License · Updated Sep 1, 2025

- torchao-fork (Public)
  Forked from pytorch/ao: The torchao repository contains APIs and workflows for quantizing and pruning GPU models.
  Python · Other · Updated Aug 22, 2025

- transformers (Public)
  Forked from huggingface/transformers: 🤗 Transformers: State-of-the-art machine learning for PyTorch, TensorFlow, and JAX.
  Python · Apache License 2.0 · Updated Aug 6, 2025

- SageAttention-Fork (Public)
  Forked from thu-ml/SageAttention: Quantized attention achieves speedups of 2-5x and 3-11x over FlashAttention and xformers, without losing end-to-end metrics across language, image, and video models.
  Cuda · Apache License 2.0 · Updated Jul 16, 2025

- vllm-hpu-extension-fork (Public)
  Forked from HabanaAI/vllm-hpu-extension
  Python · Apache License 2.0 · Updated Jul 1, 2025

- microxcaling-fork (Public)
  Forked from microsoft/microxcaling: PyTorch emulation library for Microscaling (MX)-compatible data formats
  Python · MIT License · Updated Jun 18, 2025

- flashinfer-fork (Public)
  Forked from flashinfer-ai/flashinfer: FlashInfer: Kernel library for LLM serving
  Cuda · Apache License 2.0 · Updated Mar 11, 2025

- optimum-habana (Public)
  Forked from huggingface/optimum-habana: Easy and lightning-fast training of 🤗 Transformers on the Habana Gaudi processor (HPU)
  Python · Apache License 2.0 · Updated Dec 24, 2024

- torch-xpu-ops-fork (Public)
  Forked from intel/torch-xpu-ops
  C++ · Apache License 2.0 · Updated Dec 17, 2024

- pytorch-fork (Public)
  Forked from pytorch/pytorch: Tensors and dynamic neural networks in Python with strong GPU acceleration
  Python · Other · Updated Dec 4, 2024

- intel-extension-for-pytorch (Public)
  Forked from intel/intel-extension-for-pytorch: A Python package extending official PyTorch to easily obtain performance gains on Intel platforms
  Python · Apache License 2.0 · Updated Dec 3, 2024