modelscope / ms-swift

Use PEFT or full-parameter training to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, GLM4.5, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, Llava, GLM4v, Phi4, ...) (AAAI 2025).

Updated Dec 1, 2025
Python

InternLM / xtuner

A Next-Generation Training Engine Built for Ultra-Large MoE Models

agent reinforcement-learning multimodal llm internvl deepseek-v3 qwen3-moe kimi-k2 gpt-oss intern-s1 qwen3-vl

Updated Nov 28, 2025
Python

2U1 / Qwen-VL-Series-Finetune

An open-source implementation for fine-tuning the Qwen-VL series by Alibaba Cloud.

multimodal vision-language vision-language-model qwen2-vl qwen2-5-vl qwen3-vl

Updated Nov 29, 2025
Python

1038lab / ComfyUI-QwenVL

The ComfyUI-QwenVL custom node integrates the Qwen-VL series, including Qwen2.5-VL and the latest Qwen3-VL models, to enable advanced multimodal AI for text generation, image understanding, and video analysis.

comfyui customnodes qwen-vl qwen3-vl

Updated Nov 29, 2025
Python

sophgo / LLM-TPU

Run generative AI models on Sophgo BM1684X/BM1688

large-language-models llm generative-ai llm-inference bm1684x llama3 qwen3 qwen2-5-vl bm1688 internvl3 qwen3-vl

Updated Nov 30, 2025
C++

TurixAI / TuriX-CUA

This is the official website for TuriX Computer-use-Agent

agent mcp cua ai-agents computer-automation computer-use gui-agent browser-use computer-use-agent gui-operator qwen3-vl

Updated Nov 30, 2025
Python

yuanc3 / DATE

Use two lines to add absolute time awareness to Qwen2.5-VL's MRoPE

qwen2-5-vl qwen3-vl

Updated Sep 20, 2025
Python

o-l-l-i / simple-captioner

Simple image and video captioning app with a Gradio UI, powered by Qwen2.5/3 VL Instruct.

python vision gradio llm qwen2-5-vl qwen3-vl

Updated Oct 15, 2025
Python

PRITHIVSAKTHIUR / Qwen3-VL-Outpost

Qwen3-VL-Outpost is a Gradio-based web application for vision-language tasks, leveraging multiple Qwen vision-language models to process images and videos.

torch gradio opencv-python video-understanding huggingface-transformers huggingface-spaces vision-language-model qwen2-vl qwen2-5-vl qwen3-vl

Updated Nov 1, 2025
Python

PRITHIVSAKTHIUR / Qwen-Image-Edit-2509-LoRAs-Fast

Qwen-Image-Edit-2509-LoRAs-Fast is a high-performance, user-friendly web application built with Gradio that leverages the advanced Qwen/Qwen-Image-Edit-2509 model from Hugging Face for seamless image editing tasks.

python kernel numpy torch pytorch peft torchvision diffusion-models huggingface-transformers huggingface-spaces diffusers flash-attention-3 qwen2-5-vl qwen-image-edit qwen3-vl qwen-image-edit-2509 aoti

Updated Nov 24, 2025
Python

PRITHIVSAKTHIUR / Qwen-3VL-Multimodal-Understanding

Uses the Qwen3-VL-4B-Instruct model from Alibaba's Qwen series for multimodal tasks involving images and text. Users can upload an image and perform various vision-language tasks, such as querying details, generating captions, and detecting points of interest.

torch pytorch pip accelerate supervision gradio multimodal torchvision huggingface-transformers roboflow huggingface-spaces vision-language-model pillow-library llama-cpp qwen2-5-vl qwen3-vl

Updated Nov 18, 2025
Python

PRITHIVSAKTHIUR / Qwen-Image-Edit-2509-LoRAs-Fast-Fusion

Qwen-Image-Edit-2509-LoRAs-Fast-Fusion is a fast, interactive web application built with Gradio that enables advanced image editing using the Qwen/Qwen-Image-Edit-2509 model from Alibaba's Qwen team. It leverages specialized LoRA adapters for efficient, low-step inference (as few as 4 steps).

Updated Nov 24, 2025
Python

PRITHIVSAKTHIUR / Multimodal-OCR3

Multimodal-OCR3 is an advanced Optical Character Recognition (OCR) application that leverages multiple state-of-the-art multimodal models to extract text from images.

ocr pillow pytorch matplotlib ocr-recognition nanonets inference-optimization huggingface-transformers vision-transformer huggingface-models sota-model huggingface-spaces vision-language-model multimodal-large-language-models qwen2-5-vl qwen3-vl chandra-ocr dotsocr olmocr2

Updated Nov 11, 2025
Python

PRITHIVSAKTHIUR / Qwen3-VL-HF-Demo

A demo of Qwen3-VL-30B-A3B-Instruct, a next-generation vision-language model in the Qwen series that delivers comprehensive upgrades across the board, including superior text understanding and generation, deeper visual perception and reasoning, extended context length, and enhanced comprehension of spatial and video dynamics.