
Accelerating YOLOv12 Inference on NVIDIA Devices with TensorRT-YOLO #22

@laugh12321

Description


🚀 TensorRT-YOLO is an easy-to-use, highly efficient inference deployment tool for the YOLO series, designed specifically for NVIDIA devices. The project not only integrates TensorRT plugins to enhance post-processing but also uses CUDA kernels and CUDA Graphs to accelerate inference. TensorRT-YOLO provides both C++ and Python inference, aiming to deliver an 📦 out-of-the-box deployment experience. It covers task scenarios such as object detection, instance segmentation, image classification, pose estimation, oriented object detection, and video analysis, meeting developers' deployment needs across multiple scenarios.

Using YOLOv12 in TensorRT-YOLO

1. Prerequisites

  • CUDA: Recommended version ≥ 11.0.1
  • TensorRT: Recommended version ≥ 8.6.1
  • Operating System: Linux (x86_64 or ARM) recommended; Windows also supported

2. Installation

Note

YOLOv12 support requires ultralytics >= 8.3.78.
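
If you want to check this requirement programmatically before exporting, here is a small version check using only the standard library (the 8.3.78 floor comes from the note above; the check assumes a plain x.y.z version string):

from importlib.metadata import version, PackageNotFoundError

REQUIRED = (8, 3, 78)

try:
    installed = version("ultralytics")
except PackageNotFoundError:
    raise SystemExit("ultralytics is not installed")

# Compare the numeric x.y.z components rather than the raw strings.
if tuple(int(part) for part in installed.split(".")[:3]) < REQUIRED:
    raise SystemExit(f"ultralytics {installed} found, but >= 8.3.78 is required")
print(f"ultralytics {installed} satisfies the requirement")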

3. Model Export

  • Follow the 🔧 Model Export documentation to export an ONNX model for TensorRT-YOLO inference and convert it into a TensorRT engine:
trtyolo export -w yolov12n.pt -v ultralytics -o ./ -b 1 -s
trtexec --onnx=yolov12n.onnx --saveEngine=yolov12n.engine

4. Inference Example

Note

ClassifyModel, DetectModel, OBBModel, SegmentModel, and PoseModel correspond to models for image classification, object detection, oriented bounding box detection, instance segmentation, and pose estimation, respectively. The segmentation sketch after the C++ example below shows how the same pattern applies when switching task models.

  • Inference using Python:
import cv2
from tensorrt_yolo.infer import InferOption, DetectModel, generate_labels, visualize

def main():
    # Initialization
    option = InferOption()
    option.enable_swap_rb()  # Convert BGR to RGB
    # Load model
    model = DetectModel(engine_path="yolo12n-with-plugin.engine", option=option)
  
    # Preprocess input image
    input_img = cv2.imread("testimage.jpg")
    if input_img is None:
        raise FileNotFoundError("Failed to load test image.")

    # Inference
    detection_result = model.predict(input_img)
    print(f"==> Detection Result: {detection_result}")

    # Visualization
    class_labels = generate_labels(labels_file="labels.txt")
    visualized_img = visualize(image=input_img, result=detection_result, labels=class_labels)
    cv2.imwrite("visimage.jpg", visualized_img)

    # Model Cloning Demo
    cloned_model = model.clone()
    cloned_result = cloned_model.predict(input_img)
    print(f"==> Cloned Result: {cloned_result}")

if __name__ == "__main__":
    main()
  • Inference using C++:
#include <iostream>
#include <memory>
#include <opencv2/opencv.hpp>
#include "deploy/model.hpp"
#include "deploy/option.hpp"
#include "deploy/result.hpp"

int main() {
    try {
        // Initialization
        deploy::InferOption option;
        option.enableSwapRB();  // BGR->RGB conversion

        // Load model
        auto detector = std::make_unique<deploy::DetectModel>("yolo12n-with-plugin.engine", option);

        // Load image
        cv::Mat cv_image = cv::imread("testimage.jpg");
        if (cv_image.empty()) {
            throw std::runtime_error("Failed to load test image.");
        }

        // Inference
        deploy::Image input_image(cv_image.data, cv_image.cols, cv_image.rows);
        deploy::DetResult result = detector->predict(input_image);
        std::cout << result << std::endl;

        // Model Cloning Demo
        auto cloned_detector = detector->clone();
        deploy::DetResult cloned_result = cloned_detector->predict(input_image);
        std::cout << cloned_result << std::endl;

    } catch (const std::exception& e) {
        std::cerr << "Exception: " << e.what() << std::endl;
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}
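
The other task models listed in the note above follow the same construct-and-predict pattern. Here is a minimal Python sketch for segmentation, assuming SegmentModel shares DetectModel's constructor and that a segmentation engine has been exported the same way (the engine file name below is illustrative):

import cv2
from tensorrt_yolo.infer import InferOption, SegmentModel

option = InferOption()
option.enable_swap_rb()  # Same BGR -> RGB conversion as in the detection example

# Constructed exactly like DetectModel, just pointed at a segmentation engine.
model = SegmentModel(engine_path="yolo12n-seg-with-plugin.engine", option=option)

image = cv2.imread("testimage.jpg")
if image is None:
    raise FileNotFoundError("Failed to load test image.")

seg_result = model.predict(image)
print(f"==> Segmentation Result: {seg_result}")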

✨ Key Features

🎯 Diverse YOLO Support

  • Supports YOLOv3 to YOLOv12 series, as well as PP-YOLOE and PP-YOLOE+ models.
  • Easy switching between different YOLO versions.
  • Covers applications like Detect, Segment, Classify, Pose, and more.

🚀 Performance Optimization

  • CUDA Acceleration: Optimized preprocessing and inference via CUDA kernels and graphs.
  • TensorRT Integration: Significant performance boost with TensorRT plugin support.
  • Multi-Context Inference: Parallel inference across multiple execution contexts to maximize hardware utilization (see the sketch after this list).
  • Memory Management Optimization: Memory handling adapted to different architectures, including a Zero Copy mode for integrated-memory devices such as Jetson.
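
One way to exercise the multi-context support from Python is the clone() call shown in the inference examples above: each clone can run predictions independently, so clones can be driven from separate threads. Below is a minimal sketch under that assumption (thread count, engine path, and image names are illustrative):

import threading

import cv2
from tensorrt_yolo.infer import InferOption, DetectModel

option = InferOption()
option.enable_swap_rb()
base_model = DetectModel(engine_path="yolo12n-with-plugin.engine", option=option)

def worker(model, image_paths, worker_id):
    # Each thread uses its own cloned model, so predict() calls do not contend.
    for path in image_paths:
        image = cv2.imread(path)
        if image is None:
            continue
        result = model.predict(image)
        print(f"[worker {worker_id}] {path}: {result}")

# Split a batch of images across two workers, each holding its own clone.
batches = [["img_0.jpg", "img_1.jpg"], ["img_2.jpg", "img_3.jpg"]]
threads = [
    threading.Thread(target=worker, args=(base_model.clone(), batch, i))
    for i, batch in enumerate(batches)
]
for t in threads:
    t.start()
for t in threads:
    t.join()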

🛠️ Usability

  • Out-of-the-box support for both C++ and Python.
  • CLI tools for quick model export and inference.
  • Docker support for seamless environment setup and deployment.
  • No third-party dependencies, reducing complexity.

🌐 Compatibility

  • Compatible with multiple platforms: Linux, Windows, ARM, and x86.
  • Fully supports TensorRT 10.x versions.

🔧 Flexible Configuration

  • Customizable preprocessing parameters (e.g., SwapRB, normalization settings, padding).
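
The only preprocessing switch demonstrated in this post is SwapRB; the sketch below simply shows where such options plug in. It sticks to the calls used in the examples above and does not guess at the setter names for normalization or padding:

from tensorrt_yolo.infer import InferOption, DetectModel

option = InferOption()
option.enable_swap_rb()  # Swap B and R channels (BGR -> RGB), as in the examples above

# Normalization and padding behavior are configured on the same InferOption
# object before the model is constructed; see the project documentation for
# the exact setter names.

model = DetectModel(engine_path="yolo12n-with-plugin.engine", option=option)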
