
Accelerating YOLOv12 Inference on NVIDIA Devices with TensorRT-YOLO #22

@laugh12321

Description


🚀 TensorRT-YOLO is an easy-to-use, highly efficient inference deployment tool for the YOLO series, designed specifically for NVIDIA devices. The project not only integrates TensorRT plugins to enhance post-processing but also uses CUDA kernels and CUDA Graphs to accelerate inference. TensorRT-YOLO provides both C++ and Python inference, aiming to deliver an 📦 out-of-the-box deployment experience. It covers task scenarios such as object detection, instance segmentation, image classification, pose estimation, oriented object detection, and video analysis, meeting developers' deployment needs across multiple scenarios.

Using YOLOv12 in TensorRT-YOLO

1. Prerequisites

  • CUDA: Recommended version ≥ 11.0.1
  • TensorRT: Recommended version ≥ 8.6.1
  • Operating System: Linux (x86_64 or ARM) recommended; Windows also supported

2. Installation

Note

YOLOv12 support requires ultralytics >= 8.3.78.
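
If you want to check this requirement programmatically before exporting, here is a small version check using only the standard library (the 8.3.78 floor comes from the note above; the check assumes a plain x.y.z version string):

from importlib.metadata import version, PackageNotFoundError

REQUIRED = (8, 3, 78)

try:
    installed = version("ultralytics")
except PackageNotFoundError:
    raise SystemExit("ultralytics is not installed")

# Compare the numeric x.y.z components rather than the raw strings.
if tuple(int(part) for part in installed.split(".")[:3]) < REQUIRED:
    raise SystemExit(f"ultralytics {installed} found, but >= 8.3.78 is required")
print(f"ultralytics {installed} satisfies the requirement")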

3. Model Export

  • Follow the 🔧 Model Export documentation to export an ONNX model for TensorRT-YOLO inference and convert it into a TensorRT engine:
trtyolo export -w yolov12n.pt -v ultralytics -o ./ -b 1 -s
trtexec --onnx=yolov12n.onnx --saveEngine=yolov12n.engine

4. Inference Example

Note

ClassifyModel, DetectModel, OBBModel, SegmentModel, and PoseModel correspond to models for image classification, object detection, oriented bounding box detection, instance segmentation, and pose estimation, respectively. The segmentation sketch after the C++ example below shows how the same pattern applies when switching task models.

  • Inference using Python:
import cv2
from tensorrt_yolo.infer import InferOption, DetectModel, generate_labels, visualize

def main():
    # Initialization
    option = InferOption()
    option.enable_swap_rb()  # Convert BGR to RGB
    # Load model
    model = DetectModel(engine_path="yolo12n-with-plugin.engine", option=option)
  
    # Preprocess input image
    input_img = cv2.imread("testimage.jpg")
    if input_img is None:
        raise FileNotFoundError("Failed to load test image.")

    # Inference
    detection_result = model.predict(input_img)
    print(f"==> Detection Result: {detection_result}")

    # Visualization
    class_labels = generate_labels(labels_file="labels.txt")
    visualized_img = visualize(image=input_img, result=detection_result, labels=class_labels)
    cv2.imwrite("visimage.jpg", visualized_img)

    # Model Cloning Demo
    cloned_model = model.clone()
    cloned_result = cloned_model.predict(input_img)
    print(f"==> Cloned Result: {cloned_result}")

if __name__ == "__main__":
    main()
  • Inference using C++:
#include <iostream>
#include <memory>
#include <opencv2/opencv.hpp>
#include "deploy/model.hpp"
#include "deploy/option.hpp"
#include "deploy/result.hpp"

int main() {
    try {
        // Initialization
        deploy::InferOption option;
        option.enableSwapRB();  // BGR->RGB conversion

        // Load model
        auto detector = std::make_unique<deploy::DetectModel>("yolo12n-with-plugin.engine", option);

        // Load image
        cv::Mat cv_image = cv::imread("testimage.jpg");
        if (cv_image.empty()) {
            throw std::runtime_error("Failed to load test image.");
        }

        // Inference
        deploy::Image input_image(cv_image.data, cv_image.cols, cv_image.rows);
        deploy::DetResult result = detector->predict(input_image);
        std::cout << result << std::endl;

        // Model Cloning Demo
        auto cloned_detector = detector->clone();
        deploy::DetResult cloned_result = cloned_detector->predict(input_image);
        std::cout << cloned_result << std::endl;

    } catch (const std::exception& e) {
        std::cerr << "Exception: " << e.what() << std::endl;
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}
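
The other task models listed in the note above follow the same construct-and-predict pattern. Here is a minimal Python sketch for segmentation, assuming SegmentModel shares DetectModel's constructor and that a segmentation engine has been exported the same way (the engine file name below is illustrative):

import cv2
from tensorrt_yolo.infer import InferOption, SegmentModel

option = InferOption()
option.enable_swap_rb()  # Same BGR -> RGB conversion as in the detection example

# Constructed exactly like DetectModel, just pointed at a segmentation engine.
model = SegmentModel(engine_path="yolo12n-seg-with-plugin.engine", option=option)

image = cv2.imread("testimage.jpg")
if image is None:
    raise FileNotFoundError("Failed to load test image.")

seg_result = model.predict(image)
print(f"==> Segmentation Result: {seg_result}")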

✨ Key Features

🎯 Diverse YOLO Support

  • Supports YOLOv3 to YOLOv12 series, as well as PP-YOLOE and PP-YOLOE+ models.
  • Easy switching between different YOLO versions.
  • Covers applications like Detect, Segment, Classify, Pose, and more.

🚀 Performance Optimization

  • CUDA Acceleration: Optimized preprocessing and inference via CUDA kernels and graphs.
  • TensorRT Integration: Significant performance boost with TensorRT plugin support.
  • Multi-Context Inference: Parallel inference across multiple execution contexts to maximize hardware utilization (see the sketch after this list).
  • Memory Management Optimization: Memory handling adapted to different architectures, including a Zero Copy mode for integrated-memory devices such as Jetson.
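
One way to exercise the multi-context support from Python is the clone() call shown in the inference examples above: each clone can run predictions independently, so clones can be driven from separate threads. Below is a minimal sketch under that assumption (thread count, engine path, and image names are illustrative):

import threading

import cv2
from tensorrt_yolo.infer import InferOption, DetectModel

option = InferOption()
option.enable_swap_rb()
base_model = DetectModel(engine_path="yolo12n-with-plugin.engine", option=option)

def worker(model, image_paths, worker_id):
    # Each thread uses its own cloned model, so predict() calls do not contend.
    for path in image_paths:
        image = cv2.imread(path)
        if image is None:
            continue
        result = model.predict(image)
        print(f"[worker {worker_id}] {path}: {result}")

# Split a batch of images across two workers, each holding its own clone.
batches = [["img_0.jpg", "img_1.jpg"], ["img_2.jpg", "img_3.jpg"]]
threads = [
    threading.Thread(target=worker, args=(base_model.clone(), batch, i))
    for i, batch in enumerate(batches)
]
for t in threads:
    t.start()
for t in threads:
    t.join()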

🛠️ Usability

  • Out-of-the-box support for both C++ and Python.
  • CLI tools for quick model export and inference.
  • Docker support for seamless environment setup and deployment.
  • No third-party dependencies, reducing complexity.

🌐 Compatibility

  • Compatible with multiple platforms: Linux, Windows, ARM, and x86.
  • Fully supports TensorRT 10.x versions.

🔧 Flexible Configuration

  • Customizable preprocessing parameters (e.g., SwapRB, normalization settings, padding).
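
The only preprocessing switch demonstrated in this post is SwapRB; the sketch below simply shows where such options plug in. It sticks to the calls used in the examples above and does not guess at the setter names for normalization or padding:

from tensorrt_yolo.infer import InferOption, DetectModel

option = InferOption()
option.enable_swap_rb()  # Swap B and R channels (BGR -> RGB), as in the examples above

# Normalization and padding behavior are configured on the same InferOption
# object before the model is constructed; see the project documentation for
# the exact setter names.

model = DetectModel(engine_path="yolo12n-with-plugin.engine", option=option)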
