YOLOv12 is a new, state-of-the-art object detection model. The model uses an attention-based YOLO implementation that "matches the speed of previous CNN-based ones while harnessing the performance benefits of
attention mechanisms" according to the paper accompanying the model.
You can run YOLOv12 models on an NVIDIA Jetson, NVIDIA GPUs, and macOS systems with Roboflow Inference, an open source Python package for running vision models.
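As a sketch, running a hosted model with Roboflow Inference looks like the following. This assumes the `inference` package's `get_model` entry point; the model ID and image path below are placeholders for your own project and data.

```python
# Minimal sketch of running a model with Roboflow Inference.
# NOTE: "your-project/1" is a placeholder model ID -- replace it with the
# ID of your own trained model, and "image.jpg" with a real image path.
from inference import get_model

model = get_model(model_id="your-project/1")

# Run the model on an image and inspect the predictions.
results = model.infer("image.jpg")
print(results)
```

The same `infer` call accepts URLs and NumPy arrays as well as file paths, so it fits into most existing pipelines.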
To learn more about the architecture of the model, refer to the YOLOv12 paper.
You can train a YOLOv12 model using the YOLOv12 Python package.
To train a model, install the project from source:
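A from-source install typically looks like the commands below. The repository URL is assumed from the public YOLOv12 project; adjust it if you are working from a fork.

```shell
# Clone the YOLOv12 repository (URL assumed) and install it in editable
# mode so the package and its dependencies are available locally.
git clone https://github.com/sunsmarterjie/yolov12.git
cd yolov12
pip install -e .
```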
Then, use the following code to train your model:
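A training run can be sketched as follows, assuming the YOLOv12 package exposes the Ultralytics-style `YOLO` API (the project builds on the Ultralytics codebase). The configuration name, dataset path, and hyperparameters are illustrative placeholders.

```python
# Sketch of a YOLOv12 training run, assuming the Ultralytics-style API.
from ultralytics import YOLO

# Load a model configuration. "yolov12n.yaml" (the nano variant) is an
# assumed name -- substitute the variant you want to train.
model = YOLO("yolov12n.yaml")

# Train on your dataset. "data.yaml" should point to your YOLOv12-formatted
# dataset; the epoch count and image size are illustrative values.
results = model.train(data="data.yaml", epochs=100, imgsz=640)
```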
Replace `data` with the path to your YOLOv12-formatted dataset. Learn more about the YOLOv12 format.
You can then test your model on images in your test dataset with the following command:
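Testing the trained weights can be sketched as below, again assuming the Ultralytics-style API; the checkpoint and image paths are placeholders for the outputs of your own training run.

```python
# Sketch of running trained YOLOv12 weights on a test set.
from ultralytics import YOLO

# Load the best checkpoint produced by training (path is a placeholder;
# Ultralytics-style runs typically save under runs/detect/<run>/weights/).
model = YOLO("runs/detect/train/weights/best.pt")

# Run inference on the test split and save annotated images to disk.
results = model.predict(source="path/to/test/images", save=True)
```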
YOLOv12 uses an attention-based implementation of YOLO that "matches the speed" of previous CNN-based YOLO implementations. The YOLOv12 model architecture is able to achieve higher accuracy when validated against the Microsoft COCO benchmark.
YOLOv12 was developed by Yunjie Tian (University at Buffalo, SUNY), Qixiang Ye (University of Chinese Academy of Sciences), and David Doermann (University at Buffalo, SUNY).