Object Detection

Object detection — finding what's in an image and where — is the backbone of autonomous vehicles, surveillance, and robotics. The two-stage R-CNN lineage (2014–2017) gave way to single-shot detectors like YOLO, now in its 11th iteration and still getting faster. DETR (2020) proved transformers could replace hand-designed components like NMS entirely, spawning a family of end-to-end detectors that dominate COCO leaderboards above 60 mAP. The field's current obsession: open-vocabulary detection that works on any object described in natural language, not just fixed categories.
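For readers unfamiliar with NMS (non-maximum suppression), the hand-designed step that DETR-style detectors remove, the following is a minimal sketch of the classic greedy variant. The function names, the `(x1, y1, x2, y2)` box format, and the 0.5 threshold are illustrative assumptions, not taken from any particular detector's code.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of boxes kept by greedy NMS (illustrative helper).

    Boxes are visited from highest to lowest score; a box is kept only if
    it does not overlap any already-kept box by more than iou_threshold.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two heavily overlapping boxes and one separate box (made-up values).
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores)
```

In this toy input the second box overlaps the first with IoU of about 0.68, so it is suppressed and only the first and third boxes survive. End-to-end detectors avoid this post-processing step by training the model to emit one prediction per object.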

Datasets: 3
Results: 17
Canonical metric: mAP

Canonical Benchmark

COCO

330K images, 1.5 million object instances, 80 object categories. Standard benchmark for object detection and segmentation.

Primary metric: mAP
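
As a rough illustration of what the mAP figure measures, the sketch below computes average precision for a single class on a single image and then averages it over IoU thresholds 0.50 to 0.95 in steps of 0.05, which is the averaging COCO-style mAP uses. It is a simplification under stated assumptions: the real COCO evaluation also averages over all categories and images and handles crowd annotations and per-image detection limits. All function names and the `(x1, y1, x2, y2)` box format here are hypothetical.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def average_precision(preds, gts, iou_thr):
    """AP for one class at one IoU threshold (simplified sketch).

    preds: list of (score, box); gts: list of ground-truth boxes.
    """
    if not gts:
        return 0.0
    preds = sorted(preds, key=lambda p: p[0], reverse=True)
    matched, hits = set(), []
    for _, box in preds:
        # Match each prediction to its best still-unmatched ground truth.
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts):
            if j not in matched:
                v = box_iou(box, gt)
                if v > best_iou:
                    best_iou, best_j = v, j
        if best_j >= 0 and best_iou >= iou_thr:
            matched.add(best_j)
            hits.append(1)
        else:
            hits.append(0)
    # Running precision and recall down the ranked list.
    precisions, recalls, tp = [], [], 0
    for k, hit in enumerate(hits, start=1):
        tp += hit
        precisions.append(tp / k)
        recalls.append(tp / len(gts))
    # Precision envelope: make precision non-increasing in recall.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # 101-point interpolation over recall levels 0.00, 0.01, ..., 1.00.
    total = 0.0
    for t in range(101):
        r = t / 100
        total += next((p for p, rc in zip(precisions, recalls) if rc >= r), 0.0)
    return total / 101


def coco_style_ap(preds, gts):
    """Mean AP over IoU thresholds 0.50, 0.55, ..., 0.95 (single class)."""
    thresholds = [0.5 + 0.05 * i for i in range(10)]
    return sum(average_precision(preds, gts, t) for t in thresholds) / len(thresholds)
```

Averaging over strict thresholds such as 0.95 is why a detector with loosely placed boxes scores much lower on this metric than on AP at IoU 0.50 alone.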
Top 10

Leading models on COCO.

Rank  Model                  mAP   Year  Source
1     co-detr-swin-l         66.0  2025  paper
2     internimage-h          65.4  2025  paper
3     Focal-Stable-DINO      64.6  2023  paper
4     dino-swin-l            63.3  2025  paper
5     EVA-02-L               62.3  2023  paper
6     RF-DETR-2XL            60.1  2024  paper
7     D-FINE-X (Objects365)  59.3  2026  paper
8     yolov10-x              57.4  2025  paper
9     RT-DETRv4-X            57.0  2025  paper
10    DINO-X Pro             56.0  2024  paper

All datasets

3 datasets tracked for this task.

Run Inference

Looking to run a model? HuggingFace hosts inference for this task type.