Computer Vision Hub

Computer Vision Benchmarks

From classification (ImageNet) to detection (COCO) to segmentation (ADE20K), track the models that see and understand the world.

Key Benchmarks

COCO
Object Detection (80 classes)
ImageNet
Classification (1K classes)
ADE20K
Segmentation (150 classes)
Pascal VOC
Detection (20 classes)

Understanding Computer Vision Metrics

mAP

Mean Average Precision

The gold standard for object detection. Measures how well the model places bounding boxes and classifies objects.

  • AP50: Lenient. A detection counts as correct at IoU ≥ 0.50.
  • AP75: Strict. Requires IoU ≥ 0.75 (tight boxes).
  • mAP (COCO): AP averaged over IoU thresholds from 0.50 to 0.95 in steps of 0.05.
Used in: COCO, Pascal VOC
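The averaging described above can be sketched in a few lines of Python. This is an illustrative simplification, not the official COCO evaluator: `average_precision` and its inputs are assumed names, and the IoU matching step (pairing each prediction with a ground-truth box at the chosen threshold) is taken as already done.

```python
# Sketch: AP at one IoU threshold from precomputed matches.
# `scores` are prediction confidences; `is_match[i]` says whether
# prediction i matched a ground-truth box at the chosen IoU threshold.

def average_precision(scores, is_match, num_gt):
    """Area under the precision-recall curve (rectangle rule)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp = fp = 0
    ap = prev_recall = 0.0
    for i in order:
        if is_match[i]:
            tp += 1
        else:
            fp += 1
        precision = tp / (tp + fp)
        recall = tp / num_gt
        ap += precision * (recall - prev_recall)
        prev_recall = recall
    return ap

# COCO-style mAP then averages AP over ten IoU thresholds and all classes:
iou_thresholds = [0.50 + 0.05 * k for k in range(10)]
```

AP50 and AP75 are this same quantity evaluated at a single threshold (0.50 or 0.75) instead of averaged across all ten.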

Top-1 / Top-5

Classification Accuracy

For image classification. Top-1 requires the single top prediction to be correct; Top-5 counts a prediction as correct if the true class appears among the top five.

  • Top-1: Percentage of images where the top prediction is correct.
  • Top-5: Percentage where the correct class appears in the top 5 predictions.
  • Higher is better: 90% means 9 out of 10 images are classified correctly.
Used in: ImageNet, CIFAR-10/100
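Both accuracies fall out of one generic top-k routine. A minimal sketch, assuming raw per-class scores (logits) and integer labels; `topk_accuracy` is an illustrative name, not a library function.

```python
# Sketch: Top-k accuracy from per-sample class scores.

def topk_accuracy(logits, labels, k=1):
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(logits, labels):
        # Indices of the k largest scores for this sample.
        topk = sorted(range(len(row)), key=lambda j: -row[j])[:k]
        hits += label in topk
    return hits / len(labels)

# Top-1 is k=1; Top-5 is k=5.
```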

mIoU

Mean Intersection over Union

For semantic segmentation. Measures pixel-level overlap between prediction and ground truth.

  • Pixel-level: Evaluates every pixel in the image.
  • IoU per class: Computed separately for each semantic class.
  • Mean: Average IoU across all classes.
Used in: ADE20K, Cityscapes
IoU = Area(Overlap) / Area(Union)
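Applying that formula per class and averaging gives mIoU. A minimal sketch over flattened label maps (lists of per-pixel class ids); `mean_iou` is an assumed name, and skipping classes absent from both maps is one common convention, not the only one.

```python
# Sketch: mean IoU for semantic segmentation from flat label maps.

def mean_iou(pred, target, num_classes):
    ious = []
    for c in range(num_classes):
        # Pixel counts: both maps say class c (overlap), either does (union).
        inter = sum(p == c and t == c for p, t in zip(pred, target))
        union = sum(p == c or t == c for p, t in zip(pred, target))
        if union:  # skip classes absent from both prediction and ground truth
            ious.append(inter / union)
    return sum(ious) / len(ious)
```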

Benchmark Categories

Object Detection

Locating and classifying objects with bounding boxes. Higher mAP is better.

Rank | Model            | Organization    | COCO mAP | Pascal VOC mAP | Architecture
#1   | InternImage-H    | Shanghai AI Lab | 65.4     | -              | Deformable Convolution
#2   | Co-DETR (Swin-L) | Research        | 66.0     | -              | Transformer Detector
#3   | DINO (Swin-L)    | Research        | 63.3     | -              | Transformer Detector
#4   | YOLOv10-X        | Tsinghua        | 57.4     | -              | CNN (Real-time)
#5   | EfficientDet-D7x | Google          | 55.1     | -              | EfficientNet+BiFPN

Image Classification

Categorizing images into predefined classes. Higher accuracy is better.

Coming Soon

ImageNet and CIFAR classification leaderboards will be added soon.

Semantic Segmentation

Pixel-level classification of images. Higher mIoU is better.

Rank | Model                | Organization    | ADE20K mIoU | Cityscapes mIoU | Architecture
#1   | InternImage-H        | Shanghai AI Lab | 62.9        | -               | Deformable Convolution
#2   | Mask2Former (Swin-L) | Meta            | 57.3        | -               | Transformer

Benchmark Datasets

Object Detection

COCO (2014)

330K images, 1.5 million object instances, 80 object categories. Standard benchmark for object detection and segmentation.

Task: object-detection | Images: 330,000

Pascal VOC 2012

11,530 images with 27,450 ROI-annotated objects and 6,929 segmentations. Classic object detection benchmark.

Task: object-detection | Images: 11,530

Image Classification

ImageNet-1K (2012)

1.28M training images and 50K validation images across 1,000 object classes. The standard benchmark for image classification since 2012.

Task: image-classification | Images: 1,281,167

ImageNet-V2 (2019)

10K new test images collected by following the original ImageNet collection process. Tests model generalization beyond the original test set.

Task: image-classification | Images: 10,000

CIFAR-10 (2009)

60K 32x32 color images in 10 classes. Classic small-scale image classification benchmark with 50K training and 10K test images.

Task: image-classification | Images: 60,000

CIFAR-100 (2009)

60K 32x32 color images in 100 fine-grained classes grouped into 20 superclasses. More challenging than CIFAR-10.

Task: image-classification | Images: 60,000

Semantic Segmentation

Cityscapes (2016)

5,000 urban street-scene images with fine annotations and 20,000 with coarse annotations.

Task: semantic-segmentation | Images: 25,000

ADE20K (2016)

20K training and 2K validation images annotated with 150 object categories. Complex scene-parsing benchmark.

Task: semantic-segmentation | Images: 22,210

Explore More Computer Vision Tasks

Beyond object detection, classification, and segmentation, explore benchmarks for scene text detection, document OCR, and more.