Computer vision,
measured in pixels.
From classification on ImageNet to detection on COCO to segmentation on ADE20K — the models that see, and what they see well. Every score dated, every metric defined, every dataset linked.
Descriptions in serif; scores in tabular mono; navigation in sans.
Reading the numbers.
Three families of metric cover nearly every computer-vision leaderboard. Each asks a different question of the model — localization, classification, or pixel-level agreement.
Average precision (AP).
The gold standard for object detection. Measures how well the model places bounding boxes and classifies the objects inside them.
- AP50: Easy mode. A detection counts when IoU with the ground-truth box is at least 0.50.
- AP75: Hard mode. IoU must reach 0.75 (tight boxes).
- mAP (COCO): AP averaged over IoU thresholds from 0.50 to 0.95 in steps of 0.05.
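A minimal sketch of how these thresholds behave, assuming axis-aligned boxes in (x1, y1, x2, y2) format; the helper name and toy coordinates are illustrative, not taken from any particular detection library:

```python
import numpy as np

def box_iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# COCO-style mAP averages AP over ten IoU thresholds: 0.50, 0.55, ..., 0.95.
coco_thresholds = np.arange(0.50, 1.00, 0.05)
print("COCO averaging thresholds:", np.round(coco_thresholds, 2))

pred = (10, 10, 50, 50)   # toy predicted box
gt = (12, 12, 48, 52)     # toy ground-truth box
iou = box_iou(pred, gt)
print(f"IoU = {iou:.2f}")           # counts toward AP50 if >= 0.50
print(f"hit @0.75: {iou >= 0.75}")  # must also clear 0.75 to count toward AP75
```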
Top-1 / Top-5 accuracy.
For image classification. Top-1 requires the single highest-scoring prediction to be the true class; Top-5 requires the true class to appear among the five highest-scoring predictions.
- Top-1: Percentage of images where the top prediction is correct.
- Top-5: Percentage of images where the correct class is in the top five predictions.
- Higher is better: 90% means nine of every ten images are labeled correctly.
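The same idea in code: a small sketch of Top-k accuracy over a batch of class scores, with random toy logits standing in for real model outputs:

```python
import numpy as np

def topk_accuracy(logits, labels, k=1):
    """Fraction of samples whose true label appears among the k highest-scoring classes."""
    topk = np.argsort(logits, axis=1)[:, -k:]       # indices of the k largest scores per row
    hits = (topk == labels[:, None]).any(axis=1)    # true label found among those indices?
    return hits.mean()

# Toy batch: 4 images, 10 classes (scores are illustrative, not from a real model).
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 10))
labels = np.array([3, 7, 1, 9])
print("top-1:", topk_accuracy(logits, labels, k=1))
print("top-5:", topk_accuracy(logits, labels, k=5))
```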
Mean Intersection over Union (mIoU).
For semantic segmentation. Measures pixel-level overlap between the predicted mask and the ground truth.
- Pixel-level: Evaluates every pixel in the image.
- IoU per class: Calculated for each semantic class.
- Mean: Average IoU across all classes.
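A short sketch of how per-class IoU and its mean are computed from label maps, assuming integer class indices per pixel and an ignore label of 255 (both conventions chosen only for illustration):

```python
import numpy as np

def mean_iou(pred, gt, num_classes, ignore_index=255):
    """Per-class IoU over every valid pixel, then the unweighted mean across classes."""
    valid = gt != ignore_index
    ious = []
    for c in range(num_classes):
        p, g = (pred == c) & valid, (gt == c) & valid
        union = (p | g).sum()
        if union == 0:       # class absent from both maps: skip it
            continue
        ious.append((p & g).sum() / union)
    return float(np.mean(ious))

# Toy 4x4 label maps with 3 classes (values are illustrative).
gt   = np.array([[0, 0, 1, 1], [0, 0, 1, 1], [2, 2, 2, 2], [2, 2, 2, 2]])
pred = np.array([[0, 0, 1, 1], [0, 1, 1, 1], [2, 2, 2, 2], [2, 2, 0, 2]])
print(f"mIoU = {mean_iou(pred, gt, num_classes=3):.3f}")
```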
Three task families.
Detection, classification, and segmentation — the backbone tasks of modern computer vision. Each card links to its leaderboard below.
Locating and classifying objects with bounding boxes. COCO and Pascal VOC benchmarks.
Categorizing images into predefined classes. ImageNet and CIFAR benchmarks.
Pixel-level classification of images. ADE20K and Cityscapes benchmarks.
Object detection.
Locating and classifying objects with bounding boxes. Higher mAP is better; the top row is the current state of the art on COCO.
| # | Model | Vendor | COCO mAP | Pascal VOC mAP | Architecture |
|---|---|---|---|---|---|
| 01 | Co-DETR (Swin-L) | Research | 66.0 | — | Transformer Detector |
| 02 | InternImage-H | Shanghai AI Lab | 65.4 | — | Deformable Convolution |
| 03 | DINO (Swin-L) | Research | 63.3 | — | Transformer Detector |
| 04 | YOLOv10-X | Tsinghua | 57.4 | — | CNN (Real-time) |
| 05 | EfficientDet-D7x | Google | 55.1 | — | EfficientNet+BiFPN |
Image classification.
Categorizing images into predefined classes. Higher accuracy is better.
ImageNet and CIFAR classification benchmarks will be added soon.
Semantic segmentation.
Pixel-level classification of images. Higher mIoU is better.
| # | Model | Vendor | ADE20K mIoU | Cityscapes mIoU | Architecture |
|---|---|---|---|---|---|
| 01 | InternImage-H | Shanghai AI Lab | 62.9 | — | Deformable Convolution |
| 02 | Mask2Former (Swin-L) | Meta | 57.3 | — | Transformer |
The benchmarks.
Every canonical computer-vision dataset, grouped by task. Click through for the paper or the dataset download.
COCO.
330K images, 1.5 million object instances, 80 object categories. Standard benchmark for object detection and segmentation.
- Task: object-detection
- Images: 330,000
ImageNet-1K.
1.28M training images, 50K validation images across 1,000 object classes. The standard benchmark for image classification since 2012.
- Task: image-classification
- Images: 1,281,167
ImageNet linear probe.
Linear classification on frozen ImageNet-1K features. Used to evaluate the representation quality of self-supervised and contrastive models without fine-tuning the backbone (a sketch of the protocol follows this entry).
- Task: image-classification
- Images: 1,281,167
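As a rough illustration of the linear-probe protocol, the sketch below trains only a logistic-regression classifier on top of frozen features; the arrays are random placeholders, and a real evaluation would use features extracted from the full 1.28M/50K ImageNet-1K splits with 1,000 classes:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Random placeholders standing in for frozen-backbone features; in practice these
# come from running the pretrained encoder over ImageNet-1K with no gradient updates.
train_feats, train_labels = rng.normal(size=(2000, 256)), rng.integers(0, 10, 2000)
val_feats, val_labels = rng.normal(size=(500, 256)), rng.integers(0, 10, 500)

# Only this linear classifier is trained; the backbone stays frozen throughout.
probe = LogisticRegression(max_iter=1000)
probe.fit(train_feats, train_labels)
print("linear-probe top-1:", probe.score(val_feats, val_labels))
```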
ImageNet-V2.
10K new test images collected by following the original ImageNet collection process. Tests model generalization beyond the original test set.
- Task: image-classification
- Images: 10,000
Cityscapes.
5,000 images with fine annotations and 20,000 with coarse annotations of urban street scenes.
- Task: semantic-segmentation
- Images: 25,000
Keep exploring.
Beyond detection, classification, and segmentation — adjacent sections of the vision registry.