
Semantic Segmentation

Semantic segmentation assigns a class label to every pixel. It is the dense prediction problem that underpins autonomous driving, medical imaging, and satellite analysis. FCN (2015) showed that image classifiers could be repurposed for pixel labeling, DeepLab introduced atrous convolutions and CRF post-processing, and SegFormer (2021) showed that transformers excel here too. State-of-the-art on Cityscapes exceeds 85 mIoU, but ADE20K, with its 150 classes, remains far harder: the leading entry tracked here reaches 63.0 mIoU. The frontier has moved toward universal segmentation models such as Mask2Former, which handle semantic, instance, and panoptic segmentation in a single architecture.
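To make "a class label for every pixel" concrete, here is a minimal sketch of the last step of dense prediction: turning per-class score maps into a label map by taking the highest-scoring class at each pixel. The function name `label_map` and the toy scores are illustrative assumptions; real models produce these scores with a network and typically do the argmax on tensors.

```python
def label_map(scores):
    """Pick the highest-scoring class per pixel.

    scores[c][y][x] is the score of class c at pixel (y, x).
    Returns labels[y][x], the index of the winning class.
    (Hypothetical helper for illustration, not a library API.)
    """
    num_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    return [
        [max(range(num_classes), key=lambda c: scores[c][y][x]) for x in range(width)]
        for y in range(height)
    ]


# Toy 2x2 image with 3 classes (made-up scores).
scores = [
    [[0.90, 0.20], [0.10, 0.30]],  # class 0
    [[0.05, 0.70], [0.20, 0.30]],  # class 1
    [[0.05, 0.10], [0.70, 0.40]],  # class 2
]
labels = label_map(scores)
print(labels)  # [[0, 1], [2, 2]]
```

The output has the same height and width as the input, which is what distinguishes dense prediction from image-level classification.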

Datasets: 2
Results: 16
Canonical metric: mIoU
Canonical Benchmark

ADE20K

20K training, 2K validation images annotated with 150 object categories. Complex scene parsing benchmark.

Primary metric: mIoU
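As a rough illustration of how mIoU (mean intersection over union) is computed, the sketch below accumulates a confusion matrix over pixels and averages the per-class IoU, TP / (TP + FP + FN). The helper names, the `ignore_index=255` convention for unlabeled pixels, and the toy label maps are assumptions for illustration; this is not the official ADE20K evaluation code.

```python
def confusion_matrix(pred, target, num_classes, ignore_index=255):
    """Count pixels per (ground-truth class, predicted class) pair.

    Rows index the ground truth, columns the prediction. Pixels whose
    ground-truth label equals ignore_index are skipped (assumed convention).
    """
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if t == ignore_index:
                continue
            matrix[t][p] += 1
    return matrix


def mean_iou(matrix):
    """Average IoU over classes that occur in the prediction or the ground truth."""
    n = len(matrix)
    ious = []
    for c in range(n):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                       # class c pixels predicted as something else
        fp = sum(matrix[r][c] for r in range(n)) - tp  # other pixels predicted as class c
        union = tp + fp + fn
        if union > 0:
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy 3x3 label maps with 3 classes; 255 marks an unlabeled pixel.
target = [[0, 0, 1],
          [0, 1, 1],
          [2, 2, 255]]
pred = [[0, 1, 1],
        [0, 1, 1],
        [2, 0, 0]]

miou = mean_iou(confusion_matrix(pred, target, num_classes=3))
print(round(100 * miou, 1))  # 58.3
```

Here the per-class IoUs are 0.50, 0.75, and 0.50, so the mean is about 0.583. Leaderboards usually report this value multiplied by 100, and benchmarks typically accumulate the confusion matrix over the whole validation set before averaging over classes.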

Top 10

Leading models on ADE20K.

Rank | Model | mIoU | Year | Source
1 | ONE-PEACE | 63.0 | 2026 | paper
2 | InternImage-H | 62.9 | 2025 | paper
3 | ViT-Adapter-L (BEiT-3) | 62.8 | 2026 | paper
4 | ViT-CoMer-L | 62.1 | 2026 | paper
5 | DINOv2 ViT-g/14 + Mask2Former | 60.2 | 2026 | paper
6 | EVA-02-L + UperNet | 60.1 | 2026 | paper
7 | EoMT-L (DINOv2) | 59.5 | 2026 | paper
8 | OneFormer (DiNAT-L) | 58.3 | 2026 | paper
9 | Mask2Former (Swin-L) | 57.3 | 2025 | paper
10 | Swin-L + UperNet | 53.5 | 2026 | paper

All datasets

2 datasets tracked for this task.


Run Inference

Looking to run a model? HuggingFace hosts inference for this task type.
