Semantic Segmentation

ADE20K Scene Parsing Benchmark

20K training, 2K validation images annotated with 150 object categories. Complex scene parsing benchmark.

Samples: 22,210
Metrics: mIoU, pixel accuracy
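For reference, mIoU (mean Intersection-over-Union) averages per-class IoU over the annotated categories, while pixel accuracy is the fraction of correctly labeled pixels. A minimal NumPy sketch of both metrics (function names are illustrative, not from any particular codebase):

```python
import numpy as np

def confusion_matrix(pred, target, num_classes):
    # pred, target: integer label arrays of the same shape
    mask = (target >= 0) & (target < num_classes)
    idx = num_classes * target[mask] + pred[mask]
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def miou_and_pixel_acc(pred, target, num_classes):
    cm = confusion_matrix(pred, target, num_classes)  # rows: ground truth, cols: prediction
    tp = np.diag(cm)
    union = cm.sum(0) + cm.sum(1) - tp
    iou = tp / np.maximum(union, 1)
    present = cm.sum(1) > 0          # average only over classes present in the ground truth
    miou = iou[present].mean()
    pixel_acc = tp.sum() / cm.sum()
    return miou, pixel_acc
```

Benchmark implementations differ in details (e.g. ignore-label handling, averaging over images vs. the whole dataset), so treat this as a sketch of the definition rather than the exact evaluation protocol.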
Current State of the Art

InternImage-H (Shanghai AI Lab): 62.9 mIoU

ADE20K — mIoU

6 results · 1 SOTA advance · higher is better

[Chart: ADE20K mIoU over time, with "All results" and "SOTA frontier" views; current SOTA is InternImage-H]

mIoU Progress Over Time

Showing 4 breakthroughs from Mar 2021 to Nov 2022


Key Milestones

Mar 2021 · Swin-L + UperNet · 53.5 mIoU

Swin-L + UperNet. ADE20K val. ICCV 2021 Best Paper.

Dec 2021 · Mask2Former (Swin-L) · 57.3 mIoU (+7.1%)

Mask2Former Swin-L. ADE20K val. Unified segmentation architecture. CVPR 2022.

Aug 2022 · BEiT-3 (ViT-L) · 62.8 mIoU (+9.6%)

BEiT-3 ViT-L. ADE20K val. Multimodal foundation model. CVPR 2023.

Nov 2022 · InternImage-H (current SOTA) · 62.9 mIoU (+0.2%)

InternImage-H. ADE20K val. Large-scale CNN foundation model built on deformable convolutions. CVPR 2023.
Total Improvement: 17.6%
Time Span: 1y 8m
Breakthroughs: 4
Current SOTA: 62.9 mIoU
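The per-milestone gains and the total improvement quoted above are simple relative changes over the preceding score; a minimal sketch of the arithmetic (the score list is copied from the milestones above):

```python
def relative_gain(prev, new):
    # Percentage change of the new score relative to the previous one
    return 100.0 * (new - prev) / prev

# Milestone mIoU scores, Mar 2021 through Nov 2022
scores = [53.5, 57.3, 62.8, 62.9]

# Per-milestone gains: +7.1%, +9.6%, +0.2%
gains = [relative_gain(a, b) for a, b in zip(scores, scores[1:])]

# Total improvement from the first milestone to the current SOTA: 17.6%
total = relative_gain(scores[0], scores[-1])
```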

Top Models Performance Comparison

Top 6 models ranked by mIoU

1. InternImage-H: 62.9 (100.0% of best)
2. BEiT-3 (ViT-L): 62.8 (99.8%)
3. DINOv2 (ViT-g) + Linear: 62.0 (98.6%)
4. Mask2Former (Swin-L): 57.3 (91.1%)
5. Mask2Former (Swin-L): 57.3 (91.1%)
6. Swin-L + UperNet: 53.5 (85.1%)
Best Score: 62.9
Top Model: InternImage-H
Models Compared: 6
Score Range: 9.4

mIoU (primary metric)

#   Model                      Org                mIoU   Date       Code
1   InternImage-H              Shanghai AI Lab    62.9   Dec 2025   open source
2   BEiT-3 (ViT-L)             Microsoft          62.8   Mar 2026   open source
3   DINOv2 (ViT-g) + Linear    Meta AI            62.0   Mar 2026   open source
4   Mask2Former (Swin-L)       Meta AI            57.3   Mar 2026   open source
5   Mask2Former (Swin-L)       Meta AI / UIUC     57.3   Dec 2025   open source
6   Swin-L + UperNet           Microsoft          53.5   Mar 2026   open source

Other Semantic Segmentation Datasets