Semantic Segmentation

ADE20K Scene Parsing Benchmark

20K training, 2K validation images annotated with 150 object categories. Complex scene parsing benchmark.

Samples: 22,210
Metrics: mIoU, pixel accuracy
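For reference, mIoU (mean Intersection-over-Union) averages per-class IoU over the annotated categories, while pixel accuracy is the fraction of correctly labeled pixels. A minimal NumPy sketch of both metrics (function names are illustrative, not from any particular codebase):

```python
import numpy as np

def confusion_matrix(pred, target, num_classes):
    # pred, target: integer label arrays of the same shape
    mask = (target >= 0) & (target < num_classes)
    idx = num_classes * target[mask] + pred[mask]
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def miou_and_pixel_acc(pred, target, num_classes):
    cm = confusion_matrix(pred, target, num_classes)  # rows: ground truth, cols: prediction
    tp = np.diag(cm)
    union = cm.sum(0) + cm.sum(1) - tp
    iou = tp / np.maximum(union, 1)
    present = cm.sum(1) > 0          # average only over classes present in the ground truth
    miou = iou[present].mean()
    pixel_acc = tp.sum() / cm.sum()
    return miou, pixel_acc
```

Benchmark implementations differ in details (e.g. ignore-label handling, averaging over images vs. the whole dataset), so treat this as a sketch of the definition rather than the exact evaluation protocol.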
Current State of the Art

InternImage-H (Shanghai AI Lab): 62.9 mIoU

ADE20K — mIoU

6 results · 1 SOTA advance · higher is better

[Chart: ADE20K mIoU over time, with "All results" and "SOTA frontier" views; current SOTA is InternImage-H]

mIoU Progress Over Time

Showing 4 breakthroughs from Mar 2021 to Nov 2022


Key Milestones

Mar 2021 · Swin-L + UperNet · 53.5 mIoU

Swin-L + UperNet. ADE20K val. ICCV 2021 Best Paper.

Dec 2021 · Mask2Former (Swin-L) · 57.3 mIoU (+7.1%)

Mask2Former Swin-L. ADE20K val. Unified segmentation architecture. CVPR 2022.

Aug 2022 · BEiT-3 (ViT-L) · 62.8 mIoU (+9.6%)

BEiT-3 ViT-L. ADE20K val. Multimodal foundation model. CVPR 2023.

Nov 2022 · InternImage-H (current SOTA) · 62.9 mIoU (+0.2%)

InternImage-H. ADE20K val. Large-scale CNN foundation model built on deformable convolutions. CVPR 2023.
Total Improvement: 17.6%
Time Span: 1y 8m
Breakthroughs: 4
Current SOTA: 62.9 mIoU
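The per-milestone gains and the total improvement quoted above are simple relative changes over the preceding score; a minimal sketch of the arithmetic (the score list is copied from the milestones above):

```python
def relative_gain(prev, new):
    # Percentage change of the new score relative to the previous one
    return 100.0 * (new - prev) / prev

# Milestone mIoU scores, Mar 2021 through Nov 2022
scores = [53.5, 57.3, 62.8, 62.9]

# Per-milestone gains: +7.1%, +9.6%, +0.2%
gains = [relative_gain(a, b) for a, b in zip(scores, scores[1:])]

# Total improvement from the first milestone to the current SOTA: 17.6%
total = relative_gain(scores[0], scores[-1])
```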

Top Models Performance Comparison

Top 6 models ranked by mIoU

1. InternImage-H: 62.9 (100.0% of best)
2. BEiT-3 (ViT-L): 62.8 (99.8%)
3. DINOv2 (ViT-g) + Linear: 62.0 (98.6%)
4. Mask2Former (Swin-L): 57.3 (91.1%)
5. Mask2Former (Swin-L): 57.3 (91.1%)
6. Swin-L + UperNet: 53.5 (85.1%)
Best Score: 62.9
Top Model: InternImage-H
Models Compared: 6
Score Range: 9.4

mIoU (primary metric)

#   Model                      Org                mIoU   Date       Code
1   InternImage-H              Shanghai AI Lab    62.9   Dec 2025   open source
2   BEiT-3 (ViT-L)             Microsoft          62.8   Mar 2026   open source
3   DINOv2 (ViT-g) + Linear    Meta AI            62.0   Mar 2026   open source
4   Mask2Former (Swin-L)       Meta AI            57.3   Mar 2026   open source
5   Mask2Former (Swin-L)       Meta AI / UIUC     57.3   Dec 2025   open source
6   Swin-L + UperNet           Microsoft          53.5   Mar 2026   open source

Other Semantic Segmentation Datasets