Model card
Mask2Former (Swin-L).
Meta AI / UIUC · open-source · Transformer
A universal image segmentation model. Achieves 56.1 mask AP on LVIS v1.0 minival. Uses masked attention in the Transformer decoder to constrain cross-attention to predicted foreground regions. CVPR 2022. arXiv:2112.01527.
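The masked-attention idea can be sketched in a few lines: each decoder layer's cross-attention only attends to image locations that the previous layer predicted as foreground. Below is a minimal single-head sketch in PyTorch; the function name `masked_attention` and the tensor layout are illustrative assumptions, not the paper's actual implementation.

```python
import torch

def masked_attention(q, k, v, mask_pred):
    """Cross-attention restricted to predicted foreground regions.

    q:         (B, Nq, C)  object queries
    k, v:      (B, HW, C)  flattened image features
    mask_pred: (B, Nq, HW) mask logits from the previous decoder layer;
               positions with sigmoid < 0.5 are treated as background.
    """
    scale = q.shape[-1] ** -0.5
    logits = torch.bmm(q, k.transpose(1, 2)) * scale  # (B, Nq, HW)
    # Block attention to predicted-background locations.
    attn_mask = mask_pred.sigmoid() < 0.5
    # If a query's predicted mask is entirely background, fall back to
    # full attention so the softmax stays well-defined.
    empty = attn_mask.all(dim=-1, keepdim=True)
    attn_mask = attn_mask & ~empty
    logits = logits.masked_fill(attn_mask, float("-inf"))
    attn = logits.softmax(dim=-1)
    return torch.bmm(attn, v)  # (B, Nq, C)
```

In the paper this replaces standard cross-attention in every decoder layer, which is reported to speed up convergence by focusing queries on local foreground evidence instead of the whole image.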
§ 01 · Benchmarks
Every benchmark Mask2Former (Swin-L) has a recorded score for.
| # | Benchmark | Area · Task | Metric | Value | Rank | Date | Source |
|---|---|---|---|---|---|---|---|
| 01 | LVIS v1.0 | Computer Vision · Object Detection | mask-ap-rare | 53.5% | #3 | 2021-12-02 | source ↗ |
| 02 | ADE20K | Computer Vision · Semantic Segmentation | mIoU | 57.3% | #4 | — | source ↗ |
| 03 | LVIS v1.0 | Computer Vision · Object Detection | mask-ap | 56.1% | #6 | 2021-12-02 | source ↗ |
Rank column shows this model’s position among all models scored on the same benchmark and metric. #1 in red marks the current SOTA. Sorted by rank, then newest result.
§ 02 · Strengths by area
Where Mask2Former (Swin-L) performs best.
§ 03 · Papers
1 paper with results for Mask2Former (Swin-L).
- 2021-12-02 · Computer Vision · 2 results
  Masked-attention Mask Transformer for Universal Image Segmentation
§ 04 · Sources & freshness
Where these numbers come from.
- arxiv: 2 results
- arxiv-paper: 1 result
2 of 3 rows marked verified.