Model card
ViTDet-H (MAE).
Meta AI · open-source · Unknown params · Plain ViT-H backbone with a simple feature pyramid and a Cascade Mask R-CNN head
Explores plain, non-hierarchical ViT backbones for object detection, building a simple feature pyramid from the backbone's single-scale output instead of using an FPN. The ViT-Huge variant achieves 59.5 mask AP on LVIS v1.0 minival with MAE pretraining. ECCV 2022. arXiv:2203.16527.
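The "simple feature pyramid" named in the description can be sketched roughly as follows. This is a minimal, single-channel illustration, not the released implementation. It assumes a 1024×1024 input with 16×16 patches (so a 64×64 map at stride 16), builds the stride-4, 8, 16 and 32 levels in parallel from that one map, uses 2×2 max pooling for stride 32, and substitutes nearest-neighbour upsampling for the learned stride-2 deconvolutions. The per-level convolutions and the Cascade Mask R-CNN head are omitted, and all function names are hypothetical.

```python
# Hypothetical, single-channel sketch of a ViTDet-style simple feature pyramid.
# Assumptions: a 2D list of floats stands in for the ViT's last feature map
# (stride 16); nearest-neighbour upsampling stands in for learned stride-2
# deconvolutions; per-level convolutions and normalization are omitted.
from typing import Dict, List

FeatureMap = List[List[float]]


def max_pool_2x2(x: FeatureMap) -> FeatureMap:
    """Downsample by 2 with 2x2 max pooling, stride 2."""
    h, w = len(x), len(x[0])
    return [
        [max(x[i][j], x[i][j + 1], x[i + 1][j], x[i + 1][j + 1])
         for j in range(0, w - 1, 2)]
        for i in range(0, h - 1, 2)
    ]


def upsample_2x(x: FeatureMap) -> FeatureMap:
    """Upsample by 2 by repeating each value in a 2x2 block."""
    out: FeatureMap = []
    for row in x:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def simple_feature_pyramid(x: FeatureMap) -> Dict[int, FeatureMap]:
    """Build strides 4, 8, 16 and 32 in parallel from one stride-16 map."""
    return {
        4: upsample_2x(upsample_2x(x)),  # two 2x upsamplings: stride 16 -> 4
        8: upsample_2x(x),               # one 2x upsampling: stride 16 -> 8
        16: x,                           # identity
        32: max_pool_2x2(x),             # 2x2 max pooling: stride 16 -> 32
    }


if __name__ == "__main__":
    # Assumed 1024x1024 input / 16x16 patches -> 64x64 token grid.
    grid = [[float(i * 64 + j) for j in range(64)] for i in range(64)]
    pyramid = simple_feature_pyramid(grid)
    for stride, fmap in sorted(pyramid.items()):
        print(stride, len(fmap), len(fmap[0]))
```

Running it prints one line per level: stride 4 gives 256×256, stride 8 gives 128×128, stride 16 gives 64×64 and stride 32 gives 32×32. The point of the design is that every level comes straight from the last backbone map, with no top-down or lateral connections as in an FPN.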
§ 01 · Benchmarks
Every benchmark ViTDet-H (MAE) has a recorded score for.
| # | Benchmark | Area · Task | Metric | Value | Rank | Date | Source |
|---|---|---|---|---|---|---|---|
| 01 | LVIS v1.0 | Computer Vision · Object Detection | box-ap | 64.0% | #4 | 2022-03-30 | source ↗ |
| 02 | LVIS v1.0 | Computer Vision · Object Detection | mask-ap | 59.5% | #5 | 2022-03-30 | source ↗ |
The Rank column shows this model's position among all models scored on the same benchmark and metric; #1 marks the current SOTA. Rows are sorted by rank, then by newest result.
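The ranking and ordering rule for the benchmark table can be expressed as a short sort. This is a hypothetical sketch, not Codesota's code: it assumes higher scores are better and that tied scores share a rank (standard competition ranking). `Row`, `competition_rank` and `sort_rows` are illustrative names.

```python
# Hypothetical sketch of the benchmark table's Rank column and row order.
# Assumptions: higher scores are better; ties share a rank
# (rank = 1 + number of strictly better scores).
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Row:
    benchmark: str
    metric: str
    value: float
    rank: int
    published: date


def competition_rank(score: float, all_scores: List[float]) -> int:
    """Rank of `score` among every model scored on the same benchmark + metric."""
    return 1 + sum(1 for s in all_scores if s > score)


def sort_rows(rows: List[Row]) -> List[Row]:
    """Order by rank ascending, then by newest result first."""
    return sorted(rows, key=lambda r: (r.rank, -r.published.toordinal()))


if __name__ == "__main__":
    rows = [
        Row("LVIS v1.0", "mask-ap", 59.5, 5, date(2022, 3, 30)),
        Row("LVIS v1.0", "box-ap", 64.0, 4, date(2022, 3, 30)),
    ]
    for r in sort_rows(rows):
        print(r.rank, r.metric, r.value)
```

With the two LVIS v1.0 rows from the table, both dated 2022-03-30, the rank alone decides the order, so box-ap (#4) prints before mask-ap (#5). The date only breaks ties between rows that share a rank.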
§ 02 · Strengths by area
Where ViTDet-H (MAE) actually performs.
§ 03 · Papers
1 paper with results for ViTDet-H (MAE).
- Exploring Plain Vision Transformer Backbones for Object Detection
  2022-03-30 · Computer Vision · 2 results
§ 04 · Related models
Other Meta AI models scored on Codesota.
GENRE
1 result · 1 SOTA
SeamlessM4T v2 Large
2.3B params · 1 result · 1 SOTA
DINOv2 (ViT-g) + Linear
Unknown params · 1 result
Fairseq S2T (MuST-C)
~150M params · 1 result
Mask2Former (Swin-L)
Unknown params · 1 result
MusicGen Large
3.3B params · 1 result
Voicebox
330M params · 1 result
convnext_base.fb_in22k_ft_in1k
1 result
§ 05 · Sources & freshness
Where these numbers come from.
arXiv · 2 results
2 of 2 rows marked verified.