Meta AI · 2 results · 1 benchmark
Model card

ViTDet-H (MAE).

Meta AI · open-source · Unknown params · Plain ViT-H backbone with a simple feature pyramid and a Cascade Mask R-CNN head

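Explores plain, non-hierarchical ViT backbones for object detection. Instead of redesigning the backbone to be hierarchical and attaching an FPN, it builds a simple feature pyramid from the backbone's single-scale feature map. The ViT-Huge variant with MAE pre-training reaches 59.5 mask AP on LVIS v1.0 minival. ECCV 2022. arXiv:2203.16527.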
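To make the "simple feature pyramid" idea concrete, here is a shape-level sketch in Python. It assumes a 1024x1024 input and 16x16 patches, and derives every pyramid level from the single stride-16 map as we read the ViTDet paper: stride 32 from a stride-2 max pool, stride 8 from one stride-2 deconvolution, stride 4 from two. The function names are hypothetical, and the sketch tracks spatial sizes only, leaving out channel widths and the per-level convolutions.

```python
# Shape-level sketch of a ViTDet-style simple feature pyramid.
# Assumptions (not taken from this card): 1024x1024 input, 16x16 patches.
# Only spatial side lengths are tracked; no tensors, channels or weights.

PATCH_SIZE = 16  # ViT patch size, i.e. the backbone's single output stride


def backbone_map_size(image_size: int, patch_size: int = PATCH_SIZE) -> int:
    """Side length of the plain ViT's single-scale (stride-16) feature map."""
    if image_size % patch_size:
        raise ValueError("image size must be divisible by the patch size")
    return image_size // patch_size


def simple_feature_pyramid(image_size: int) -> dict[int, int]:
    """Map each pyramid stride to its feature-map side length.

    Every level is derived from the same stride-16 map:
      stride 4  <- two stride-2 deconvolutions (x4 upsampling)
      stride 8  <- one stride-2 deconvolution  (x2 upsampling)
      stride 16 <- the backbone map itself
      stride 32 <- one stride-2 max pool       (x2 downsampling)
    """
    base = backbone_map_size(image_size)
    return {
        4: base * 2 * 2,  # two 2x deconvs
        8: base * 2,      # one 2x deconv
        16: base,         # identity
        32: base // 2,    # 2x2 max pool with stride 2
    }


if __name__ == "__main__":
    for stride, side in simple_feature_pyramid(1024).items():
        print(f"stride {stride:>2}: {side}x{side}")
```

The point of the design is that no level needs a hierarchical backbone stage: each scale is one or two cheap resampling steps away from the same ViT output.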

§ 01 · Benchmarks

Every benchmark on which ViTDet-H (MAE) has a recorded score.

# | Benchmark | Area · Task | Metric | Value | Rank | Date | Source
01 | LVIS v1.0 | Computer Vision · Object Detection | box AP | 64.0% | #4/4 | 2022-03-30 | arXiv
02 | LVIS v1.0 | Computer Vision · Object Detection | mask AP | 59.5% | #5/9 | 2022-03-30 | arXiv
The Rank column shows this model's position among all models scored on the same benchmark and metric; the number after the slash is the count of competitors. A rank of #1 (shown in red on the site) marks the current SOTA. Rows are sorted by rank, then by newest result.

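A minimal sketch of how a rank like "#4/4" could be computed, assuming higher scores are better, ties share the better rank, and the number after the slash counts every model scored on that benchmark and metric, this one included. The card does not document the site's exact tie-breaking, so the function names and the ranking rule here are assumptions.

```python
# Hypothetical rank computation for one benchmark + metric.
# Assumptions: higher score is better; ties share a rank ("1224" style);
# the total after the slash includes the model being ranked.


def competition_rank(scores: dict[str, float], model: str) -> tuple[int, int]:
    """Return (rank, models_compared) for `model`."""
    if model not in scores:
        raise KeyError(model)
    target = scores[model]
    # Rank is 1 + the number of models with a strictly higher score.
    rank = 1 + sum(1 for s in scores.values() if s > target)
    return rank, len(scores)


def format_rank(rank: int, total: int) -> str:
    """Render a rank the way the table shows it, e.g. '#4/4'."""
    return f"#{rank}/{total}"
```

Counting only strictly higher scores means two tied models both receive the better rank; a site could equally break ties by date, which would change the output.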
§ 02 · Strengths by area

Where ViTDet-H (MAE) performs, by area.

Computer Vision · 1 benchmark · avg rank #4.5
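Assuming the area figure is the plain mean over this model's ranked rows, the average follows from the two LVIS v1.0 rows in the benchmark table (box AP at #4 and mask AP at #5):

```latex
\text{avg rank}_{\text{Computer Vision}} = \frac{4 + 5}{2} = 4.5
```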
§ 03 · Papers

1 paper with results for ViTDet-H (MAE).

  1. Exploring Plain Vision Transformer Backbones for Object Detection
     2022-03-30 · Computer Vision · 2 results

§ 04 · Related models

Other Meta AI models scored on Codesota.

GENRE · 1 result · 1 SOTA
SeamlessM4T v2 Large · 2.3B params · 1 result · 1 SOTA
DINOv2 (ViT-g) + Linear · Unknown params · 1 result
Fairseq S2T (MuST-C) · ~150M params · 1 result
Mask2Former (Swin-L) · Unknown params · 1 result
MusicGen Large · 3.3B params · 1 result
Voicebox · 330M params · 1 result
convnext_base.fb_in22k_ft_in1k · 1 result
§ 05 · Sources & freshness

Where these numbers come from.

arXiv · 2 results
2 of 2 rows marked verified.