imagenet-1k
Unknown
Image classification benchmark
top-1-accuracy
Higher is better
| Rank | Model | Notes | Top-1 accuracy (%) | Source |
|---|---|---|---|---|
| 1 | coca-finetuned | Current SOTA on ImageNet-1K. 2.1B parameters. Contrastive Captioner architecture. | 91.0 | google-research |
| 2 | vit-g-14 | Giant ViT variant. 1.8B parameters. | 90.45 | google-research |
| 3 | convnext-v2-huge | Best pure ConvNet. 650M parameters. Trained with FCMAE. | 88.9 | meta-research |
| 4 | vit-h-14 | Huge ViT variant. 632M parameters. | 88.55 | google-research |
| 5 | swin-large | Hierarchical Vision Transformer with shifted windows. | 87.3 | microsoft-research |
| 6 | efficientnet-v2-l | Pretrained on ImageNet-21K, fine-tuned on ImageNet-1K. | 85.7 | google-research |
| 7 | deit-b-distilled | Data-efficient ViT with distillation. Trained on ImageNet-1K only. | 85.2 | meta-research |
| 8 | efficientnet-b7 | 8.4x smaller than GPipe. 66M parameters. | 84.4 | google-research |
| 9 | deit-b | Without distillation. Trained from scratch on ImageNet-1K. | 83.1 | meta-research |
| 10 | convnext-v2-tiny | 28M parameters. Efficient variant. | 83.0 | meta-research |
| 11 | vit-l-16 | Large ViT with ImageNet-21K pretraining. | 82.7 | google-research |
| 12 | vit-b-16 | Base ViT with ImageNet-21K pretraining. | 81.2 | google-research |
| 13 | resnet-50-a3 | ResNet Strikes Back. Modern training recipe on a classic architecture. | 80.4 | timm-research |
| 14 | resnet-152 | Original deep residual network. 10-crop evaluation. | 78.6 | microsoft-research |
| 15 | efficientnet-b0 | Baseline for compound scaling. Only 5.3M parameters. | 77.1 | google-research |
| 16 | resnet-50 | Standard torchvision baseline. 25M parameters. | 76.15 | pytorch-vision |
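For reference, below is a minimal sketch of how top-1 accuracy is computed, evaluating the torchvision resnet-50 baseline from the last row. It assumes torchvision >= 0.13 and a local copy of the ImageNet-1K validation split at `./imagenet` (a placeholder path); the other scores above come from their respective papers and are not reproduced by this snippet.

```python
import torch
from torch.utils.data import DataLoader
from torchvision import models
from torchvision.datasets import ImageNet

# Pretrained weights corresponding to the standard torchvision baseline (~76.1% top-1).
weights = models.ResNet50_Weights.IMAGENET1K_V1
model = models.resnet50(weights=weights).eval()
preprocess = weights.transforms()  # resize/crop/normalize preset matching the weights

# Assumption: ImageNet-1K validation set prepared under ./imagenet (placeholder path).
val_set = ImageNet(root="./imagenet", split="val", transform=preprocess)
loader = DataLoader(val_set, batch_size=64, num_workers=4)

correct = total = 0
with torch.inference_mode():
    for images, labels in loader:
        logits = model(images)
        preds = logits.argmax(dim=1)              # top-1 prediction per image
        correct += (preds == labels).sum().item()  # count exact matches
        total += labels.numel()

print(f"top-1 accuracy: {100.0 * correct / total:.2f}%")
```

Top-1 accuracy is simply the fraction of validation images whose highest-scoring class matches the ground-truth label; higher is better, as noted above.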