
imagenet-1k

Image classification benchmark

Total Results: 16
Models Tested: 16
Metrics: 1
Last Updated: 2025-12-19

top-1-accuracy

Higher is better

| Rank | Model | Score (top-1 %) | Source | Notes |
|------|-------|-----------------|--------|-------|
| 1 | coca-finetuned | 91.0 | google-research | Current SOTA on ImageNet-1K. 2.1B parameters. Contrastive Captioner architecture. |
| 2 | vit-g-14 | 90.45 | google-research | Giant ViT variant. 1.8B parameters. |
| 3 | convnext-v2-huge | 88.9 | meta-research | Best pure ConvNet. 650M parameters. Trained with FCMAE. |
| 4 | vit-h-14 | 88.55 | google-research | Huge ViT variant. 632M parameters. |
| 5 | swin-large | 87.3 | microsoft-research | Hierarchical Vision Transformer with shifted windows. |
| 6 | efficientnet-v2-l | 85.7 | google-research | Pretrained on ImageNet-21K, fine-tuned on ImageNet-1K. |
| 7 | deit-b-distilled | 85.2 | meta-research | Data-efficient ViT with distillation. Trained on ImageNet-1K only. |
| 8 | efficientnet-b7 | 84.4 | google-research | 8.4x smaller than GPipe. 66M parameters. |
| 9 | deit-b | 83.1 | meta-research | Without distillation. Trained from scratch on ImageNet-1K. |
| 10 | convnext-v2-tiny | 83.0 | meta-research | 28M parameters. Efficient variant. |
| 11 | vit-l-16 | 82.7 | google-research | Large ViT with ImageNet-21K pretraining. |
| 12 | vit-b-16 | 81.2 | google-research | Base ViT with ImageNet-21K pretraining. |
| 13 | resnet-50-a3 | 80.4 | timm-research | ResNet Strikes Back. Modern training recipe on a classic architecture. |
| 14 | resnet-152 | 78.6 | microsoft-research | 10-crop evaluation. Original deep residual network. |
| 15 | efficientnet-b0 | 77.1 | google-research | Only 5.3M parameters. Baseline for compound scaling. |
| 16 | resnet-50 | 76.15 | pytorch-vision | Standard torchvision baseline. 25M parameters. |
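All scores on this leaderboard are top-1 accuracy: the fraction of validation images for which the model's single highest-scoring class matches the ground-truth label. A minimal sketch of the metric (the function name and toy data are illustrative, not part of any benchmark harness):

```python
def top1_accuracy(predictions, labels):
    """Top-1 accuracy: fraction of samples whose argmax class equals the label.

    predictions: list of per-class score lists (one list per sample)
    labels: list of integer ground-truth class indices
    """
    correct = 0
    for scores, label in zip(predictions, labels):
        # argmax over class scores; ties resolve to the lowest index
        predicted = max(range(len(scores)), key=scores.__getitem__)
        if predicted == label:
            correct += 1
    return correct / len(labels)

# toy example: 3 samples, 4 classes
preds = [
    [0.1, 0.7, 0.1, 0.1],   # argmax = 1
    [0.3, 0.2, 0.4, 0.1],   # argmax = 2
    [0.6, 0.2, 0.1, 0.1],   # argmax = 0
]
labels = [1, 2, 3]
print(top1_accuracy(preds, labels))  # 2 of 3 correct -> 0.666...
```

On ImageNet-1K the same computation runs over the 50,000-image validation set with 1,000 classes; a score of 76.15 means 76.15% of validation images were classified correctly on the first guess.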