ImageNet-1K

1.28M training images, 50K validation images across 1,000 object classes. The standard benchmark for image classification since 2012.

Benchmark Stats

Models: 16
Papers: 16
Metrics: 1

Top-1 Accuracy

Higher is better.
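For reference, top-1 accuracy is the fraction of examples whose highest-scoring predicted class matches the ground-truth label. A minimal sketch in Python with NumPy (the toy logits and labels below are illustrative, not drawn from any leaderboard submission):

```python
import numpy as np

def top1_accuracy(logits: np.ndarray, labels: np.ndarray) -> float:
    """Fraction of rows where the argmax class equals the true label.

    logits: (N, num_classes) raw scores or probabilities per example.
    labels: (N,) integer ground-truth class indices.
    """
    preds = logits.argmax(axis=1)
    return float((preds == labels).mean())

# Illustrative toy batch: 4 examples, 3 classes.
logits = np.array([
    [2.0, 0.1, 0.3],   # predicts class 0
    [0.2, 1.5, 0.1],   # predicts class 1
    [0.3, 0.2, 0.9],   # predicts class 2
    [1.1, 0.4, 0.2],   # predicts class 0
])
labels = np.array([0, 1, 2, 1])  # last prediction is wrong
print(top1_accuracy(logits, labels))  # 0.75
```

The leaderboard scores below are this quantity, expressed as a percentage, on the 50K-image validation set.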

Rank. Model: Top-1 Accuracy % (Paper / Source)

1. coca-finetuned: 91.0 (google-research)
   Current SOTA on ImageNet-1K. 2.1B parameters. Contrastive Captioner architecture.

2. vit-g-14: 90.45 (google-research)
   Giant ViT variant. 1.8B parameters.

3. convnext-v2-huge: 88.9 (meta-research)
   Best pure ConvNet. 650M parameters. Trained with FCMAE.

4. vit-h-14: 88.55 (google-research)
   Huge ViT variant. 632M parameters.

5. swin-large: 87.3 (microsoft-research)
   Hierarchical Vision Transformer with shifted windows.

6. efficientnet-v2-l: 85.7 (google-research)
   Pretrained on ImageNet-21K, fine-tuned on ImageNet-1K.

7. deit-b-distilled: 85.2 (meta-research)
   Data-efficient ViT with distillation. Trained on ImageNet-1K only.

8. efficientnet-b7: 84.4 (google-research)
   8.4x smaller than GPipe. 66M parameters.

9. deit-b: 83.1 (meta-research)
   Without distillation. Trained from scratch on ImageNet-1K.

10. convnext-v2-tiny: 83.0 (meta-research)
    28M parameters. Efficient variant.

11. vit-l-16: 82.7 (google-research)
    Large ViT with ImageNet-21K pretraining.

12. vit-b-16: 81.2 (google-research)
    Base ViT with ImageNet-21K pretraining.

13. resnet-50-a3: 80.4 (timm-research)
    ResNet Strikes Back. Modern training recipe on a classic architecture.

14. resnet-152: 78.6 (microsoft-research)
    10-crop evaluation. Original deep residual network.

15. efficientnet-b0: 77.1 (google-research)
    Only 5.3M parameters. Baseline for compound scaling.

16. resnet-50: 76.15 (pytorch-vision)
    Standard torchvision baseline. 25M parameters.