What are you trying to extract?

Pick your document type. See what actually works.


We Run Our Own Benchmarks

No vendor claims. Real results. Independently verified.

While others copy numbers from marketing pages, we run the actual benchmarks ourselves. Full datasets. Official evaluation tools. Reproducible results.

1,355 images processed
$2.71 benchmark cost
100% reproducible

Open Source OCR Benchmark

Run on your own servers. No API costs. Full data privacy.

Model | Org | OmniDocBench | OCRBench (EN) | olmOCR | License
PaddleOCR-VL | Baidu | 92.86 | - | 80.0 | Apache 2.0
PaddleOCR-VL 0.9B | Baidu | 92.56 | - | - | Apache 2.0
MinerU 2.5 | OpenDataLab | 90.67 | - | 75.2 | AGPL-3.0
Qwen3-VL-235B | Alibaba | 89.15 | - | - | Qwen License
MonkeyOCR-pro-3B | Unknown | 88.85 | - | - | Apache 2.0 / MIT
OCRVerse 4B | Unknown | 88.56 | - | - | Apache 2.0 / MIT
dots.ocr 3B | Unknown | 88.41 | - | 79.1 | Apache 2.0 / MIT
Qwen2.5-VL | Alibaba | 87.02 | - | - | Apache 2.0
Chandra v0.1.0 | datalab-to | - | - | 83.1 | Apache 2.0
Infinity-Parser 7B | Unknown | - | - | 82.5 | Apache 2.0 / MIT
olmOCR v0.4.0 | Allen AI | - | - | 82.4 | Apache 2.0
Marker 1.10.0 | VikParuchuri | - | - | 76.5 | Apache 2.0 / MIT
Marker 1.10.1 | VikParuchuri | - | - | 76.1 | Apache 2.0 / MIT
DeepSeek OCR | DeepSeek | - | - | 75.4 | Apache 2.0 / MIT
GPT-4o (Anchored) | OpenAI | - | - | 69.9 | Apache 2.0 / MIT
Nanonets OCR2 3B | Nanonets | - | - | 69.5 | Apache 2.0 / MIT
Gemini Flash 2 | Google | - | - | 63.8 | Apache 2.0 / MIT
Qwen3-Omni-30B | Alibaba | - | 61.3% | - | Qwen License
Nemotron Nano V2 VL | NVIDIA | - | 61.2% | - | NVIDIA Open Model License
CoCa (finetuned) | Google | - | - | - | Apache 2.0
ViT-G/14 | Google | - | - | - | Apache 2.0
ViT-H/14 | Google | - | - | - | Apache 2.0
ViT-L/16 | Google | - | - | - | Apache 2.0
ViT-B/16 | Google | - | - | - | Apache 2.0
ConvNeXt V2 Huge | Meta | - | - | - | MIT
ConvNeXt V2 Base | Meta | - | - | - | MIT
ConvNeXt V2 Tiny | Meta | - | - | - | MIT
Swin Transformer V2 Large | Microsoft | - | - | - | MIT
Swin Transformer Large | Microsoft | - | - | - | MIT
EfficientNetV2-L | Google | - | - | - | Apache 2.0
EfficientNet-B7 | Google | - | - | - | Apache 2.0
EfficientNet-B0 | Google | - | - | - | Apache 2.0
DeiT-B Distilled | Meta | - | - | - | Apache 2.0
DeiT-B | Meta | - | - | - | Apache 2.0
ResNet-152 | Microsoft | - | - | - | MIT
ResNet-50 | Microsoft | - | - | - | MIT
ResNet-50 (A3 training) | Timm | - | - | - | Apache 2.0
Qwen2.5-VL 72B | Alibaba | - | - | - | Apache 2.0
CHURRO (3B) | Stanford | - | - | - | Apache 2.0 / MIT
InternVL2-76B | Shanghai AI Lab | - | - | - | MIT
InternVL3-78B | Shanghai AI Lab | - | - | - | Apache 2.0 / MIT
Tesseract | Google (Open Source) | - | - | - | Apache 2.0
EasyOCR | JaidedAI | - | - | - | Apache 2.0
Gemini 2.5 Flash | Google | - | - | - | Apache 2.0 / MIT
olmOCR v0.3.0 | Allen AI | - | - | - | Apache 2.0 / MIT
Qwen2-VL 72B | Alibaba | - | - | - | Apache 2.0 / MIT
Qwen2.5-VL 32B | Alibaba | - | - | - | Apache 2.0 / MIT
AIN 7B | Research | - | - | - | Apache 2.0 / MIT
GPT-4o Mini | OpenAI | - | - | - | Apache 2.0 / MIT
Azure OCR | Microsoft | - | - | - | Apache 2.0 / MIT
PaddleOCR | Baidu | - | - | - | Apache 2.0 / MIT
InternVL3 14B | OpenGVLab | - | - | - | Apache 2.0 / MIT
o1-preview | OpenAI | - | - | - | Apache 2.0 / MIT
Llama 3 70B | Meta | - | - | - | Apache 2.0 / MIT
DeepSeek V3 | DeepSeek | - | - | - | Apache 2.0 / MIT
DeepSeek V2.5 | DeepSeek | - | - | - | Apache 2.0 / MIT
Claude 3.5 Opus | Anthropic | - | - | - | Apache 2.0 / MIT
AL-Negat | Research | - | - | - | Apache 2.0 / MIT
GCN | Research | - | - | - | Apache 2.0 / MIT
Multi-Task Transformer | Research | - | - | - | Apache 2.0 / MIT
Deep Learning (Heinsfeld) | Research | - | - | - | Apache 2.0 / MIT
PHGCL-DDGFormer | Research | - | - | - | Apache 2.0 / MIT
Random Forest | Baseline | - | - | - | Apache 2.0 / MIT
MAACNN | Research | - | - | - | Apache 2.0 / MIT
Multi-Atlas DNN | Research | - | - | - | Apache 2.0 / MIT
Abraham Connectomes | Research | - | - | - | Apache 2.0 / MIT
Go-Explore | Uber AI | - | - | - | Apache 2.0 / MIT
BrainGNN | Research | - | - | - | MIT
MVS-GCN | Research | - | - | - | Apache 2.0 / MIT
BrainGT | Research | - | - | - | Apache 2.0 / MIT
SVM with Connectivity Features | Research | - | - | - | Apache 2.0 / MIT
AE-FCN | Research | - | - | - | Apache 2.0 / MIT
DeepASD | Research | - | - | - | Apache 2.0 / MIT
MCBERT | Research | - | - | - | Apache 2.0 / MIT
ASD-SWNet | Research | - | - | - | Apache 2.0 / MIT
Agent57 | DeepMind | - | - | - | Apache 2.0 / MIT
MuZero | DeepMind | - | - | - | Apache 2.0 / MIT
DreamerV3 | DeepMind | - | - | - | Apache 2.0 / MIT
Rainbow DQN | DeepMind | - | - | - | Apache 2.0 / MIT
DQN (Human-level) | DeepMind | - | - | - | Apache 2.0 / MIT
Human Professional | Biology | - | - | - | Apache 2.0 / MIT
BBOS-1 | Unknown | - | - | - | Apache 2.0 / MIT
GDI-H3 | Research | - | - | - | Apache 2.0 / MIT
Plymouth DL Model | Research | - | - | - | Apache 2.0 / MIT
Co-DETR (Swin-L) | Research | - | - | - | Apache 2.0 / MIT
InternImage-H | Shanghai AI Lab | - | - | - | Apache 2.0 / MIT
DINO (Swin-L) | Research | - | - | - | Apache 2.0 / MIT
YOLOv10-X | Tsinghua | - | - | - | Apache 2.0 / MIT
Mask2Former (Swin-L) | Meta | - | - | - | Apache 2.0 / MIT
EfficientDet-D7x | Google | - | - | - | Apache 2.0 / MIT
CheXNet | Stanford ML Group | - | - | - | MIT
TorchXRayVision | Cohen Lab | - | - | - | Apache 2.0
CheXzero | Harvard/MIT | - | - | - | MIT
MedCLIP | Research | - | - | - | MIT
GLoRIA | Stanford | - | - | - | MIT
BioViL | Microsoft | - | - | - | MIT
RAD-DINO | Microsoft | - | - | - | MIT
CheXpert AUC Maximizer | Stanford | - | - | - | Apache 2.0 / MIT
DenseNet-121 (Chest X-ray) | Research | - | - | - | MIT
ResNet-50 (Chest X-ray) | Research | - | - | - | MIT
ConVIRT | NYU | - | - | - | Apache 2.0 / MIT
PatchCore | Amazon | - | - | - | Apache 2.0
PaDiM | Research | - | - | - | Apache 2.0
FastFlow | Research | - | - | - | MIT
EfficientAD | Research | - | - | - | MIT
SimpleNet | Research | - | - | - | MIT
DRAEM | Research | - | - | - | MIT
CFLOW-AD | Research | - | - | - | Apache 2.0
Reverse Distillation | Research | - | - | - | MIT
YOLOv8 (Weld Detection) | Ultralytics | - | - | - | AGPL-3.0
DefectDet (ResNet) | Research | - | - | - | Apache 2.0 / MIT

When to use open source:
  • Sensitive data that can't leave your network
  • High volume processing (no per-page costs)
  • Offline/air-gapped environments
  • Full control over the pipeline

Vendor API Benchmark

Pay per page. Fast to integrate. Enterprise support available.

Model | Vendor | OmniDocBench | OCRBench (EN) | olmOCR | Price/1k pages
Gemini 2.5 Pro | Google | 88.03 | 59.3% | - | varies
Mistral OCR 3 | Mistral | 79.75 | - | 78.0 | varies
Mistral OCR 2 | Mistral | - | - | 72.0 | varies
Seed1.6-vision | ByteDance | - | 62.2% | - | varies
GPT-4o | OpenAI | - | 55.5% | - | varies
clearOCR | TeamQuest | 31.70 | - | - | varies
Gemini 2.0 Flash | Google | - | - | - | varies
Gemini 1.5 Pro | Google | - | - | - | varies
Claude Sonnet 4 | Anthropic | - | - | - | varies
Claude 3.5 Sonnet | Anthropic | - | - | - | varies

When to use vendor APIs:
  • Need reasoning/context understanding (GPT-4o, Gemini)
  • Low volume, occasional use
  • Need enterprise SLA/support
  • No infrastructure to maintain

CodeSOTA Score: Cross-Benchmark Comparison

One number to compare models across all benchmarks: a weighted average of the benchmarks each model has been scored on, with primary benchmarks weighted 3x, secondary benchmarks 2x, and language-specific benchmarks 1x.

How we calculate this
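
In code, the score is a weighted mean over whichever of the eight benchmarks a model has results for. A minimal Python sketch (benchmark keys and 3x/2x/1x weights mirror the table header below; we assume benchmarks with no result are simply skipped rather than counted as zero, which is consistent with the published numbers):

    # CodeSOTA aggregate: weighted average over the benchmarks a model has run.
    # Weights mirror the table header: *** = 3x, ** = 2x, * = 1x.
    WEIGHTS = {
        "omnidoc": 3, "ocrbench": 3, "olmocr": 3,   # primary (***)
        "churro": 2, "cc-ocr": 2,                   # secondary (**)
        "kitab": 1, "thaiocr": 1, "videoocr": 1,    # tertiary (*)
    }

    def codesota_score(results):
        """results: dict mapping benchmark name -> score (0-100) for benchmarks the model has run."""
        if not results:
            return None
        total_weight = sum(WEIGHTS[name] for name in results)
        return sum(WEIGHTS[name] * score for name, score in results.items()) / total_weight

    # Sanity check against the table: GPT-4o has OCRBench 56, CHURRO 34,
    # CC-OCR 76, KITAB 69, VideoOCR 66 -> (3*56 + 2*34 + 2*76 + 69 + 66) / 9 = 58.1
    print(round(codesota_score({"ocrbench": 56, "churro": 34, "cc-ocr": 76,
                                "kitab": 69, "videoocr": 66}), 1))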

# | Model | Score | Cover | OmniDoc*** | OCRBench*** | olmOCR*** | CHURRO** | CC-OCR** | KITAB* | ThaiOCR* | VideoOCR*
1 | paddleocr-vl | 86.4 | 2/8 | 93 | -- | 80 | -- | -- | -- | -- | --
2 | dots-ocr-3b | 83.8 | 2/8 | 88 | -- | 79 | -- | -- | -- | -- | --
3 | mineru-2.5 | 82.9 | 2/8 | 91 | -- | 75 | -- | -- | -- | -- | --
4 | mistral-ocr-3 | 78.9 | 2/8 | 80 | -- | 78 | -- | -- | -- | -- | --
5 | gemini-15-pro | 77.1 | 2/8 | -- | -- | -- | -- | 83 | -- | -- | 65
6 | gemini-25-pro | 72.0 | 5/8 | 88 | 59 | -- | 64 | -- | -- | 77 | 74
7 | qwen25-vl-32b | 68.8 | 2/8 | -- | -- | -- | -- | -- | -- | 77 | 61
8 | qwen25-vl-72b | 62.5 | 3/8 | -- | -- | -- | 55 | -- | -- | 72 | 69
9 | gpt-4o | 58.1 | 5/8 | -- | 56 | -- | 34 | 76 | 69 | -- | 66
10 | claude-sonnet-4 | 52.7 | 2/8 | -- | -- | -- | 37 | -- | -- | 84 | --
11 | paddleocr-vl-0.9b | -- | 1/8 | 93 | -- | -- | -- | -- | -- | -- | --
12 | qwen3-vl-235b | -- | 1/8 | 89 | -- | -- | -- | -- | -- | -- | --
13 | monkeyocr-pro-3b | -- | 1/8 | 89 | -- | -- | -- | -- | -- | -- | --
14 | qwen25-vl | -- | 1/8 | 87 | -- | -- | -- | -- | -- | -- | --
15 | ocrverse-4b | -- | 1/8 | 89 | -- | -- | -- | -- | -- | -- | --
16 | clearocr-teamquest | -- | 1/8 | 32 | -- | -- | -- | -- | -- | -- | --
17 | seed-1.6-vision | -- | 1/8 | -- | 62 | -- | -- | -- | -- | -- | --
18 | qwen3-omni-30b | -- | 1/8 | -- | 61 | -- | -- | -- | -- | -- | --
19 | nemotron-nano-v2-vl | -- | 1/8 | -- | 61 | -- | -- | -- | -- | -- | --
20 | chandra-ocr-0.1.0 | -- | 1/8 | -- | -- | 83 | -- | -- | -- | -- | --
21 | deepseek-ocr | -- | 1/8 | -- | -- | 75 | -- | -- | -- | -- | --
22 | marker-1.10.0 | -- | 1/8 | -- | -- | 77 | -- | -- | -- | -- | --
23 | gpt-4o-anchored | -- | 1/8 | -- | -- | 70 | -- | -- | -- | -- | --
24 | gemini-flash-2 | -- | 1/8 | -- | -- | 64 | -- | -- | -- | -- | --
25 | infinity-parser-7b | -- | 1/8 | -- | -- | 83 | -- | -- | -- | -- | --
*** Primary | ** Secondary | * Tertiary
Showing the top 25 of 44 models with OCR benchmark data; 10 models have aggregate scores (2+ benchmarks).

TODO: Priority benchmarks to run

Open source models prioritized (can run locally without API costs):

1. qwen25-vl-32b on omnidocbench (Primary benchmark missing)
2. qwen25-vl-32b on ocrbench-v2 (Primary benchmark missing)
3. qwen25-vl-32b on olmocr-bench (Primary benchmark missing)
4. qwen25-vl-72b on omnidocbench (Primary benchmark missing)
5. qwen25-vl-72b on ocrbench-v2 (Primary benchmark missing)
6. qwen25-vl-72b on olmocr-bench (Primary benchmark missing)
7. internvl3-78b on omnidocbench (Primary benchmark missing)
8. internvl3-78b on ocrbench-v2 (Primary benchmark missing)
9. internvl3-78b on olmocr-bench (Primary benchmark missing)
10. ain-7b on omnidocbench (Primary benchmark missing)


OmniDocBench: End-to-end document parsing composite score. OCRBench v2: Overall score across 8 OCR capabilities.

Data from AlphaXiv + Papers With Code.

Models

CoCa (finetuned)

OSS

Google

SOTA on ImageNet-1K (91.0%). Combines contrastive and captioning objectives.

Image classification · Zero-shot recognition

ViT-G/14

OSS

Google

90.45% top-1 on ImageNet. Giant variant.

High-accuracy classification

ViT-H/14

OSS

Google

88.55% top-1 on ImageNet. Huge variant.

High-accuracy classification

ViT-L/16

OSS

Google

Large variant. 82.7% with ImageNet-21k pretraining.

Transfer learning · Fine-tuning

ViT-B/16

OSS

Google

Base variant. 81.2% with ImageNet-21k pretraining.

Balanced performance · Research

ConvNeXt V2 Huge

OSS

Meta

88.9% on ImageNet. Best pure ConvNet.

SOTA CNN classification

ConvNeXt V2 Base

OSS

Meta

Good balance of speed and accuracy.

Efficient high-accuracy

ConvNeXt V2 Tiny

OSS

Meta

83.0% on ImageNet. Lightweight variant.

Efficient deployment

Swin Transformer V2 Large

OSS

Microsoft

86.8% on Kinetics-400. Scales to 3B parameters.

High-resolution images · Dense prediction

Swin Transformer Large

OSS

Microsoft

87.3% on ImageNet-1K.

Dense prediction tasks

EfficientNetV2-L

OSS

Google

85.7% on ImageNet. Faster training than V1.

Fast training · High accuracy

EfficientNet-B7

OSS

Google

84.4% on ImageNet. 8.4x smaller than GPipe.

Best accuracy/params ratio

EfficientNet-B0

OSS

Google

77.1% on ImageNet. Baseline for compound scaling.

Mobile deployment · Edge devices

DeiT-B Distilled

OSS

Meta

85.2% on ImageNet. Trained on ImageNet-1K only.

Data-efficient training

DeiT-B

OSS

Meta

83.1% on ImageNet without external data.

Training from scratch

ResNet-152

OSS

Microsoft

78.6% on ImageNet (10-crop). Deep residual network.

Classic baseline · Transfer learning

ResNet-50

OSS

Microsoft

76-80% on ImageNet depending on training. Standard baseline.

Baseline model · Benchmarking

ResNet-50 (A3 training)

OSS

Timm

80.4% on ImageNet with modern training recipes.

Modern CNN baseline

PaddleOCR-VL

OSS

Baidu

#1 on OmniDocBench

Document parsing · Tables · Formulas

PaddleOCR-VL 0.9B

OSS

Baidu

Lightweight version

Document parsing · Tables

MinerU 2.5

OSS

OpenDataLab

#1 on layout detection (97.5 mAP)

PDF extraction · Layout detection

Qwen3-VL-235B

OSS

Alibaba

Large model, requires significant compute

Document understanding · Reasoning

MonkeyOCR-pro-3B

OSS

Unknown

Compact model with good performance

Document OCR

Gemini 2.5 Pro

API

Google

#1 on OCRBench v2 Chinese, MME-VideoOCR

High accuracy · Document Q&A · Chinese OCR

Gemini 2.0 Flash

API

Google

#1 on KITAB-Bench (Arabic)

Arabic OCR · Fast inference

Gemini 1.5 Pro

API

Google

#1 on CC-OCR Multi-Scene

Scene text · Multilingual · Document parsing

Qwen2.5-VL

OSS

Alibaba

Document understanding · Multilingual

Qwen2.5-VL 72B

OSS

Alibaba

Document understanding · Video OCR · Thai OCR

GPT-4o

API

OpenAI

Best OCR edit distance on OmniDocBench (0.02)

Text extraction · Document Q&A

Seed1.6-vision

API

ByteDance

#1 on OCRBench v2 English

OCR capabilities

Qwen3-Omni-30B

OSS

Alibaba

OCR · Multimodal tasks

Nemotron Nano V2 VL

OSS

NVIDIA

Efficient OCR · Edge deployment

Chandra v0.1.0

OSS

datalab-to

#1 on olmOCR-Bench (83.1). Best on the old scans math, long tiny text, and base accuracy categories.

Document parsing · Old scans · Math formulas

OCRVerse 4B

OSS

Unknown

Strong OmniDocBench performer (88.56)

Document parsing · Text extraction

dots.ocr 3B

OSS

Unknown

Best table TEDS among 3B models. Also #1 on olmOCR tables (88.3)

Document parsing · Tables

Infinity-Parser 7B

OSS

Unknown

PDF parsing

olmOCR v0.4.0

OSS

Allen AI

PDF extraction · Research documents

CHURRO (3B)

OSS

Stanford

#1 on CHURRO-DS (82.3 printed, 70.1 handwritten)

Historical documents · Handwriting · Multilingual

Claude Sonnet 4

API

Anthropic

#1 on ThaiOCRBench

Thai OCR · Low hallucination

Claude 3.5 Sonnet

API

Anthropic

Lowest hallucination rate on CC-OCR (0.09%)

Document understanding

InternVL2-76B

OSS

Shanghai AI Lab

Scene text · Document parsing

InternVL3-78B

OSS

Shanghai AI Lab

Video OCR · Document understanding

Tesseract

OSS

Google (Open Source)

Classic open-source OCR engine

Basic OCR · Offline use
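
As a point of reference for offline use, a minimal run through the pytesseract wrapper looks like the sketch below (assumes the tesseract binary plus the pytesseract and Pillow packages are installed; "scan.png" is a placeholder path):

    from PIL import Image
    import pytesseract

    # Plain-text extraction from a single image using the English language pack.
    text = pytesseract.image_to_string(Image.open("scan.png"), lang="eng")
    print(text)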

EasyOCR

OSS

JaidedAI

80+ languages supported

Multilingual · Easy setup
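
The easy setup amounts to a two-call API; a minimal sketch (the language list selects which of the 80+ supported languages to load, and "scan.png" is a placeholder path):

    import easyocr

    reader = easyocr.Reader(["en"])        # downloads detector/recognizer weights on first use
    results = reader.readtext("scan.png")  # list of (bounding box, text, confidence)
    for box, text, conf in results:
        print(f"{conf:.2f}  {text}")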

DeepSeek OCR

OSS

DeepSeek

DeepSeek's OCR model for document understanding.

Document OCR · General OCR

Marker 1.10.0

OSS

VikParuchuri

Open-source PDF to Markdown converter.

PDF to Markdown · Document parsing

Marker 1.10.1

OSS

VikParuchuri

Latest version of Marker PDF parser.

PDF to Markdown · Document parsing

GPT-4o (Anchored)

OSS

OpenAI

GPT-4o with anchored prompting for OCR.

Document understanding · OCR

Gemini Flash 2

OSS

Google

Google's fast multimodal model.

Fast inference · OCR

Gemini 2.5 Flash

OSS

Google

Google's Gemini 2.5 Flash model.

Fast inference · OCR

olmOCR v0.3.0

OSS

Allen AI

Earlier version of olmOCR.

Document OCR · Research

Mistral OCR 3

API

Mistral

Latest Mistral OCR (Dec 2025). 74% win rate vs OCR 2. Claims 94.9% accuracy. Markdown + HTML table output. $1/1000 pages with batch API.

Document OCR · Forms · Tables

clearOCR

API

TeamQuest

Polish OCR service. Text extraction only - no table/formula recognition. Best for simple documents. VERIFIED by CodeSOTA: 84.6% text accuracy, but 0.8% table TEDS due to lack of structure recognition.

Simple text extraction · Polish documents · Research papers

Mistral OCR 2

API

Mistral

Previous version of Mistral OCR API.

Document OCR · Fast inference

Nanonets OCR2 3B

OSS

Nanonets

Nanonets' OCR model.

Document OCR

Qwen2-VL 72B

OSS

Alibaba

Qwen2's large vision-language model.

Vision understanding · OCR

Qwen2.5-VL 32B

OSS

Alibaba

Qwen2.5 32B vision-language model.

Vision understanding · OCR

AIN 7B

OSS

Research

7B parameter OCR model.

OCR · Document understanding

GPT-4o Mini

OSS

OpenAI

Smaller, faster version of GPT-4o.

Cost-effective OCR · Fast inference

Azure OCR

OSS

Microsoft

Microsoft Azure's OCR service.

Enterprise OCR · Multi-language

PaddleOCR

OSS

Baidu

Open-source OCR from PaddlePaddle.

Multilingual OCR · Chinese text
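
A typical call into the Python package looks roughly like the sketch below; constructor arguments and the result format have shifted between PaddleOCR releases, so treat it as illustrative rather than canonical:

    from paddleocr import PaddleOCR

    ocr = PaddleOCR(lang="en")      # loads detection + recognition models
    result = ocr.ocr("scan.png")    # per page: list of [box, (text, confidence)] entries
    for box, (text, conf) in result[0]:
        print(f"{conf:.2f}  {text}")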

InternVL3 14B

OSS

OpenGVLab

InternVL3 14B vision-language model.

Vision understanding · OCR

o1-preview

OSS

OpenAI

OpenAI's reasoning-focused model.

Complex reasoning · Math

Llama 3 70B

OSS

Meta

Meta's Llama 3 70B model.

General NLP · Reasoning

DeepSeek V3

OSS

DeepSeek

DeepSeek's V3 model.

General NLP · Code

DeepSeek V2.5

OSS

DeepSeek

DeepSeek's V2.5 model.

General NLP · Code

Claude 3.5 Opus

OSS

Anthropic

Anthropic's Claude 3.5 Opus model.

Complex reasoning · Analysis

AL-Negat

OSS

Research

Adversarial learning for brain network analysis.

Autism classification · Brain analysis

GCN

OSS

Research

Standard Graph Convolutional Network baseline.

Graph classification · Brain networks

Multi-Task Transformer

OSS

Research

Transformer-based multi-task learning for brain analysis.

Autism classification · Multi-task learning

Deep Learning (Heinsfeld)

OSS

Research

Heinsfeld et al. deep learning approach for ABIDE.

Autism classification

PHGCL-DDGFormer

OSS

Research

Graph transformer with dynamic graph learning.

Autism classification · Brain networks

Random Forest

OSS

Baseline

Standard Random Forest baseline.

Baseline classification

MAACNN

OSS

Research

Multi-scale attention CNN for brain imaging.

Autism classification

Multi-Atlas DNN

OSS

Research

DNN combining multiple brain atlases.

Autism classification · Multi-atlas fusion

Abraham Connectomes

OSS

Research

Abraham et al. connectome-based approach.

Brain connectivity analysis

Go-Explore

OSS

Uber AI

Exploration-based reinforcement learning.

Hard exploration games · Atari

BrainGNN

OSS

Research

ROI-aware graph convolutional layers for interpretable brain network analysis. 73.3% accuracy on ABIDE I.

fMRI analysis · Brain connectivity · Autism classification

MVS-GCN

OSS

Research

Handles multi-site variability. 69.38% accuracy on ABIDE dataset.

Multi-site brain data · Autism classification

BrainGT

OSS

Research

78.7% AUC on ABIDE dataset, significantly higher than BrainNetTF (73.2%).

Brain disorder diagnosis · Graph attention

SVM with Connectivity Features

OSS

Research

70.1% accuracy on ABIDE with functional connectivity features. Classic baseline for brain classification.

Baseline comparison · Traditional ML

AE-FCN

OSS

Research

85% accuracy combining fMRI and sMRI on ABIDE (Rakic et al., 2020).

Feature learning · Multi-modal brain data

DeepASD

OSS

Research

93% AUC-ROC on ABIDE-II combining fMRI and SNPs data.

Multi-modal fusion · Autism diagnosis

MCBERT

OSS

Research

93.4% accuracy on ABIDE-I with leave-one-site-out cross-validation. Uses phenotypic data.

Medical imaging · Multi-modal learning

ASD-SWNet

OSS

Research

76.52% accuracy, 80.65% recall, 0.81 AUC on ABIDE dataset.

Autism diagnosis · fMRI classification

Agent57

OSS

DeepMind

First agent to surpass human performance on all 57 Atari games. Uses a meta-controller to adapt exploration.

Hard exploration games · General arcade gaming

MuZero

OSS

DeepMind

Learns a model of the environment's dynamics without knowing the rules. Mastered Go, Chess, Shogi, and Atari.

Board games (Go, Chess) · Atari

DreamerV3

OSS

DeepMind

Scalable world model that masters Atari and Minecraft (MineDojo) with fixed hyperparameters.

Sample efficiency · Visual control · Atari

Rainbow DQN

OSS

DeepMind

Combines six improvements to DQN (Double Q-learning, Dueling, PER, Noisy Nets, Distributional, n-step).

Baseline RL · Discrete control

DQN (Human-level)

OSS

DeepMind

The breakthrough paper (Nature 2015) that started the Deep RL revolution.

Historical baseline

Human Professional

OSS

Biology

Average score of a professional human games tester. Normalized to 100%.

Generalization · Few-shot learning

BBOS-1

OSS

Unknown

Achieved massive scores on specific games.

Atari

GDI-H3

OSS

Research

Sample efficient benchmark winner.

Atari 100k

Plymouth DL Model

OSS

Research

Up to 98% accuracy on a subset of ABIDE (884 participants). Highlights visual processing regions.

Explainable AI · Autism diagnosis

Co-DETR (Swin-L)

OSS

Research

Collaborative Hybrid Assignments Training. SOTA on COCO.

Object Detection

InternImage-H

OSS

Shanghai AI Lab

Large-scale vision model bridging CNN and Transformer.

Detection · Segmentation

DINO (Swin-L)

OSS

Research

End-to-end object detection with transformers.

Object Detection

YOLOv10-X

OSS

Tsinghua

NMS-free training for low latency.

Real-time Detection

Mask2Former (Swin-L)

OSS

Meta

Universal image segmentation architecture.

Segmentation

EfficientDet-D7x

OSS

Google

Classic efficient detector.

Efficient Detection

CheXNet

OSS

Stanford ML Group

First model to exceed radiologist performance on pneumonia detection. Trained on ChestX-ray14.

Chest X-ray classification · Pneumonia detection

TorchXRayVision

OSS

Cohen Lab

Pre-trained on 8 datasets (MIMIC, CheXpert, NIH, etc.). Unified 18-pathology output.

Multi-dataset chest X-ray · Transfer learning

CheXzero

OSS

Harvard/MIT

Zero-shot chest X-ray classification using CLIP. No task-specific training needed.

Zero-shot chest X-ray · Report generation

MedCLIP

OSS

Research

Decoupled contrastive learning on MIMIC-CXR. Semantic matching for medical imaging.

Medical image-text matching · Zero-shot diagnosis

GLoRIA

OSS

Stanford

Global-Local Representations for Images using Attention. Learns fine-grained image-text alignment.

Chest X-ray classification · Report generation

BioViL

OSS

Microsoft

Biomedical Vision-Language model. Strong performance on phrase grounding.

Medical VQA · Report generation

RAD-DINO

OSS

Microsoft

Self-supervised radiology foundation model. Strong transfer to downstream tasks.

Radiology foundation model · Transfer learning

CheXpert AUC Maximizer

OSS

Stanford

Competition-winning ensemble. 93.0% mean AUC on 5 competition tasks.

CheXpert competition · Multi-label classification

DenseNet-121 (Chest X-ray)

OSS

Research

Standard baseline for chest X-ray classification. Pre-trained on ImageNet.

Baseline chest X-ray · Transfer learning

ResNet-50 (Chest X-ray)

OSS

Research

Standard ResNet baseline for radiology.

Baseline chest X-ray

ConVIRT

OSS

NYU

Contrastive VIsual Representation learning from Text. Pioneered medical CLIP-like training.

Medical pre-training · Zero-shot transfer

PatchCore

OSS

Amazon

State-of-the-art on MVTec AD. Uses pretrained features with coreset subsampling.

Industrial inspection · Few-shot anomaly detection

PaDiM

OSS

Research

Patch-wise anomaly detection using pretrained embeddings and Mahalanobis distance.

Anomaly localization · Texture defects

FastFlow

OSS

Research

2D normalizing flows for fast anomaly detection. Good speed-accuracy tradeoff.

Fast inference · Real-time inspection

EfficientAD

OSS

Research

614 FPS inference speed. Optimized for production deployment.

Edge deployment · Real-time industrial

SimpleNet

OSS

Research

Simple yet effective. Competitive with complex methods on MVTec.

Simple deployment · Good generalization

DRAEM

OSS

Research

Discriminatively trained reconstruction for anomaly detection.

Anomaly synthesis · Pixel-level detection

CFLOW-AD

OSS

Research

Real-time unsupervised anomaly detection via conditional normalizing flows.

Precise localization · Multi-scale detection

Reverse Distillation

OSS

Research

Reverse distillation for anomaly detection. Strong on texture classes.

Knowledge distillation · Anomaly detection

YOLOv8 (Weld Detection)

OSS

Ultralytics

Fine-tuned YOLOv8 for weld defect detection. Fast inference for production.

Weld defect detection · Real-time inspection

DefectDet (ResNet)

OSS

Research

ResNet backbone with FPN for multi-scale defect detection.

Steel defect detection · Surface inspection

Have benchmark results?

Submit your paper or benchmark results. We verify and add them to our database.



About This Data

All benchmark results are sourced from AlphaXiv benchmark leaderboards. Each data point includes the source URL and access date for verification.

Results marked as "pending verification" are claimed in papers but have not been independently confirmed. We do not include estimated or interpolated values.