
olmOCR-Bench

Allen Institute for AI

PDF content extraction benchmark with 7,010 unit tests across 1,402 PDF documents.

Total Results: 28
Models Tested: 16
Metrics: 9
Last Updated: 2025-12-19

Pass Rate

Percentage of unit tests passed

Higher is better
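
As a rough illustration of the metric, the sketch below computes a pass rate as passed tests divided by total tests, expressed as a percentage. It assumes a simple pooled ratio over all 7,010 unit tests; the benchmark itself may aggregate differently (for example, by averaging per-category rates). The function name `pass_rate` and the passed-test count are hypothetical and are not taken from the leaderboard.

```python
def pass_rate(passed: int, total: int) -> float:
    """Return the percentage of unit tests passed, rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be a positive number of tests")
    if not 0 <= passed <= total:
        raise ValueError("passed must be between 0 and total")
    return round(100.0 * passed / total, 1)


TOTAL_TESTS = 7010          # total unit tests, from the benchmark description
example_passed = 5825       # hypothetical count, chosen only for illustration

print(pass_rate(example_passed, TOTAL_TESTS))  # prints 83.1
```

Under this pooled assumption, a score of 83.1% would correspond to roughly 5,825 of the 7,010 tests passing.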

Rank  Model               Score  Source                Notes
1     chandra-ocr-0.1.0   83.1%  alphaxiv-leaderboard  #1 overall on olmOCR-Bench
2     infinity-parser-7b  82.5%  alphaxiv-leaderboard
3     olmocr-v0.4.0       82.4%  alphaxiv-leaderboard
4     paddleocr-vl        80%    alphaxiv-leaderboard
5     dots-ocr-3b         79.1%  github-readme
6     mistral-ocr-3       78%    mistral-announcement  Estimated based on 74% win rate vs OCR 2
7     marker-1.10.0       76.5%  github-readme
8     marker-1.10.1       76.1%  alphaxiv-leaderboard
9     deepseek-ocr        75.7%  alphaxiv-leaderboard
10    deepseek-ocr        75.4%  github-readme         Chandra outperforms by 7.7 points
11    mineru-2.5          75.2%  alphaxiv-leaderboard
12    mistral-ocr-api     72%    alphaxiv-leaderboard
13    gpt-4o-anchored     69.9%  github-readme         GPT-4o with anchored prompting
14    nanonets-ocr2-3b    69.5%  alphaxiv-leaderboard
15    gemini-flash-2      63.8%  github-readme

tables

Higher is better

Rank  Model               Score  Source                Notes
1     dots-ocr-3b         88.3   github-readme         #1 on table recognition
2     chandra-ocr-0.1.0   88     github-readme         Table recognition category. Near-best (dots.ocr: 88.3)

old-scans-math

Higher is better

Rank  Model               Score  Source                Notes
1     chandra-ocr-0.1.0   80.3   github-readme         Mathematical notation in old scans. #1, leads olmocr-v0.3.0 (79.9) by 0.4 points
2     olmocr-v0.3.0       79.9   github-readme         #2 on math in old scans

long-tiny-text

Higher is better

Rank  Model               Score  Source                Notes
1     chandra-ocr-0.1.0   92.3   github-readme         Long documents with tiny text. #1 in category

base

Higher is better

Rank  Model               Score  Source                Notes
1     chandra-ocr-0.1.0   99.9   github-readme         Base clean document parsing. Near-perfect

headers-footers

Higher is better

Rank  Model               Score  Source                Notes
1     olmocr-v0.3.0       95.1   github-readme         #1 on headers/footers extraction
2     chandra-ocr-0.1.0   90.8   github-readme         Header/footer extraction

multi-column

Higher is better

Rank  Model               Score  Source                Notes
1     chandra-ocr-0.1.0   81.2   github-readme         Multi-column document parsing

arxiv

Higher is better

Rank  Model               Score  Source                Notes
1     marker-1.10.0       83.8   github-readme         #1 on ArXiv paper parsing
2     chandra-ocr-0.1.0   82.2   github-readme         ArXiv paper parsing. Marker leads (83.8)

old-scans

Higher is better

Rank  Model               Score  Source                Notes
1     chandra-ocr-0.1.0   50.4   github-readme         Old scan recognition. #1 (GPT-4o: 40.7)
2     gpt-4o              40.7   github-readme         #2 on old scans. Chandra leads by 9.7 points