OCRBench v2
South China University of Technology
Tests eight core OCR capabilities across 23 tasks, evaluating large multimodal models (LMMs) on tasks including text recognition, referring, and extraction.
Benchmark Stats
Models: 27
Papers: 32
Metrics: 2
Overall (English)
Average score on English private test set
Higher is better
| Rank | Model | Code | Score | Paper / Source |
|---|---|---|---|---|
| 1 | seed-1.6-vision | - | 62.2 | AlphaXiv |
| 2 | qwen3-omni-30b | - | 61.3 | AlphaXiv |
| 3 | nemotron-nano-v2-vl | - | 61.2 | AlphaXiv |
| 4 | gemini-25-pro | - | 59.3 | AlphaXiv |
| 5 | llama-3.1-nemotron-nano-vl-8b | - | 56.4 | ocrbench-v2-leaderboard |
| 6 | gpt-4o (listed as GPT5-2025-08-07 on the leaderboard) | - | 55.5 | AlphaXiv |
| 7 | ovis2.5-8b | - | 54.1 | ocrbench-v2-leaderboard |
| 8 | gemini-1.5-pro | - | 51.6 | ocrbench-v2-leaderboard |
| 9 | sail-vl2-8b | - | 49.3 | ocrbench-v2-leaderboard |
| 10 | minicpm-v-4.5-8b | - | 48.4 | ocrbench-v2-leaderboard |
| 11 | gpt-4o-2024 (GPT-4o baseline, not GPT5-2025-08-07) | - | 47.6 | ocrbench-v2-leaderboard |
| 12 | claude-3.5-sonnet | - | 47.5 | ocrbench-v2-leaderboard |
| 13 | internvl3.5-14b | - | 47.1 | ocrbench-v2-leaderboard |
| 14 | step-1v | - | 46.8 | ocrbench-v2-leaderboard |
| 15 | grok4 | - | 45.0 | ocrbench-v2-leaderboard |
| 16 | gpt-4o-mini | - | 44.1 | ocrbench-v2-leaderboard |
| 17 | claude-sonnet-4 (claude-sonnet-4-20250514) | - | 42.4 | ocrbench-v2-leaderboard |
| 18 | qwen2.5-vl-7b | - | 41.8 | ocrbench-v2-leaderboard |
| 19 | deepseek-vl2-small | - | 41.0 | ocrbench-v2-leaderboard |
| 20 | pixtral-12b | - | 38.4 | ocrbench-v2-leaderboard |
| 21 | phi-4-multimodal | - | 38.1 | ocrbench-v2-leaderboard |
| 22 | glm-4v-9b | - | 37.1 | ocrbench-v2-leaderboard |
| 23 | molmo-7b | - | 33.9 | ocrbench-v2-leaderboard |
| 24 | llava-ov-7b | - | 33.7 | ocrbench-v2-leaderboard |
| 25 | idefics3-8b | - | 26.0 | ocrbench-v2-leaderboard |
| 26 | mistral-ocr-2512 (verified via CodeSOTA on 7,400 English samples; a pure OCR model for text extraction only, not designed for VQA, chart parsing, or structured extraction; strong on full-page OCR at 79.1% and document parsing at 55.2%) | - | 25.2 | codesota-verified |
| 27 | docowl2 | - | 23.4 | ocrbench-v2-leaderboard |
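The "Overall" column can be sketched as an unweighted mean over per-task scores. This is an illustrative assumption; the official OCRBench v2 scoring may weight or aggregate its 23 tasks differently, and the task names below are placeholders:

```python
def overall_score(task_scores: dict[str, float]) -> float:
    """Average per-task scores (each 0-100) into one benchmark score,
    rounded to one decimal place as shown on the leaderboard."""
    return round(sum(task_scores.values()) / len(task_scores), 1)

# Hypothetical per-task scores for some model (not real leaderboard data):
scores = {"text_recognition": 80.0, "referring": 60.0, "extraction": 40.0}
print(overall_score(scores))  # → 60.0
```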
Overall (Chinese)
Average score on Chinese private test set
Higher is better
| Rank | Model | Code | Score | Paper / Source |
|---|---|---|---|---|
| 1 | gemini-25-pro | - | 62.2 | AlphaXiv |
| 2 | minicpm-v-4.5-8b | - | 58.8 | ocrbench-v2-leaderboard |
| 3 | sail-vl2-8b | - | 57.6 | ocrbench-v2-leaderboard |
| 4 | claude-3.5-sonnet | - | 48.4 | ocrbench-v2-leaderboard |
| 5 | gpt-4o-2024 | - | 45.7 | ocrbench-v2-leaderboard |