OCRBench v2
South China University of Technology
Tests eight core OCR capabilities across 23 tasks, evaluating large multimodal models (LMMs) on text recognition, text referring, and information extraction.
Overall (Chinese)
Average score on Chinese private test set
Higher is better
| Rank | Model | Source | Score | Year | Paper |
|---|---|---|---|---|---|
| 1 | gemini-2.5-pro | Editorial | 62.2 | 2025 | Source |
| 2 | Qianfan-OCR (Baidu Qianfan-OCR 4B: Qwen3-4B + Qianfan-ViT, Apache 2.0, 192 languages, Layout-as-Thought) | Editorial | 60.77 | 2026 | Source |
| 3 | minicpm-v-4.5-8b | Editorial | 58.8 | 2025 | Source |
| 4 | sail-vl2-8b | Editorial | 57.6 | 2025 | Source |
| 5 | claude-3.5-sonnet | Editorial | 48.4 | 2025 | Source |
| 6 | gpt-4o-2024 | Editorial | 45.7 | 2025 | Source |
Overall (English)
Average score on English private test set
Higher is better
| Rank | Model | Source | Score | Year | Paper |
|---|---|---|---|---|---|
| 1 | seed-1.6-vision | Editorial | 62.2 | 2025 | Source |
| 2 | qwen3-omni-30b | Editorial | 61.3 | 2025 | Source |
| 3 | nemotron-nano-v2-vl | Editorial | 61.2 | 2025 | Source |
| 4 | gemini-2.5-pro | Editorial | 59.3 | 2025 | Source |
| 5 | llama-3.1-nemotron-nano-vl-8b | Editorial | 56.4 | 2025 | Source |
| 6 | Qianfan-OCR (Baidu Qianfan-OCR 4B: Qwen3-4B + Qianfan-ViT) | Editorial | 56.0 | 2026 | Source |
| 7 | gpt-5 (listed as GPT5-2025-08-07 on the leaderboard) | Editorial | 55.5 | 2025 | Source |
| 8 | ovis2.5-8b | Editorial | 54.1 | 2025 | Source |
| 9 | gemini-1.5-pro | Editorial | 51.6 | 2025 | Source |
| 10 | sail-vl2-8b | Editorial | 49.3 | 2025 | Source |
| 11 | minicpm-v-4.5-8b | Editorial | 48.4 | 2025 | Source |
| 12 | gpt-4o-2024 (GPT-4o baseline, not GPT5-2025-08-07) | Editorial | 47.6 | 2025 | Source |
| 13 | claude-3.5-sonnet | Editorial | 47.5 | 2025 | Source |
| 14 | internvl3.5-14b | Editorial | 47.1 | 2025 | Source |
| 15 | step-1v | Editorial | 46.8 | 2025 | Source |
| 16 | grok4 | Editorial | 45.0 | 2025 | Source |
| 17 | gpt-4o-mini | Editorial | 44.1 | 2025 | Source |
| 18 | claude-sonnet-4 (claude-sonnet-4-20250514) | Editorial | 42.4 | 2025 | Source |
| 19 | qwen2.5-vl-7b | Editorial | 41.8 | 2025 | Source |
| 21 | deepseek-vl2-small | Editorial | 41.0 | 2025 | Source |
| 21 | pixtral-12b | Editorial | 38.4 | 2025 | Source |
| 22 | phi-4-multimodal | Editorial | 38.1 | 2025 | Source |
| 23 | glm-4v-9b | Editorial | 37.1 | 2025 | Source |
| 24 | molmo-7b | Editorial | 33.9 | 2025 | Source |
| 25 | llava-ov-7b | Editorial | 33.7 | 2025 | Source |
| 26 | idefics3-8b | Editorial | 26.0 | 2025 | Source |
| 27 | mistral-ocr-2512 (verified via CodeSOTA on 7,400 English samples; a pure OCR model for text extraction only, not designed for VQA, chart parsing, or structured extraction; strong on full-page OCR at 79.1% and document parsing at 55.2%) | Editorial | 25.2 | 2025 | Source |
| 28 | docowl2 | Editorial | 23.4 | 2025 | Source |