OCRBench v2 tests 8 core OCR capabilities across 23 tasks. It evaluates large multimodal models (LMMs) on text recognition, referring, and extraction.
Overall Zh Private is a reported evaluation metric for OCRBench v2. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.
Higher is better
Muted rows were not state of the art when published — an earlier or same-year result already scored better.
| Rank | Model | Trust | Score | Year | Links |
|---|---|---|---|---|---|
| 01 | Qwen2.5-VL-72B | paper | 63.7 | 2025 | Source |
| 02 | gemini-25-pro | paper | 62.2 | 2025 | Source |
| 03 | Gemini 2.5 Pro | unverified | 62.2 | 2025 | Source |
| 04 | Qianfan-OCR | paper | 60.77 | 2025 | Source |
| 05 | minicpm-v-4.5-8b | unverified | 58.8 | 2025 | Source |
| 06 | sail-vl2-8b | paper | 57.6 | 2025 | Source |
| 07 | claude-3.5-sonnet | unverified | 48.4 | 2024 | Source |
| 08 | InternVL2.5-78B | paper | 46.2 | 2025 | Source |
| 09 | Qwen2-VL-72B | paper | 46.1 | 2024 | Source |
| 10 | gpt-4o-2024 | unverified | 45.7 | 2024 | Source |
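The muting rule stated above the table can be read as a dominance check: a row is muted when some other row has an earlier or equal year and a strictly higher score. The following is a minimal Python sketch of that reading, not Codesota's actual implementation. It assumes that the year is the only publication date available and that an equal score does not mute a row. `Row` and `is_muted` are hypothetical names introduced for illustration, and the sample rows are taken from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """One leaderboard entry (hypothetical structure for illustration)."""
    model: str
    score: float
    year: int


def is_muted(row: Row, rows: list[Row]) -> bool:
    """Return True if an earlier or same-year row scored strictly higher.

    Assumption: ties in score do not mute a row.
    """
    return any(
        other.year <= row.year and other.score > row.score
        for other in rows
    )


# A subset of the Overall Zh Private rows listed above.
rows = [
    Row("Qwen2.5-VL-72B", 63.7, 2025),
    Row("Qianfan-OCR", 60.77, 2025),
    Row("claude-3.5-sonnet", 48.4, 2024),
    Row("Qwen2-VL-72B", 46.1, 2024),
]

# Rows that were state of the art for their year under this reading.
unmuted = [r.model for r in rows if not is_muted(r, rows)]
print(unmuted)  # prints ['Qwen2.5-VL-72B', 'claude-3.5-sonnet']
```

Under this reading, a 2024 result can stay unmuted even though 2025 results score higher, because only earlier or same-year rows are compared against it.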
English Score is a reported evaluation metric for OCRBench v2.
Higher is better
| Rank | Model | Trust | Score | Year | Links |
|---|---|---|---|---|---|
| 01 | Ovis2.5-9B | unverified | 63.4 | 2025 | Paper, Code |
| 02 | Intern-S1-Pro | unverified | 60.1 | 2026 | Paper, Source |
Overall En Private is a reported evaluation metric for OCRBench v2.
Higher is better
Chinese Score is a reported evaluation metric for OCRBench v2.
Higher is better
| Rank | Model | Trust | Score | Year | Links |
|---|---|---|---|---|---|
| 01 | Intern-S1-Pro | unverified | 60.6 | 2026 | Paper, Source |
| 02 | Ovis2.5-9B | unverified | 58 | 2025 | Paper, Code |
Overall Zh Public is a reported evaluation metric for OCRBench v2.
Higher is better
| Rank | Model | Trust | Score | Year | Links |
|---|---|---|---|---|---|
| 01 | InternVL3-14B | paper | 55.7 | 2025 | Source |
| 02 | Qwen2.5-VL-7B | paper | 55.6 | 2025 | Source |
| 03 | Ovis2-8B | paper | 49.2 | 2025 | Source |
| 04 | Gemini 1.5 Pro | paper | 43.1 | 2024 | Source |
| 05 | DeepSeek-VL2-Small | paper | 42.7 | 2024 | Source |
| 06 | Step-1V | paper | 42.6 | 2024 | Source |
| 07 | MiniCPM-o-2.6 | paper | 41.1 | 2024 | Source |
| 08 | Claude 3.5 Sonnet | paper | 39.6 | 2024 | Source |
| 09 | GLM-4V-9B | paper | 36.6 | 2024 | Source |
| 10 | GPT-4o | paper | 32.2 | 2024 | Source |
Overall En Public is a reported evaluation metric for OCRBench v2.
Higher is better
| Rank | Model | Trust | Score | Year | Links |
|---|---|---|---|---|---|
| 01 | InternVL3-14B | paper | 52.6 | 2025 | Source |
| 02 | Gemini 1.5 Pro | paper | 51.9 | 2024 | Source |
| 03 | Ovis2-8B | paper | 47.7 | 2025 | Source |
| 04 | Step-1V | paper | 46.7 | 2024 | Source |
| 05 | Qwen2.5-VL-7B | paper | 46.7 | 2025 | Source |
| 06 | GPT-4o | paper | 46.5 | 2024 | Source |
| 07 | Claude 3.5 Sonnet | paper | 45.2 | 2024 | Source |
| 08 | MiniCPM-o-2.6 | paper | 45.1 | 2024 | Source |
| 09 | DeepSeek-VL2-Small | paper | 43.3 | 2024 | Source |
| 10 | GLM-4V-9B | paper | 42.6 | 2024 | Source |
| 11 | Pixtral-12B | paper | 40.3 | 2024 | Source |
| 12 | LLaVA-OneVision-7B | paper | 36.4 | 2024 | Source |
| 13 | Cambrian-1-8B | paper | 34.7 | 2024 | Source |
| 14 | Molmo-7B | paper | 34.5 | 2024 | Source |