Human preference rankings from head-to-head battles; lower latency plus higher Elo is better. Notes on individual models:

- Highest accuracy but slowest; best for batch processing.
- Excellent quality at reasonable speed; a good default choice.
- Best quality among open-source models; Chandra fine-tunes this model.
- Fastest model but lower accuracy; good for high-volume work.
| # | Model | Type | ELO | Win Rate | Latency | Battles |
|---|---|---|---|---|---|---|
| ★ | Gemini 3 Preview | API | 1688 | 72.2% | 39.2s | 1,609 |
| 2 | Opus 4.5 (Low) | API | 1647 | 67.7% | 18.5s | 959 |
| 3 | Gemini 2.5 Pro | API | 1645 | 72.1% | 46.6s | 1,588 |
| 4 | Opus 4.5 (Medium) | API | 1618 | 69.7% | 18.9s | 890 |
| 5 | GPT-5.2 (Medium) | API | 1595 | 67.2% | 35.9s | 137 |
| 6 | GPT-5.1 (Medium) | API | 1574 | 60.3% | 18.8s | 1,589 |
| 7 | Sonnet 4.5 | API | 1571 | 49% | 21s | 989 |
| 8 | Gemini 2.5 Flash | API | 1549 | 56.7% | 14.5s | 1,674 |
| 9 | GPT-5.2 (None) | API | 1538 | 62.2% | 15s | 148 |
| 10 | GPT-5.1 (Low) | API | 1527 | 55.9% | 8.7s | 1,683 |
| 11 | GPT-5 (Low) | API | 1467 | 44.4% | 15.8s | 1,587 |
| 12 | GPT-5 (Medium) | API | 1466 | 46.1% | 35s | 1,587 |
| 13 | Iris | OSS | 1465 | 36.8% | 9.8s | 163 |
| 14 | Qwen3-VL-8B | OSS | 1446 | 40.8% | 7.2s | 1,338 |
| 15 | dots.ocr | OSS | 1438 | 36.5% | 3.6s | 1,371 |
| 16 | Nanonets2-3B | OSS | 1376 | 34.1% | 4.9s | 943 |
| 17 | olmOCR 2 | OSS | 1324 | 29.1% | 12.7s | 1,639 |
| 18 | DeepSeek OCR | OSS | 1302 | 19.9% | 3.5s | 1,598 |
OCR Arena ranks models by human preference through head-to-head battles: users compare OCR outputs from two anonymous models and select the better result. Elo ratings are then computed from these battle outcomes, much like chess rankings.
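OCR Arena's exact rating formula and K-factor aren't stated here, but the standard Elo update it resembles can be sketched as follows (the function names and `k=32` default are illustrative assumptions, not Arena internals):

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Probability that player A beats player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def elo_update(r_a: float, r_b: float, a_won: bool, k: float = 32.0):
    """Return updated ratings after one battle. S is 1 for a win, 0 for a loss."""
    e_a = expected_score(r_a, r_b)
    s_a = 1.0 if a_won else 0.0
    new_a = r_a + k * (s_a - e_a)
    new_b = r_b + k * ((1.0 - s_a) - (1.0 - e_a))
    return new_a, new_b

# Two models start equal; the winner of one battle gains half the K-factor.
a, b = elo_update(1500.0, 1500.0, a_won=True)
```

Because the update is zero-sum, the ratings in the table reflect only relative strength: a model's Elo rises by beating higher-rated opponents more often than the model predicts.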
Note: Latency measurements are from the Arena API, not local inference. Self-hosted open source models can be significantly faster on dedicated hardware.