AIME I + II 2025, 30 problems in total. The metric is the average number of problems answered correctly out of 30, reported as a percentage (% correct). Frontier models now score close to the maximum, with the top entry below at 92.7.
Higher is better
| Rank | Model | Trust | Score (%) | Year | Source |
|---|---|---|---|---|---|
| 01 | o4-mini | verified | 92.7 | 2026 | Source ↗ |
| 02 | o3 | verified | 86.7 | 2026 | Source ↗ |
| 03 | Gemini 2.5 Pro | verified | 86.7 | 2026 | Source ↗ |
| 04 | Claude Opus 4.5 | verified | 80.0 | 2026 | Source ↗ |
| 05 | DeepSeek R1 | verified | 72.0 | 2026 | Source ↗ |
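
To make the metric concrete, here is a minimal Python sketch of how a percentage could be derived from per-run counts of correct answers, and how a listed score maps back to an average problem count. It assumes the scores in the table are percentages of the 30 problems, which their magnitude suggests. The function names and the per-run counts in the usage note are hypothetical and are not taken from any source listed here.

```python
TOTAL_PROBLEMS = 30  # AIME I + II 2025 combined (15 problems each)


def percent_correct(correct_per_run):
    """Average the number of correct answers over runs and express it as a percentage of 30.

    `correct_per_run` is a hypothetical list of per-run correct counts.
    """
    if not correct_per_run:
        raise ValueError("need at least one run")
    mean_correct = sum(correct_per_run) / len(correct_per_run)
    return 100.0 * mean_correct / TOTAL_PROBLEMS


def mean_correct_from_percent(score_percent):
    """Convert a percentage score back to the average number of correct problems."""
    return score_percent * TOTAL_PROBLEMS / 100.0
```

For example, `percent_correct([26, 26])` rounds to 86.7, and `mean_correct_from_percent(80)` gives 24.0, so a score of 80 corresponds to 24 of 30 problems on average. A non-integer average such as the 21.6 problems implied by a score of 72 is consistent with averaging over several runs, although the table does not state how many runs were used.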