MATH
12,500 competition mathematics problems from AMC, AIME, and other sources. Harder than the grade-school GSM8K benchmark.
Benchmark Stats
Models: 5
Papers: 5
Metrics: 1
Metric: accuracy (%), higher is better
| Rank | Model | Code | Accuracy (%) | Paper / Source | Notes |
|---|---|---|---|---|---|
| 1 | o1-preview | - | 94.8 | openai-blog | Massive improvement over GPT-4 |
| 2 | deepseek-v3 | - | 90.2 | deepseek-blog | |
| 3 | gpt-4o | - | 76.6 | openai-blog | |
| 4 | claude-35-sonnet | - | 71.1 | anthropic-blog | |
| 5 | gemini-15-pro | - | 67.7 | google-blog | |
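The scores above are accuracy percentages. MATH reference solutions mark the final answer with `\boxed{...}`, so one common way to score a model is exact match on the last boxed answer. The following is a minimal sketch under that assumption. The helper names `last_boxed` and `accuracy` are hypothetical, and the only normalization is whitespace removal. Published evaluations typically normalize answers more aggressively (for example, treating equivalent fractions as equal), so this sketch will not reproduce the leaderboard numbers exactly.

```python
from typing import List, Optional


def last_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, handling nested braces."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None  # no boxed answer found
    depth = 1
    out = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(out)  # closing brace of \boxed{ reached
        out.append(ch)
    return None  # unbalanced braces


def normalize(answer: str) -> str:
    """Hypothetical minimal normalization: drop all whitespace."""
    return "".join(answer.split())


def accuracy(model_outputs: List[str], reference_solutions: List[str]) -> float:
    """Percentage of outputs whose boxed answer exactly matches the reference's."""
    if not model_outputs or len(model_outputs) != len(reference_solutions):
        raise ValueError("need equal-length, non-empty lists")
    correct = 0
    for output, reference in zip(model_outputs, reference_solutions):
        pred, gold = last_boxed(output), last_boxed(reference)
        # A missing boxed answer in the output always counts as wrong.
        if pred is not None and gold is not None and normalize(pred) == normalize(gold):
            correct += 1
    return 100.0 * correct / len(model_outputs)
```

For example, `accuracy([r"\boxed{4}", r"\boxed{ 5 }", "no answer"], [r"\boxed{4}", r"\boxed{5}", r"\boxed{6}"])` scores two of three items as correct. Strict string matching keeps the sketch simple, but it undercounts answers that are mathematically equivalent and written differently.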