GSM8K
A dataset of 8,500 grade-school math word problems that require multi-step reasoning. It is one of the most widely used math reasoning benchmarks.
Benchmark Stats
Models: 5
Papers: 5
Metrics: 1
Metric: accuracy (%), higher is better
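Accuracy here is the percentage of problems a model answers correctly. The following is a minimal sketch of how such a score is commonly computed, assuming exact match on the final numeric answer. GSM8K reference solutions end with a `#### <answer>` line. The sketch also assumes the model's answer is the last number in its output. The function names are hypothetical, and the scores in the table may come from different prompting and answer-extraction setups.

```python
import re
from typing import List, Optional

# Matches integers or decimals, optionally negative, with thousands separators.
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def extract_reference_answer(solution: str) -> str:
    # GSM8K reference solutions end with a line of the form "#### <answer>".
    return solution.split("####")[-1].strip().replace(",", "")


def extract_predicted_answer(completion: str) -> Optional[str]:
    # Assumption: the last number in the model output is its final answer.
    matches = _NUMBER.findall(completion)
    if not matches:
        return None
    return matches[-1].replace(",", "")


def normalize(answer: str) -> str:
    # Treat "18", "18.0" and "18.00" as the same answer.
    try:
        value = float(answer)
    except ValueError:
        return answer
    return str(int(value)) if value.is_integer() else str(value)


def gsm8k_accuracy(completions: List[str], solutions: List[str]) -> float:
    # Percentage of completions whose final number matches the reference answer.
    if not solutions or len(completions) != len(solutions):
        raise ValueError("need equal-length, non-empty lists")
    correct = 0
    for completion, solution in zip(completions, solutions):
        predicted = extract_predicted_answer(completion)
        reference = extract_reference_answer(solution)
        if predicted is not None and normalize(predicted) == normalize(reference):
            correct += 1
    return 100.0 * correct / len(solutions)


if __name__ == "__main__":
    completions = [
        "She has 5 + 13 = 18 apples. The answer is 18.",
        "So the total is 40 dollars.",
    ]
    solutions = ["5 + 13 = <<5+13=18>>18\n#### 18", "20 * 2 = <<20*2=40>>40\n#### 42"]
    print(gsm8k_accuracy(completions, solutions))  # 50.0
```

Exact match on a single extracted number keeps scoring simple, but it makes results sensitive to how the answer is extracted. This is one reason scores reported by different sources are not always directly comparable.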
| Rank | Model | Code | Accuracy (%) | Paper / Source |
|---|---|---|---|---|
| 1 | o1-preview | - | 97.8 | openai-blog |
| 2 | claude-35-sonnet | - | 96.4 | anthropic-blog |
| 3 | llama-3-70b | HF | 93.0 | meta-blog |
| 4 | gpt-4o | - | 92.0 | openai-blog |
| 5 | gemini-15-pro | - | 91.7 | google-blog |

Note: o1-preview achieves near-human performance.