GPQA
448 expert-level multiple-choice questions in biology, physics, and chemistry. Designed to be "Google-proof": hard to answer correctly even with unrestricted web search.
Benchmark Stats
Models: 17
Papers: 17
Metrics: 1
Metric: accuracy (higher is better)
| Rank | Model | Source | Score | Year | Paper |
|---|---|---|---|---|---|
| 1 | o3 | Editorial | 82.8 | 2026 | Source |
| 2 | o4-mini | Editorial | 77.6 | 2026 | Source |
| 3 | o1 | Editorial | 75.7 | 2026 | Source |
| 4 | o3-mini | Editorial | 74.9 | 2026 | Source |
| 5 | o1-preview | Editorial | 73.3 | 2026 | Source |
| 6 | gpt-45-preview | Editorial | 69.5 | 2026 | Source |
| 7 | gpt-41 | Editorial | 66.3 | 2026 | Source |
| 8 | o1-mini | Editorial | 60.0 | 2026 | Source |
| 9 | claude-35-sonnet | Editorial | 59.4 | 2026 | Source |
| 10 | grok-2 | Editorial | 56.0 | 2026 | Source |
| 11 | llama-31-405b | Editorial | 50.7 | 2026 | Source |
| 12 | claude-3-opus | Editorial | 50.4 | 2026 | Source |
| 13 | gpt-4o | Editorial | 49.9 | 2026 | Source |
| 14 | gpt-4-turbo | Editorial | 49.3 | 2026 | Source |
| 15 | gemini-15-pro | Editorial | 46.2 | 2026 | Source |
| 16 | llama-31-70b | Editorial | 41.7 | 2026 | Source |
| 17 | gpt-4o-mini | Editorial | 40.2 | 2026 | Source |
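
For readers unfamiliar with the metric, here is a minimal sketch of how an accuracy percentage like the scores in the table could be computed for a multiple-choice benchmark. The function name `accuracy_percent` and the toy data are hypothetical, chosen only for illustration. This is not the scoring code used to produce the leaderboard, and it assumes each question has exactly one correct option label.

```python
from typing import Sequence


def accuracy_percent(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Return the percentage of predictions that exactly match the reference answers."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("cannot compute accuracy on an empty question set")
    # Count exact matches between predicted and reference option labels.
    correct = sum(p == a for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)


# Hypothetical toy data (not real GPQA items): four questions, options A-D.
toy_answers = ["A", "C", "B", "D"]
toy_predictions = ["A", "C", "D", "D"]
toy_score = accuracy_percent(toy_predictions, toy_answers)  # 3 of 4 correct
print(f"{toy_score:.1f}")
```

If a score is computed over all 448 questions, a single question is worth roughly 0.22 percentage points, so small gaps between adjacent ranks may correspond to only a handful of questions.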