Multi-step Reasoning · 2024 · English
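GPQA: Graduate-Level Google-Proof Q&A
448 multiple-choice questions written by domain experts in biology, physics, and chemistry. The questions are designed to be "Google-proof": hard to answer correctly even with unrestricted web search.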
Metric: accuracy
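Accuracy here is the percentage of questions for which the model picks the correct answer option. A minimal Python sketch, assuming predictions and gold answers are given as option letters; the `accuracy` helper and the sample answers are hypothetical and are not GPQA data or any official evaluation harness:

```python
def accuracy(predictions: list[str], answers: list[str]) -> float:
    """Percentage of questions whose predicted option matches the gold option."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("need at least one question")
    # Count exact matches between predicted and gold option letters
    correct = sum(p == a for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)


# Hypothetical 4-question run (options A-D), not real GPQA data
print(accuracy(["A", "C", "B", "D"], ["A", "C", "D", "D"]))  # 75.0
```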
Current state of the art: Gemini 3 Pro, 91.9 accuracy
GPQA: accuracy
33 results · 2 SOTA advances · higher is better
Accuracy progress over time
Showing 5 breakthroughs from Dec 2024 to Apr 2026
Key milestones (percentages are relative gains over the previous breakthrough score):
- Dec 2024: Qwen2.5-72B-Instruct, 49.0. GPQA Diamond; Table 6 of the Qwen2.5 Technical Report.
- Jan 2025: DeepSeek-R1, 71.5 (+45.9%). GPQA Diamond, zero-shot chain-of-thought (CoT); DeepSeek-R1 paper, Table 3, arXiv:2501.12948 (Jan 2025).
- Mar 2026: Gemini 2.5 Pro, 84.0 (+1.4%). GPQA Diamond, zero-shot CoT; Gemini 2.5 Pro technical report, Google DeepMind (April 2025).
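The percentages read as relative gains, (new - old) / old. A small Python sketch, assuming that formula (the page does not state it explicitly): it reproduces the +45.9% step from 49.0 to 71.5 and, applied from the first milestone to the current SOTA of 91.9, the 87.6% total improvement reported in the summary. The `relative_gain` helper is hypothetical:

```python
def relative_gain(previous: float, current: float) -> float:
    """Relative improvement of `current` over `previous`, in percent."""
    if previous <= 0:
        raise ValueError("previous score must be positive")
    return 100.0 * (current - previous) / previous


# Qwen2.5-72B-Instruct (49.0) -> DeepSeek-R1 (71.5)
print(round(relative_gain(49.0, 71.5), 1))  # 45.9
# First milestone (49.0) -> current SOTA (91.9)
print(round(relative_gain(49.0, 91.9), 1))  # 87.6
```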
- Total improvement: 87.6%
- Time span: 1y 5m
- Breakthroughs: 5
- Current SOTA: 91.9
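The breakthrough count can be read as the number of points on the SOTA frontier: results that, in date order, beat every earlier score. A Python sketch under that assumption (the page does not define the term); `sota_frontier` and the "Hypothetical model" entry are illustrative, while the other three entries are the key milestones listed for this benchmark:

```python
from typing import NamedTuple


class Result(NamedTuple):
    date: str  # "YYYY-MM", so string order is chronological
    model: str
    score: float


def sota_frontier(results: list[Result]) -> list[Result]:
    """Return results that strictly beat every earlier score (a running maximum)."""
    frontier: list[Result] = []
    best = float("-inf")
    for r in sorted(results, key=lambda r: r.date):
        if r.score > best:
            frontier.append(r)
            best = r.score
    return frontier


results = [
    Result("2024-12", "Qwen2.5-72B-Instruct", 49.0),
    Result("2025-01", "DeepSeek-R1", 71.5),
    Result("2025-06", "Hypothetical model", 60.0),  # not a record: below 71.5
    Result("2026-03", "Gemini 2.5 Pro", 84.0),
]
for r in sota_frontier(results):
    print(r.date, r.model, r.score)
```

Only results that raise the running maximum are kept, so a later but weaker result never counts as a breakthrough.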
Top models performance comparison
Top 10 models by accuracy: best score 91.9 (Gemini 3 Pro), 10 models compared, score range 14.3 points (91.9 down to 77.6 for o4-mini).

All 33 results, ranked by accuracy (the primary metric):
| # | Model | Organization | Access | Score | Date |
|---|---|---|---|---|---|
| 1 | Gemini 3 Pro | Google | | 91.9 | Apr 2026 |
| 2 | Claude Opus 4.6 | Anthropic | API | 91.3 | Apr 2026 |
| 3 | Gemini 3 Flash | Google | API | 90.4 | Apr 2026 |
| 4 | Claude Sonnet 4.6 | Anthropic | API | 89.9 | Apr 2026 |
| 5 | GPT-5 | OpenAI | API | 89.0 | Apr 2026 |
| 6 | Grok 4 | xAI | API | 88.0 | Apr 2026 |
| 7 | Gemini 2.5 Pro | Google | API | 84.0 | Mar 2026 |
| 8 | o3 | OpenAI | API | 82.8 | Mar 2026 |
| 9 | Gemini 2.5 Flash | Google | | 82.8 | Apr 2026 |
| 10 | o4-mini | OpenAI | API | 77.6 | Mar 2026 |
| 11 | Claude Opus 4 | Anthropic | API | 76.7 | Mar 2026 |
| 12 | o1 | OpenAI | API | 75.7 | Mar 2026 |
| 13 | Claude Opus 4.5 | Anthropic | API | 74.9 | Mar 2026 |
| 14 | o3-mini | OpenAI | API | 74.9 | Mar 2026 |
| 15 | o1-preview | OpenAI | | 73.3 | Mar 2026 |
| 16 | DeepSeek-R1 | DeepSeek | Open Source | 71.5 | Mar 2026 |
| 17 | Qwen3-235B-A22B | Alibaba | | 71.1 | Apr 2026 |
| 18 | Claude Sonnet 4 | Anthropic | API | 70.0 | Mar 2026 |
| 19 | Llama-4-Maverick | Meta | Open Source | 69.8 | Mar 2026 |
| 20 | GPT-4.5 Preview | OpenAI | API | 69.5 | Mar 2026 |
| 21 | GPT-4.1 mini | OpenAI | API | 66.4 | Apr 2026 |
| 22 | GPT-4.1 | OpenAI | API | 66.3 | Mar 2026 |
| 23 | o1-mini | OpenAI | API | 60.0 | Mar 2026 |
| 24 | Claude 3.5 Sonnet | Anthropic | API | 59.4 | Mar 2026 |
| 25 | Grok 2 | xAI | API | 56.0 | Mar 2026 |
| 26 | Llama 3.1 405B | Meta | Open Source | 50.7 | Mar 2026 |
| 27 | Claude 3 Opus | Anthropic | API | 50.4 | Mar 2026 |
| 28 | GPT-4o | OpenAI | API | 49.9 | Mar 2026 |
| 29 | GPT-4 Turbo | OpenAI | API | 49.3 | Mar 2026 |
| 30 | Qwen2.5-72B-Instruct | Alibaba | Open Source | 49.0 | Mar 2026 |
| 31 | Gemini 1.5 Pro | Google | API | 46.2 | Mar 2026 |
| 32 | Llama 3.1 70B | Meta | Open Source | 41.7 | Mar 2026 |
| 33 | GPT-4o mini | OpenAI | | 40.2 | Mar 2026 |
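The top-10 summary figures (best score, top model, models compared, score range) follow directly from the first ten rows of the ranking. A Python sketch, assuming "score range" means the best score minus the tenth-best score, which matches the reported 14.3; the `summarize` helper is hypothetical:

```python
# Top 10 (model, score) pairs from the GPQA ranking table
TOP10 = [
    ("Gemini 3 Pro", 91.9), ("Claude Opus 4.6", 91.3), ("Gemini 3 Flash", 90.4),
    ("Claude Sonnet 4.6", 89.9), ("GPT-5", 89.0), ("Grok 4", 88.0),
    ("Gemini 2.5 Pro", 84.0), ("o3", 82.8), ("Gemini 2.5 Flash", 82.8),
    ("o4-mini", 77.6),
]


def summarize(rows: list[tuple[str, float]]) -> dict:
    """Best score, top model, row count, and score range (best minus worst)."""
    if not rows:
        raise ValueError("need at least one row")
    # sorted() is stable, so tied scores keep their table order
    ranked = sorted(rows, key=lambda row: row[1], reverse=True)
    best_model, best = ranked[0]
    worst = ranked[-1][1]
    return {
        "top_model": best_model,
        "best_score": best,
        "models_compared": len(ranked),
        "score_range": round(best - worst, 1),  # round away float noise
    }


print(summarize(TOP10))
# {'top_model': 'Gemini 3 Pro', 'best_score': 91.9, 'models_compared': 10, 'score_range': 14.3}
```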