Codesota · Benchmark · AIME 2025

AIME 2025.

AIME 2025 combines the AIME I and AIME II exams, 30 problems in total. The metric is the average number of problems answered correctly out of 30, reported as a percentage. Frontier models now score close to the ceiling; the top entry below reaches 92.7%.
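As a rough illustration of how the metric works, the sketch below converts per-run correct counts into a percentage score. The function name `accuracy_percent` and the one-decimal rounding are assumptions made for this example, not the site's actual scoring code.

```python
TOTAL_PROBLEMS = 30  # AIME I (15 problems) + AIME II (15 problems)


def accuracy_percent(correct_per_run):
    """Average the number of correct answers over runs, as a percentage of 30.

    `correct_per_run` is a list with one entry per evaluation run, each entry
    being the count of problems solved in that run (hypothetical input format).
    """
    mean_correct = sum(correct_per_run) / len(correct_per_run)
    return round(100 * mean_correct / TOTAL_PROBLEMS, 1)


# A single run with 26 of 30 problems correct.
print(accuracy_percent([26]))  # 86.7
```

Under this reading, a score such as 86.7 corresponds to 26 correct answers out of 30, while a score that is not a multiple of 1/30 would come from averaging over several runs.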

§ 01 · SOTA history

Year over year.

Not enough data to show trend.
§ 02 · Leaderboard

Results by metric.

accuracy

Higher is better

Trust tiers for accuracy: verified, paper, vendor, community, unverified

| Rank | Model | Trust | Score (%) | Year | Notes and source |
|------|-------|-------|-----------|------|------------------|
| 01 | o4-mini | verified | 92.7 | 2026 | Average over AIME 2025 I+II. Source: OpenAI o4-mini system card (April 2025). |
| 02 | o3 | verified | 86.7 | 2026 | Average over AIME 2025 I+II (30 problems). Source: OpenAI (2025). |
| 03 | Gemini 2.5 Pro | verified | 86.7 | 2026 | Average over AIME 2025 I+II. Source: Gemini 2.5 Pro technical report (April 2025). |
| 04 | Claude Opus 4.5 | verified | 80 | 2026 | Average over AIME 2025 I+II. Source: Claude Opus 4.5 model card, Anthropic (2025). |
| 05 | DeepSeek R1 | verified | 72 | 2026 | Average over AIME 2025 I+II (estimated from leaderboard). Source: DeepSeek-R1 technical report. |
§ 04 · Submit a result

Add to the leaderboard.
