Codesota · Benchmark · MMLU-Pro

MMLU-Pro.

A harder version of MMLU: over 12,000 multiple-choice questions across 14 subject areas, each with ten answer options instead of MMLU's four. It is designed to be less sensitive to prompt variations and to demand more reasoning than the original benchmark.
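As a rough illustration of how the accuracy metric on this page can be computed, here is a minimal Python sketch. It assumes each model completion states its choice with a phrase like "the answer is (X)", where X is one of the option letters A through J. The regex and the function names (`extract_choice`, `accuracy`, `chance_accuracy`) are hypothetical, and the official MMLU-Pro evaluation harness may parse answers differently.

```python
import re
from typing import Optional

# Assumption: options are lettered A through J (ten options per question).
# Hypothetical pattern: "answer is" matches case-insensitively, the letter must be
# uppercase A-J, and it must not be the start of a longer word (e.g. "Boston").
ANSWER_PATTERN = re.compile(r"(?i:answer is) \(?([A-J])(?![A-Za-z])")


def extract_choice(completion: str) -> Optional[str]:
    """Return the last answer letter stated in a completion, or None if absent."""
    matches = ANSWER_PATTERN.findall(completion)
    return matches[-1] if matches else None


def accuracy(completions: list[str], gold: list[str]) -> float:
    """Percentage of items whose extracted letter equals the gold letter.

    Completions with no parsable answer count as wrong.
    """
    if len(completions) != len(gold):
        raise ValueError("completions and gold must have the same length")
    correct = sum(extract_choice(c) == g for c, g in zip(completions, gold))
    return 100.0 * correct / len(gold)


def chance_accuracy(num_options: int) -> float:
    """Expected accuracy (%) of uniform random guessing among num_options choices."""
    return 100.0 / num_options


if __name__ == "__main__":
    preds = ["...so the answer is (A).", "The answer is J", "I am not sure."]
    print(accuracy(preds, ["A", "B", "C"]))  # one of three correct
    print(chance_accuracy(10), chance_accuracy(4))
```

With ten options, random guessing is expected to score 10%, versus 25% on the four-option original MMLU, so the floor for a meaningful score is lower on MMLU-Pro.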

§ 01 · SOTA history

Year over year.

Not enough data to show trend.
§ 02 · Leaderboard

Results by metric.

accuracy

Higher is better

Trust tiers for accuracy: verified, paper, vendor, community, unverified
| Rank | Model | Variant / notes | Trust | Accuracy (%) | Year | Source |
|------|-------|-----------------|-------|--------------|------|--------|
| 01 | Gemini 3.1 Pro | Gemini 3.1 Pro Preview (02/26) | unverified | 90.99 | 2026 | ↗ |
| 02 | Gemini 3 Pro | High reasoning | unverified | 89.8 | 2026 | ↗ |
| 03 | Claude Opus 4.5 | Thinking mode | unverified | 89.5 | 2026 | ↗ |
| 04 | Gemini 3 Flash | Thinking mode | unverified | 89 | 2026 | ↗ |
| 05 | Qwen3.6 Plus | | unverified | 88.5 | 2026 | ↗ |
| 06 | Claude Opus 4.1 | Thinking mode | unverified | 88 | 2026 | ↗ |
| 07 | MiniMax M2.1 | | unverified | 88 | 2026 | ↗ |
| 08 | Qwen3.5-397B-A17B | | unverified | 87.8 | 2026 | ↗ |
| 09 | Claude Sonnet 4.5 | Thinking mode | unverified | 87.5 | 2026 | ↗ |
| 10 | GPT-5.2 | Pro, high reasoning | unverified | 87.4 | 2026 | ↗ |
| 11 | Kimi K2.5 | | unverified | 87.1 | 2026 | ↗ |
| 12 | GPT-5 | | unverified | 87.1 | 2026 | ↗ |
| 13 | GPT-5.1 | | unverified | 87 | 2026 | ↗ |
| 14 | Grok 4 | | unverified | 86.6 | 2026 | ↗ |
| 15 | DeepSeek V3.2 | Thinking mode | unverified | 86.2 | 2026 | ↗ |
| 16 | Claude 3.7 Sonnet | Legacy reference (early 2025) | unverified | 85.1 | 2026 | ↗ |
| 17 | DeepSeek-R1-0528 | | unverified | 85 | 2026 | ↗ |
| 18 | Kimi K2-Thinking-0905 | | unverified | 84.6 | 2026 | ↗ |
| 19 | GLM-4.5 | | unverified | 84.6 | 2026 | ↗ |
| 20 | GPT-4o | Legacy reference (2024) | unverified | 72.6 | 2026 | ↗ |
§ 03 · Lineage

MMLU-Pro in context.

This benchmark (1)
MMLU-Pro · active · 2024-06
§ 04 · Submit a result

Add to the leaderboard.
