Codesota · Benchmark · MMLU-Pro

MMLU-Pro.

The MMLU-Pro dataset contains about 12K complex questions across disciplines including biology, business, chemistry, computer science, economics, engineering, math, physics, and psychology. Each question has up to 10 answer options, compared with 4 in the original MMLU, which makes the benchmark more challenging and lowers the odds of a correct random guess. It also includes more reasoning-focused problems, on which Chain-of-Thought (CoT) prompting can score significantly higher than perplexity-based (PPL) answer selection.
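To put the larger option set in numbers, the sketch below computes the expected accuracy of uniform random guessing. It assumes every question has exactly the stated number of options (4 for MMLU, 10 for MMLU-Pro); the function name `chance_accuracy` is illustrative and not part of any benchmark tooling.

```python
def chance_accuracy(num_options: int) -> float:
    """Expected accuracy, in percent, of guessing uniformly at random."""
    if num_options < 1:
        raise ValueError("num_options must be at least 1")
    return 100.0 / num_options


# Assumption: a fixed option count per question for each benchmark.
mmlu_baseline = chance_accuracy(4)       # original MMLU
mmlu_pro_baseline = chance_accuracy(10)  # MMLU-Pro

print(f"MMLU chance level:     {mmlu_baseline:.1f}%")      # 25.0%
print(f"MMLU-Pro chance level: {mmlu_pro_baseline:.1f}%")  # 10.0%
```

Under that assumption, a score near 10 on MMLU-Pro is roughly what random guessing would produce, which is a useful floor when reading the lowest scores on the leaderboard.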

§ 01 · Leaderboard

Results by metric.


Accuracy

Accuracy is the reported evaluation metric for MMLU-Pro. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better
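As a rough illustration of the metric, the sketch below computes accuracy as the percentage of questions where the predicted option letter matches the reference letter. The function name `accuracy_percent`, the letter format, and the sample data are assumptions made for this example; this is not Codesota's or any vendor's evaluation harness, and real harnesses also differ in how they extract the answer letter from model output.

```python
def accuracy_percent(predictions: list[str], answers: list[str]) -> float:
    """Percentage of predictions that match the reference answers.

    Comparison ignores case and surrounding whitespace.
    """
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("at least one question is required")
    correct = sum(
        p.strip().upper() == a.strip().upper()
        for p, a in zip(predictions, answers)
    )
    return 100.0 * correct / len(answers)


# Hypothetical predictions and gold labels for four 10-option questions (A-J).
preds = ["A", "c", "J", "B"]
gold = ["A", "C", "I", "B"]
print(accuracy_percent(preds, gold))  # 75.0
```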

Trust tiers for Accuracy: verified, paper, vendor, community, unverified

Muted rows were not state of the art when published: an earlier or same-year result already scored better.
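One way to read the muting rule is sketched below: a row is muted when some other result from the same or an earlier year has a strictly higher score. The `Result` class, the `is_muted` function, and the choice to never mute rows without a year are assumptions for illustration, not Codesota's actual implementation. The sample rows are the dated paper results from the leaderboard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Result:
    model: str
    score: float
    year: Optional[int]  # None when the leaderboard shows N/A


def is_muted(row: Result, all_rows: list[Result]) -> bool:
    """True if an earlier or same-year result already scored higher."""
    if row.year is None:
        return False  # assumption: undated rows cannot be judged, so keep them
    return any(
        other.year is not None
        and other.year <= row.year
        and other.score > row.score
        for other in all_rows
    )


rows = [
    Result("GPT-4o", 72.6, 2024),
    Result("Gemini 1.5 Pro", 69.0, 2024),
    Result("Claude 3 Opus", 68.5, 2024),
]
muted = [r.model for r in rows if is_muted(r, rows)]
print(muted)  # ['Gemini 1.5 Pro', 'Claude 3 Opus']
```

Using year granularity mirrors the wording of the rule; a finer-grained publication date would change which same-year rows count as muted.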

| Rank | Model | Repository / source | Trust | Score | Year | Links |
|---|---|---|---|---|---|---|
| 01 | MiniMax M2.1 | MiniMaxAI/MiniMax-M2.1 | vendor | 88 | N/A | Code, Source |
| 02 | Intern S2 Preview | internlm/Intern-S2-Preview | vendor | 88 | N/A | Code |
| 03 | Qwen3.5 397B A17B | Qwen/Qwen3.5-397B-A17B | vendor | 87.8 | N/A | Code |
| 04 | DeepSeek V4 Pro | deepseek-ai/DeepSeek-V4-Pro | vendor | 87.5 | N/A | Code |
| 05 | Kimi K2.5 | moonshotai/Kimi-K2.5 | vendor | 87.1 | N/A | Code |
| 06 | NVIDIA Nemotron 3 Ultra 550B A55B BF16 | nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16 | vendor | 86.8 | N/A | Code |
| 07 | NVIDIA Nemotron 3 Ultra 550B A55B NVFP4 | nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-NVFP4 | vendor | 86.8 | N/A | Code |
| 08 | Qwen3.5 122B A10B | Qwen/Qwen3.5-122B-A10B | vendor | 86.7 | N/A | Code, Source |
| 09 | DeepSeek V4 Flash | deepseek-ai/DeepSeek-V4-Flash | vendor | 86.4 | N/A | Code |
| 10 | Qwen3.6 27B | Qwen/Qwen3.6-27B | vendor | 86.2 | N/A | Code |
| 11 | Qwen3.5 27B | Qwen/Qwen3.5-27B | vendor | 86.1 | N/A | Code, Source |
| 12 | GLM 5 | zai-org/GLM-5 | vendor | 86 | N/A | Code, Source |
| 13 | Qwen3.6 35B A3B | Qwen/Qwen3.6-35B-A3B | vendor | 85.2 | N/A | Code |
| 14 | DeepSeek R1 0528 | deepseek-ai/DeepSeek-R1-0528 | vendor | 85 | N/A | Code |
| 15 | GLM 4.5 | zai-org/GLM-4.5 | vendor | 84.6 | N/A | Code, Source |
| 16 | Step 3.5 Flash | stepfun-ai/Step-3.5-Flash | vendor | 84.4 | N/A | Code, Source |
| 17 | DeepSeek R1 | deepseek-ai/DeepSeek-R1 | vendor | 84 | N/A | Code |
| 18 | K EXAONE 236B A23B | LGAI-EXAONE/K-EXAONE-236B-A23B | vendor | 83.8 | N/A | Code |
| 19 | NVIDIA Nemotron 3 Super 120B A12B BF16 | nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16 | vendor | 83.73 | N/A | Code |
| 20 | Intern S1 | internlm/Intern-S1 | vendor | 83.5 | N/A | Code, Source |
| 21 | EXAONE 4.5 33B | LGAI-EXAONE/EXAONE-4.5-33B | vendor | 83.3 | N/A | Code |
| 22 | Qwen3 235B A22B Instruct 2507 | Qwen/Qwen3-235B-A22B-Instruct-2507 | vendor | 83 | N/A | Code, Source |
| 23 | Seed OSS 36B Instruct | ByteDance-Seed/Seed-OSS-36B-Instruct | vendor | 82.7 | N/A | Code, Source |
| 24 | LongCat Flash Chat | meituan-longcat/LongCat-Flash-Chat | vendor | 82.7 | N/A | Code, Source |
| 25 | MiniMax M2 | MiniMaxAI/MiniMax-M2 | vendor | 82 | N/A | Code |
| 26 | GLM 4.5 Air | zai-org/GLM-4.5-Air | vendor | 81.4 | N/A | Code, Source |
| 27 | DeepSeek V3 0324 | deepseek-ai/DeepSeek-V3-0324 | vendor | 81.2 | N/A | Code |
| 28 | MiniMax M1 40k | MiniMaxAI/MiniMax-M1-40k | vendor | 81.1 | N/A | Code, Source |
| 29 | JoyAI LLM Flash | jdopensource/JoyAI-LLM-Flash | vendor | 81.02 | N/A | Code |
| 30 | Kimi K2 Instruct | moonshotai/Kimi-K2-Instruct | vendor | 81 | N/A | Code, Source |
| 31 | Qwen3 30B A3B Thinking 2507 | Qwen/Qwen3-30B-A3B-Thinking-2507 | vendor | 80.9 | N/A | Code, Source |
| 32 | gpt oss 120b | openai/gpt-oss-120b | vendor | 80.8 | N/A | Code, Source |
| 33 | MiniMax M2.5 | MiniMaxAI/MiniMax-M2.5 | vendor | 80.1 | N/A | Code, Source |
| 34 | ERNIE 4.5 300B A47B PT | baidu/ERNIE-4.5-300B-A47B-PT | vendor | 78.4 | N/A | Code, Source |
| 35 | LongCat Flash Lite | meituan-longcat/LongCat-Flash-Lite | vendor | 78.29 | N/A | Code |
| 36 | MiniMax Text 01 | MiniMaxAI/MiniMax-Text-01 | vendor | 75.7 | N/A | Code, Source |
| 37 | gpt oss 20b | openai/gpt-oss-20b | vendor | 73.6 | N/A | Code, Source |
| 38 | GPT-4o | Original MMLU-Pro paper, 5-shot CoT | paper | 72.6 | 2024 | Paper |
| 39 | Qwen2.5 72B | Qwen/Qwen2.5-72B | vendor | 71.59 | N/A | Code, Source |
| 40 | phi 4 | microsoft/phi-4 | vendor | 70.4 | N/A | Code, Source |
| 41 | Qwen3 4B Instruct 2507 | Qwen/Qwen3-4B-Instruct-2507 | vendor | 69.6 | N/A | Code |
| 42 | ERNIE 4.5 300B A47B Base PT | baidu/ERNIE-4.5-300B-A47B-Base-PT | vendor | 69.5 | N/A | Code, Source |
| 43 | Qwen2.5 32B | Qwen/Qwen2.5-32B | vendor | 69.23 | N/A | Code, Source |
| 44 | Gemini 1.5 Pro | Original MMLU-Pro paper, 5-shot CoT | paper | 69 | 2024 | Paper |
| 45 | MiMo V2.5 Pro | XiaomiMiMo/MiMo-V2.5-Pro | vendor | 68.5 | N/A | Code |
| 46 | Claude 3 Opus | Original MMLU-Pro paper, 5-shot CoT | paper | 68.5 | 2024 | Paper |
| 47 | Qwen3 235B A22B | Qwen/Qwen3-235B-A22B | vendor | 68.18 | N/A | Code, Source |
| 48 | Mistral Large Instruct 2411 | mistralai/Mistral-Large-Instruct-2411 | vendor | 67.94 | N/A | Code, Source |
| 49 | Hunyuan A13B Instruct | tencent/Hunyuan-A13B-Instruct | vendor | 67.3 | N/A | Code, Source |
| 50 | Mistral Large Instruct 2407 | mistralai/Mistral-Large-Instruct-2407 | vendor | 65.91 | N/A | Code, Source |
| 51 | DeepSeek V2.5 | deepseek-ai/DeepSeek-V2.5 | vendor | 65.83 | N/A | Code, Source |
| 52 | Seed OSS 36B Base | ByteDance-Seed/Seed-OSS-36B-Base | vendor | 65.1 | N/A | Code, Source |
| 53 | NVIDIA Nemotron 3 Nano 30B A3B Base BF16 | nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF16 | vendor | 65.1 | N/A | Code, Source |
| 54 | DeepSeek V3 | deepseek-ai/DeepSeek-V3 | vendor | 64.4 | N/A | Code |
| 55 | granite 4.1 30b | ibm-granite/granite-4.1-30b | vendor | 64.09 | N/A | Code |
| 56 | GPT-4-Turbo | Original MMLU-Pro paper, 5-shot CoT | paper | 63.7 | 2024 | Paper |
| 57 | Qwen2.5 14B | Qwen/Qwen2.5-14B | vendor | 63.69 | N/A | Code, Source |
| 58 | Qwen3 30B A3B Base | Qwen/Qwen3-30B-A3B-Base | vendor | 61.7 | N/A | Code, Source |
| 59 | Llama 3.1 405B | meta-llama/Llama-3.1-405B | vendor | 61.6 | N/A | Code, Source |
| 60 | Nemotron H 56B Base 8K | nvidia/Nemotron-H-56B-Base-8K | vendor | 60.5 | N/A | Code, Source |
| 61 | Seed OSS 36B Base woSyn | ByteDance-Seed/Seed-OSS-36B-Base-woSyn | vendor | 60.4 | N/A | Code, Source |
| 62 | Tencent Hunyuan Large | tencent/Tencent-Hunyuan-Large | vendor | 60.2 | N/A | Code, Source |
| 63 | Mellum2 12B A2.5B Base Pretrain | JetBrains/Mellum2-12B-A2.5B-Base-Pretrain | vendor | 59.31 | N/A | Code |
| 64 | Mellum2 12B A2.5B Base | JetBrains/Mellum2-12B-A2.5B-Base | vendor | 59.31 | N/A | Code |
| 65 | Gemini 1.5 Flash | Original MMLU-Pro paper, 5-shot CoT | paper | 59.1 | 2024 | Paper |
| 66 | EXAONE 3.5 32B Instruct | LGAI-EXAONE/EXAONE-3.5-32B-Instruct | vendor | 58.91 | N/A | Code, Source |
| 67 | MiMo 7B RL | XiaomiMiMo/MiMo-7B-RL | vendor | 58.6 | N/A | Code, Source |
| 68 | internlm3 8b instruct | internlm/internlm3-8b-instruct | vendor | 57.6 | N/A | Code, Source |
| 69 | ERNIE 4.5 21B A3B Base PT | baidu/ERNIE-4.5-21B-A3B-Base-PT | vendor | 56.7 | N/A | Code, Source |
| 70 | Llama 3 70B Instruct | Original MMLU-Pro paper, 5-shot CoT | paper | 56.2 | 2024 | Paper |
| 71 | granite 4.1 8b | ibm-granite/granite-4.1-8b | vendor | 55.99 | N/A | Code |
| 72 | Phi 3 medium 4k instruct | microsoft/Phi-3-medium-4k-instruct | vendor | 55.7 | N/A | Code, Source |
| 73 | DeepSeek V2 Chat | deepseek-ai/DeepSeek-V2-Chat | vendor | 54.81 | N/A | Code, Source |
| 74 | Mistral Small 24B Base 2501 | mistralai/Mistral-Small-24B-Base-2501 | vendor | 54.4 | N/A | Code, Source |
| 75 | Phi 4 mini instruct | microsoft/Phi-4-mini-instruct | vendor | 52.8 | N/A | Code, Source |
| 76 | Meta Llama 3 70B | meta-llama/Meta-Llama-3-70B | vendor | 52.78 | N/A | Code, Source |
| 77 | Llama 3.1 70B | meta-llama/Llama-3.1-70B | vendor | 52.47 | N/A | Code, Source |
| 78 | Yi 1.5 34B Chat | 01-ai/Yi-1.5-34B-Chat | vendor | 52.29 | N/A | Code, Source |
| 79 | Phi 3 medium 128k instruct | microsoft/Phi-3-medium-128k-instruct | vendor | 51.91 | N/A | Code, Source |
| 80 | MAmmoTH2 8x7B Plus | TIGER-Lab/MAmmoTH2-8x7B-Plus | vendor | 50.4 | N/A | Code, Source |
| 81 | Qwen1.5 110B | Qwen/Qwen1.5-110B | vendor | 49.93 | N/A | Code, Source |
| 82 | granite 4.1 3b | ibm-granite/granite-4.1-3b | vendor | 49.83 | N/A | Code |
| 83 | AI21 Jamba Large 1.5 | ai21labs/AI21-Jamba-Large-1.5 | vendor | 49.46 | N/A | Code, Source |
| 84 | Mistral Small Instruct 2409 | mistralai/Mistral-Small-Instruct-2409 | vendor | 48.4 | N/A | Code, Source |
| 85 | glm 4 9b | zai-org/glm-4-9b | vendor | 47.92 | N/A | Code, Source |
| 86 | Phi 3.5 mini instruct | microsoft/Phi-3.5-mini-instruct | vendor | 47.87 | N/A | Code, Source |
| 87 | EXAONE 3.5 7.8B Instruct | LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct | vendor | 46.24 | N/A | Code, Source |
| 88 | Yi 1.5 9B Chat | 01-ai/Yi-1.5-9B-Chat | vendor | 45.95 | N/A | Code, Source |
| 89 | Phi 3 mini 4k instruct | microsoft/Phi-3-mini-4k-instruct | vendor | 45.66 | N/A | Code, Source |
| 90 | aya expanse 32b | CohereLabs/aya-expanse-32b | vendor | 45.41 | N/A | Code, Source |
| 91 | gemma 2 9b | google/gemma-2-9b | vendor | 45.1 | N/A | Code, Source |
| 92 | Qwen2.5 7B | Qwen/Qwen2.5-7B | vendor | 45 | N/A | Code, Source |
| 93 | Phi 3 mini 128k instruct | microsoft/Phi-3-mini-128k-instruct | vendor | 43.86 | N/A | Code, Source |
| 94 | Qwen2.5 3B | Qwen/Qwen2.5-3B | vendor | 43.73 | N/A | Code, Source |
| 95 | MAmmoTH2 8B Plus | TIGER-Lab/MAmmoTH2-8B-Plus | vendor | 43.35 | N/A | Code, Source |
| 96 | Yi 34B | 01-ai/Yi-34B | vendor | 43.03 | N/A | Code, Source |
| 97 | Mathstral 7B v0.1 | mistralai/Mathstral-7B-v0.1 | vendor | 42 | N/A | Code, Source |
| 98 | MiMo 7B Base | XiaomiMiMo/MiMo-7B-Base | vendor | 41.9 | N/A | Code, Source |
| 99 | DeepSeek Coder V2 Lite Instruct | deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | vendor | 41.57 | N/A | Code, Source |
| 100 | Mixtral 8x7B v0.1 | mistralai/Mixtral-8x7B-v0.1 | vendor | 41.03 | N/A | Code, Source |
| 101 | Meta Llama 3 8B Instruct | meta-llama/Meta-Llama-3-8B-Instruct | vendor | 40.98 | N/A | Code, Source |
| 102 | MAmmoTH2 7B Plus | TIGER-Lab/MAmmoTH2-7B-Plus | vendor | 40.85 | N/A | Code, Source |
| 103 | Qwen2 7B | Qwen/Qwen2-7B | vendor | 40.73 | N/A | Code, Source |
| 104 | Mistral Nemo Base 2407 | mistralai/Mistral-Nemo-Base-2407 | vendor | 39.77 | N/A | Code, Source |
| 105 | EXAONE 3.5 2.4B Instruct | LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct | vendor | 39.1 | N/A | Code, Source |
| 106 | Yi 1.5 6B Chat | 01-ai/Yi-1.5-6B-Chat | vendor | 38.23 | N/A | Code, Source |
| 107 | Qwen1.5 14B Chat | Qwen/Qwen1.5-14B-Chat | vendor | 38.02 | N/A | Code, Source |
| 108 | Ministral 8B Instruct 2410 | mistralai/Ministral-8B-Instruct-2410 | vendor | 37.93 | N/A | Code, Source |
| 109 | c4ai command r v01 | CohereLabs/c4ai-command-r-v01 | vendor | 37.9 | N/A | Code, Source |
| 110 | internlm2 math plus 20b | internlm/internlm2-math-plus-20b | vendor | 37.1 | N/A | Code, Source |
| 111 | LLaDA 8B Instruct | GSAI-ML/LLaDA-8B-Instruct | vendor | 37 | N/A | Code, Source |
| 112 | Llama 3 Smaug 8B | abacusai/Llama-3-Smaug-8B | vendor | 36.93 | N/A | Code, Source |
| 113 | Llama 3.1 8B | meta-llama/Llama-3.1-8B | vendor | 36.6 | N/A | Code, Source |
| 114 | Meta Llama 3 8B | meta-llama/Meta-Llama-3-8B | vendor | 35.36 | N/A | Code, Source |
| 115 | deepseek math 7b instruct | deepseek-ai/deepseek-math-7b-instruct | vendor | 35.3 | N/A | Code, Source |
| 116 | DeepSeek Coder V2 Lite Base | deepseek-ai/DeepSeek-Coder-V2-Lite-Base | vendor | 34.37 | N/A | Code, Source |
| 117 | aya expanse 8b | CohereLabs/aya-expanse-8b | vendor | 33.74 | N/A | Code, Source |
| 118 | internlm2 math plus 7b | internlm/internlm2-math-plus-7b | vendor | 33.5 | N/A | Code, Source |
| 119 | granite 3.1 8b base | ibm-granite/granite-3.1-8b-base | vendor | 33.08 | N/A | Code, Source |
| 120 | Qwen2.5 1.5B | Qwen/Qwen2.5-1.5B | vendor | 32.1 | N/A | Code, Source |
| 121 | granite 3.0 8b base | ibm-granite/granite-3.0-8b-base | vendor | 31.03 | N/A | Code, Source |
| 122 | Mistral 7B Instruct v0.2 | mistralai/Mistral-7B-Instruct-v0.2 | vendor | 30.84 | N/A | Code, Source |
| 123 | Mistral 7B v0.2 | mistral-community/Mistral-7B-v0.2 | vendor | 30.43 | N/A | Code, Source |
| 124 | Qwen1.5 7B Chat | Qwen/Qwen1.5-7B-Chat | vendor | 29.06 | N/A | Code, Source |
| 125 | Yi 6B Chat | 01-ai/Yi-6B-Chat | vendor | 28.84 | N/A | Code, Source |
| 126 | Yi 6B | 01-ai/Yi-6B | vendor | 26.51 | N/A | Code, Source |
| 127 | granite 3.1 2b base | ibm-granite/granite-3.1-2b-base | vendor | 23.89 | N/A | Code, Source |
| 128 | llemma 7b | EleutherAI/llemma_7b | vendor | 23.45 | N/A | Code, Source |
| 129 | Qwen2 1.5B Instruct | Qwen/Qwen2-1.5B-Instruct | vendor | 22.62 | N/A | Code, Source |
| 130 | Qwen2 1.5B | Qwen/Qwen2-1.5B | vendor | 22.56 | N/A | Code, Source |
| 131 | Llama 3.2 3B | meta-llama/Llama-3.2-3B | vendor | 22.17 | N/A | Code, Source |
| 132 | granite 3.0 2b base | ibm-granite/granite-3.0-2b-base | vendor | 21.72 | N/A | Code, Source |
| 133 | granite 3.1 3b a800m base | ibm-granite/granite-3.1-3b-a800m-base | vendor | 20.39 | N/A | Code, Source |
| 134 | SmolLM2 1.7B | HuggingFaceTB/SmolLM2-1.7B | vendor | 18.31 | N/A | Code, Source |
| 135 | Qwen2 0.5B | Qwen/Qwen2-0.5B | vendor | 14.97 | N/A | Code, Source |
| 136 | Qwen2.5 0.5B | Qwen/Qwen2.5-0.5B | vendor | 14.92 | N/A | Code, Source |
| 137 | granite 3.1 1b a400m base | ibm-granite/granite-3.1-1b-a400m-base | vendor | 12.34 | N/A | Code, Source |
| 138 | Llama 3.2 1B | meta-llama/Llama-3.2-1B | vendor | 11.95 | N/A | Code, Source |
| 139 | SmolLM2 360M | HuggingFaceTB/SmolLM2-360M | vendor | 11.38 | N/A | Code, Source |
| 140 | SmolLM2 135M | HuggingFaceTB/SmolLM2-135M | vendor | 10.85 | N/A | Code, Source |
| 141 | Qwen2.5 VL 72B Instruct | Qwen/Qwen2.5-VL-72B-Instruct | vendor | 0.65 | N/A | Code, Source |

Lineage

MMLU-Pro in context.

This benchmark (1)
active · 2024-06
MMLU-Pro
