GPT-5 · OpenAI · 10 results · 10 benchmarks
Model card

GPT-5.

OpenAI · API
§ 01 · Benchmarks

Every benchmark GPT-5 has a recorded score for.

| # | Benchmark | Area · Task | Metric | Value | Rank | Date | Source |
|---|---|---|---|---|---|---|---|
| 01 | GSM8K | Reasoning · Mathematical Reasoning | accuracy | 99.2% | #2/32 | 2025-08-01 | source |
| 02 | HLE | Reasoning · Multi-step Reasoning | accuracy | 25.3% | #2/13 | | unverified |
| 03 | LiveCodeBench Pro | Computer Code · Code Generation | elo | 2176 | #2/9 | | source |
| 04 | LiveCodeBench | Computer Code · Code Generation | pass@1 | 85.0% | #3/30 | | source |
| 05 | HumanEval | Computer Code · Code Generation | pass@1 | 95.1% | #4/42 | 2025-12-01 | source |
| 06 | GPQA | Reasoning · Multi-step Reasoning | accuracy | 89.0% | #5/33 | | source |
| 07 | MMLU | Reasoning · Commonsense Reasoning | accuracy | 90.8% | #8/41 | 2025-09-01 | source |
| 08 | MMLU-Pro | Reasoning · Commonsense Reasoning | accuracy | 87.1% | #11/20 | 2026-04-20 | source |
| 09 | SWE-Bench Verified | Computer Code · Code Generation | resolve-rate | 74.9% | #13/39 | | source |
| 10 | SWE-bench Verified | Agentic AI · SWE-bench | resolve-rate | 74.9% | #20/81 | | source |
The Rank column gives GPT-5's position among all models scored on the same benchmark and metric, with the number of competitors after the slash. A rank of #1 marks the current state of the art (SOTA). Rows are sorted by rank, then by newest result.
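The ranking and sorting rules above can be sketched in Python. This is a minimal illustration under assumptions the page does not confirm: higher scores are better for every metric listed here, tied models share a rank (standard competition ranking), the count after the slash includes GPT-5 itself, and undated rows sort after dated ones within the same rank. The function names `rank_of` and `sort_key` and the `model-a`/`model-b`/`model-c` scores are hypothetical.

```python
from datetime import date
from typing import Optional


def rank_of(model: str, scores: dict[str, float]) -> tuple[int, int]:
    """Return (rank, total) for `model` on one benchmark + metric.

    Assumes higher is better and standard competition ranking, so tied
    models share a rank. `total` counts every scored model, including
    `model` itself (an assumption about what the slash figure means).
    """
    target = scores[model]
    strictly_better = sum(1 for s in scores.values() if s > target)
    return strictly_better + 1, len(scores)


def sort_key(row: dict) -> tuple[int, int]:
    """Rank ascending, then newest date first; undated rows go last within a rank."""
    d: Optional[date] = row.get("date")
    return (row["rank"], -d.toordinal() if d else 0)


# Hypothetical, made-up scores; not real leaderboard data.
scores = {"model-a": 91.0, "model-b": 89.0, "gpt-5": 89.0, "model-c": 80.0}
print(rank_of("gpt-5", scores))  # (2, 4): model-b ties, only model-a is ahead

# Ranks and dates taken from the table above.
rows = [
    {"benchmark": "LiveCodeBench", "rank": 3, "date": None},
    {"benchmark": "HLE", "rank": 2, "date": None},
    {"benchmark": "GSM8K", "rank": 2, "date": date(2025, 8, 1)},
]
print([r["benchmark"] for r in sorted(rows, key=sort_key)])
```

Competition ranking is one common choice; a site could instead use dense ranking or break ties by date, which would change the numbers for tied models.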
§ 02 · Strengths by area

Where GPT-5 actually performs.

Computer Code · 4 benchmarks · avg rank #5.5
Reasoning · 5 benchmarks · avg rank #5.6
Agentic AI · 1 benchmark · avg rank #20.0
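The per-area averages match a plain arithmetic mean of the ranks in the § 01 table; for Computer Code, for example, (2 + 3 + 4 + 13) / 4 = 5.5. The sketch below reproduces all three figures from those ranks. That Codesota computes them exactly this way is an assumption; the function name `area_summary` is hypothetical. Lower is better here.

```python
from collections import defaultdict

# (benchmark, area, rank) triples copied from the § 01 table.
RESULTS = [
    ("GSM8K", "Reasoning", 2),
    ("HLE", "Reasoning", 2),
    ("LiveCodeBench Pro", "Computer Code", 2),
    ("LiveCodeBench", "Computer Code", 3),
    ("HumanEval", "Computer Code", 4),
    ("GPQA", "Reasoning", 5),
    ("MMLU", "Reasoning", 8),
    ("MMLU-Pro", "Reasoning", 11),
    ("SWE-Bench Verified", "Computer Code", 13),
    ("SWE-bench Verified", "Agentic AI", 20),
]


def area_summary(results):
    """Group ranks by area; return {area: (benchmark_count, mean_rank)}."""
    ranks_by_area = defaultdict(list)
    for _benchmark, area, rank in results:
        ranks_by_area[area].append(rank)
    return {
        area: (len(ranks), round(sum(ranks) / len(ranks), 1))
        for area, ranks in ranks_by_area.items()
    }


for area, (count, avg) in area_summary(RESULTS).items():
    noun = "benchmark" if count == 1 else "benchmarks"
    print(f"{area}: {count} {noun}, avg rank #{avg}")
```

It prints 5.6 for Reasoning, 5.5 for Computer Code and 20.0 for Agentic AI. A mean rank ignores how many models competed on each benchmark, so #13/39 and #13/13 weigh the same.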
§ 04 · Related models

Other OpenAI models scored on Codesota.

GPT-4o · Undisclosed params · 35 results · 9 SOTA
o3 · 16 results · 5 SOTA
o4-mini · 13 results · 3 SOTA
o3 (high) · 2 results · 1 SOTA
o4-mini (high) · 1 result · 1 SOTA
o1 · 11 results
o1-preview · Undisclosed params · 8 results
GPT-4.1 · 7 results
§ 05 · Sources & freshness

Where these numbers come from.

editorial · 2 results
gsm8k-shadow-page-timeline · 1 result
livecodebench-pro-official · 1 result
artificial-analysis · 1 result
shadow-page-humaneval · 1 result
openai-gpt-5-launch · 1 result
codesota-shadow-mmlu · 1 result
pricepertoken · 1 result
openai-blog · 1 result
3 of 10 rows are marked verified. First result: 2025-08-01; latest: 2026-04-20.
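As a quick consistency check, the per-source counts in the list above should sum to the 10 results in the page header, and the earliest and latest dates in the § 01 table should match the freshness line. The sketch below does both; the variable names are illustrative. The verified count cannot be reproduced this way, because the page does not say which three rows are verified.

```python
from collections import Counter
from datetime import date

# Result counts per source, copied from the list above.
SOURCE_COUNTS = Counter({
    "editorial": 2,
    "gsm8k-shadow-page-timeline": 1,
    "livecodebench-pro-official": 1,
    "artificial-analysis": 1,
    "shadow-page-humaneval": 1,
    "openai-gpt-5-launch": 1,
    "codesota-shadow-mmlu": 1,
    "pricepertoken": 1,
    "openai-blog": 1,
})

# Dates of the rows in § 01 that carry one; undated rows are left out.
RESULT_DATES = [
    date(2025, 8, 1),   # GSM8K
    date(2025, 12, 1),  # HumanEval
    date(2025, 9, 1),   # MMLU
    date(2026, 4, 20),  # MMLU-Pro
]

total_results = sum(SOURCE_COUNTS.values())
first, latest = min(RESULT_DATES), max(RESULT_DATES)
print(total_results)                          # 10
print(first.isoformat(), latest.isoformat())  # 2025-08-01 2026-04-20
```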