Codesota · Models · Gemini 3 Pro · Google · 14 results · 12 benchmarks

Model card

Gemini 3 Pro.

Google · API · Undisclosed params · 2 current SOTA

Google's flagship model.

§ 01 · Benchmarks

Every benchmark Gemini 3 Pro has a recorded score for.

#	Benchmark	Area · Task	Metric	Value	Rank	Date	Source
01	GPQA	Reasoning · Multi-step Reasoning	accuracy	91.9%	#1/33	—	source ↗
02	HLE	Reasoning · Multi-step Reasoning	accuracy	38.3%	#1/13	—	unverified
03	LiveCodeBench Pro	Computer Code · Code Generation	elo	2439.00	#1/9	—	source ↗
04	MMLU-Pro	Reasoning · Commonsense Reasoning	accuracy	89.8%	#2/20	2026-04-20	source ↗
05	SWE-bench Verified	Agentic AI · Autonomous Coding	pct_resolved	78.8%	#2/3	—	source ↗
06	MMMU-Pro	Multimodal · Visual Question Answering	accuracy	80.0%	#3/5	2026-01-15	source ↗
07	Tau2-Bench	Agentic AI · Tool Use	pass_rate	69.0%	#3/8	2025-11-18	source ↗
08	MMLU	Reasoning · Commonsense Reasoning	accuracy	91.4%	#6/41	2026-01-01	source ↗
09	OmniDocBench	Computer Vision · Document Parsing	composite	90.3%	#7/33	—	source ↗
10	SWE-Bench	Computer Code · Code Generation	resolve-rate-agentic	77.4%	#8/25	2026-01-01	source ↗
11	SWE-Bench	Computer Code · Code Generation	resolve-rate	77.4%	#10/32	2026-01-01	source ↗
12	SWE-Bench	Computer Code · Code Generation	resolve-rate-agentic	76.2%	#12/25	2025-12-01	unverified
13	SWE-Bench Verified	Computer Code · Code Generation	resolve-rate	76.2%	#12/39	—	source ↗
14	SWE-bench Verified	Agentic AI · SWE-bench	resolve-rate	76.2%	#19/81	—	source ↗

The Rank column shows this model's position against all other models scored on the same benchmark and metric (number of competitors after the slash). #1 in red means current SOTA. Rows are sorted by rank, then by newest result.
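The sort order described above can be sketched as follows. This is a minimal illustration, not Codesota's implementation: the `Row` type, `parse_rank`, and `sort_key` are hypothetical names, and placing undated rows after dated ones within the same rank is an assumption inferred from the table.

```python
from datetime import date
from typing import NamedTuple, Optional


class Row(NamedTuple):
    benchmark: str
    rank: str                    # rank string as shown in the table, e.g. "#2/20"
    result_date: Optional[date]  # None where the table shows a dash


def parse_rank(rank: str) -> tuple[int, int]:
    """Split '#2/20' into (position, number after the slash)."""
    position, total = rank.lstrip("#").split("/")
    return int(position), int(total)


def sort_key(row: Row) -> tuple[int, bool, int]:
    position, _ = parse_rank(row.rank)
    # Assumption: undated rows sort after dated rows within the same rank.
    undated = row.result_date is None
    # Negate the ordinal so newer dates come first.
    recency = 0 if undated else -row.result_date.toordinal()
    return (position, undated, recency)


# A few rows copied from the table, deliberately shuffled.
rows = [
    Row("Tau2-Bench", "#3/8", date(2025, 11, 18)),
    Row("SWE-bench Verified", "#2/3", None),
    Row("MMMU-Pro", "#3/5", date(2026, 1, 15)),
    Row("MMLU-Pro", "#2/20", date(2026, 4, 20)),
    Row("GPQA", "#1/33", None),
]

ordered = sorted(rows, key=sort_key)
for row in ordered:
    print(row.benchmark, row.rank, row.result_date or "-")
```

With these five rows the result matches the order in the table: GPQA, MMLU-Pro, SWE-bench Verified, MMMU-Pro, Tau2-Bench.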

§ 02 · Strengths by area

Where Gemini 3 Pro actually performs.

Reasoning

benchmarks

avg rank #2.5 · 2 SOTA
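The Reasoning summary can be reproduced from the four Reasoning rows in the benchmark table (GPQA, HLE, MMLU-Pro, MMLU). The sketch below assumes the average rank is a plain mean of rank positions and that SOTA counts the #1 rows; the page does not state how it computes these figures, and `summarize` is a hypothetical helper.

```python
from collections import defaultdict

# (benchmark, area, rank position) copied from the Reasoning rows of the table.
reasoning_rows = [
    ("GPQA", "Reasoning", 1),
    ("HLE", "Reasoning", 1),
    ("MMLU-Pro", "Reasoning", 2),
    ("MMLU", "Reasoning", 6),
]


def summarize(rows):
    """Group rank positions by area and compute simple summary statistics."""
    by_area = defaultdict(list)
    for _benchmark, area, position in rows:
        by_area[area].append(position)
    return {
        area: {
            "benchmarks": len(positions),
            # Assumption: unweighted mean of rank positions.
            "avg_rank": sum(positions) / len(positions),
            # Assumption: a SOTA is any row ranked #1.
            "sota": positions.count(1),
        }
        for area, positions in by_area.items()
    }


summary = summarize(reasoning_rows)
print(summary["Reasoning"])
```

Under these assumptions the Reasoning area comes out to an average rank of 2.5 with 2 SOTA results, which agrees with the summary line.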

§ 03 · Papers

1 paper with results for Gemini 3 Pro.

2023-10-10 · Computer Code · 1 result
SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
Carlos E. Jimenez, John Yang, Alexander Wettig, Shunyu Yao et al.

§ 04 · Related models

Other Google models scored on Codesota.

632M params · 2 results · 1 SOTA
CoCa (finetuned) · 2.1B params · 1 result · 1 SOTA
Gemini 2.0 Flash · 1 result · 1 SOTA
Gemini 3.1 Pro Preview · 1 result · 1 SOTA
Noise2Music · Unknown params · 1 result · 1 SOTA

§ 05 · Sources & freshness

Where these numbers come from.

editorial · results
google-blog · results
livecodebench-pro-official · result
pricepertoken · result
artificialanalysis.ai · result
codesota-shadow-mmlu · result
paddleocr-paper · result
live-swe-agent · result
swebench-leaderboard · result
google-internal · result

6 of 14 rows marked verified. · first result 2025-11-18, latest 2026-04-20.
