DeepSeek-V3 · DeepSeek · 15 results · 9 benchmarks
Model card

DeepSeek-V3.

DeepSeek · open-source · LLM

DeepSeek's V3 model.

§ 01 · Benchmarks

Every benchmark DeepSeek-V3 has a recorded score for.

| # | Benchmark | Area · Task | Metric | Value | Rank | Date |
|---|---|---|---|---|---|---|
| 01 | MBPP | Computer Code · Code Generation | pass@1 | 89.3% | #9/19 | |
| 02 | MATH | Reasoning · Mathematical Reasoning | accuracy | 90.2% | #17/34 | |
| 03 | GSM8K | Reasoning · Mathematical Reasoning | accuracy | 95.8% | #18/32 | |
| 04 | LiveCodeBench | Computer Code · Code Generation | pass@1 | 49.2% | #19/30 | 2024-03-12 |
| 05 | MMLU | Reasoning · Commonsense Reasoning | accuracy | 88.5% | #22/41 | |
| 06 | HumanEval | Computer Code · Code Generation | pass@1 | 82.6% | #32/42 | |
| 07 | SWE-bench Verified | Computer Code · Code Generation | resolve-rate | 42.0% | #36/39 | |
| 08 | PLCC | Natural Language Processing · Polish Cultural Competency | vocabulary | 63.0% | #59/165 | |
| 09 | PLCC | Natural Language Processing · Polish Cultural Competency | culture-and-tradition | 73.0% | #60/165 | |
| 10 | PLCC | Natural Language Processing · Polish Cultural Competency | art-and-entertainment | 61.0% | #61/165 | |
| 11 | PLCC | Natural Language Processing · Polish Cultural Competency | geography | 79.0% | #61/165 | |
| 12 | PLCC | Natural Language Processing · Polish Cultural Competency | average | 69.2% | #67/165 | |
| 13 | PLCC | Natural Language Processing · Polish Cultural Competency | history | 77.0% | #69/165 | |
| 14 | SWE-bench Verified | Agentic AI · SWE-bench | resolve-rate | 42.0% | #70/81 | |
| 15 | PLCC | Natural Language Processing · Polish Cultural Competency | grammar | 62.0% | #77/165 | |

The Rank column shows this model's position among all models scored on the same benchmark and metric (competitors after the slash). A rank of #1 marks the current SOTA. Rows are sorted by rank, then by newest result.
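
The rank notation and sort order described above can be sketched in a few lines of Python. This is an illustration only, not Codesota's implementation: the `Row` type, `parse_rank`, and `sort_key` are hypothetical names, and placing undated rows after dated ones within a tie is an assumption the page does not confirm.

```python
import re
from datetime import date
from typing import NamedTuple, Optional, Tuple


class Row(NamedTuple):
    """One leaderboard row (hypothetical structure, for illustration)."""
    benchmark: str
    metric: str
    rank: str                      # e.g. "#9/19"
    result_date: Optional[date]    # most rows on this page carry no date


RANK_RE = re.compile(r"#(\d+)/(\d+)")


def parse_rank(rank: str) -> Tuple[int, int]:
    """Split '#9/19' into (position, number after the slash)."""
    match = RANK_RE.fullmatch(rank)
    if match is None:
        raise ValueError(f"unrecognised rank string: {rank!r}")
    return int(match.group(1)), int(match.group(2))


def sort_key(row: Row) -> Tuple[int, int]:
    """Sort by rank position first, then by newest result.

    Assumption: within a tie, dated rows come before undated ones.
    """
    position, _ = parse_rank(row.rank)
    newest_first = -row.result_date.toordinal() if row.result_date else 0
    return position, newest_first


rows = [
    Row("HumanEval", "pass@1", "#32/42", None),
    Row("LiveCodeBench", "pass@1", "#19/30", date(2024, 3, 12)),
    Row("MBPP", "pass@1", "#9/19", None),
]
ordered = sorted(rows, key=sort_key)
print([r.benchmark for r in ordered])
```

Because Python's sort is stable, rows that tie on both position and date (such as the two PLCC rows at #61/165) keep their input order.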
§ 02 · Strengths by area

Where DeepSeek-V3 actually performs.

Reasoning · 3 benchmarks · avg rank #19.0
Computer Code · 4 benchmarks · avg rank #24.0
Natural Language Processing · 1 benchmark · avg rank #64.9
Agentic AI · 1 benchmark · avg rank #70.0
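
The per-area averages are consistent with a plain arithmetic mean of the rank positions listed in § 01, grouped by area. The sketch below reproduces them under that assumption; the `RANKS` list and the `average_rank_by_area` helper are hypothetical names, and the page does not state how the averages are actually computed. Note that the Natural Language Processing figure averages all seven PLCC metric rows, even though PLCC counts as a single benchmark.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (area, benchmark or metric, rank position) copied from the § 01 table.
RANKS: List[Tuple[str, str, int]] = [
    ("Computer Code", "MBPP", 9),
    ("Reasoning", "MATH", 17),
    ("Reasoning", "GSM8K", 18),
    ("Computer Code", "LiveCodeBench", 19),
    ("Reasoning", "MMLU", 22),
    ("Computer Code", "HumanEval", 32),
    ("Computer Code", "SWE-bench Verified", 36),
    ("Natural Language Processing", "PLCC vocabulary", 59),
    ("Natural Language Processing", "PLCC culture-and-tradition", 60),
    ("Natural Language Processing", "PLCC art-and-entertainment", 61),
    ("Natural Language Processing", "PLCC geography", 61),
    ("Natural Language Processing", "PLCC average", 67),
    ("Natural Language Processing", "PLCC history", 69),
    ("Agentic AI", "SWE-bench Verified", 70),
    ("Natural Language Processing", "PLCC grammar", 77),
]


def average_rank_by_area(rows: List[Tuple[str, str, int]]) -> Dict[str, float]:
    """Mean rank position per area, rounded to one decimal place."""
    by_area: Dict[str, List[int]] = defaultdict(list)
    for area, _name, position in rows:
        by_area[area].append(position)
    return {
        area: round(sum(positions) / len(positions), 1)
        for area, positions in by_area.items()
    }


averages = average_rank_by_area(RANKS)
for area, avg in sorted(averages.items(), key=lambda item: item[1]):
    print(f"{area}: avg rank #{avg}")
```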
§ 03 · Papers

1 paper with results for DeepSeek-V3.

  1. 2024-03-12 · Computer Code · 1 result

    LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code

§ 04 · Related models

Other DeepSeek models scored on Codesota.

DeepSeek R1 · 671B MoE params · 10 results
DeepSeek-Coder-V2-Instruct · Unknown params · 4 results
DeepSeek-OCR · 3 results
DeepSeek-R1-0528 · 3 results
DeepSeek V3.5 · 685B MoE params · 2 results
DeepSeek-V2.5 · 2 results
DeepSeek-V3.1 · 2 results
DeepSeek V3.2 · 1 result
§ 05 · Sources & freshness

Where these numbers come from.

sdadas/PLCC · 7 results
arxiv · 2 results
openai-simple-evals · 2 results
deepseek-paper · 1 result
official-leaderboard · 1 result
swebench-leaderboard · 1 result
editorial · 1 result

11 of 15 rows marked verified.