Codesota · Model card

Llama-3.3-70B-Instruct

meta-llama · open-source · 70.6B params · 15 results · 4 benchmarks
§ 01 · Benchmarks

Every recorded score for Llama-3.3-70B-Instruct, one row per benchmark and metric.

#  | Benchmark               | Area · Task                                                 | Metric                 | Value | Rank    | Date
01 | Open PL LLM Leaderboard | Natural Language Processing · Polish LLM General            | belebele               | 92.6% | #4/490  | -
02 | Open PL LLM Leaderboard | Natural Language Processing · Polish LLM General            | average                | 66.4% | #8/491  | -
03 | Open PL LLM Leaderboard | Natural Language Processing · Polish LLM General            | dyk                    | 71.5% | #8/489  | -
04 | Open PL LLM Leaderboard | Natural Language Processing · Polish LLM General            | polemo2-in             | 87.1% | #12/490 | -
05 | CPTU-Bench              | Natural Language Processing · Polish Text Understanding     | sentiment              | 4.3   | #13/93  | -
06 | Polish EQ-Bench         | Natural Language Processing · Polish Emotional Intelligence | eq-score               | 70.7% | #18/101 | -
07 | Open PL LLM Leaderboard | Natural Language Processing · Polish LLM General            | ppc                    | 79.9% | #19/490 | -
08 | Open PL LLM Leaderboard | Natural Language Processing · Polish LLM General            | eq-bench               | 61.8% | #21/299 | -
09 | HumanEval               | Computer Code · Code Generation                             | pass@1                 | 88.4% | #22/42  | 2024-12-01
10 | CPTU-Bench              | Natural Language Processing · Polish Text Understanding     | language-understanding | 3.9   | #28/93  | -
11 | CPTU-Bench              | Natural Language Processing · Polish Text Understanding     | tricky-questions       | 3.4   | #30/93  | -
12 | CPTU-Bench              | Natural Language Processing · Polish Text Understanding     | average                | 3.6   | #31/93  | -
13 | Open PL LLM Leaderboard | Natural Language Processing · Polish LLM General            | cbd                    | 36.7% | #33/490 | -
14 | CPTU-Bench              | Natural Language Processing · Polish Text Understanding     | phraseology            | 3.0   | #62/93  | -
15 | Open PL LLM Leaderboard | Natural Language Processing · Polish LLM General            | polqa-open-book        | 88.6% | #92/489 | -
CPTU-Bench values are raw scores on the benchmark's own points scale, not percentages.
The Rank column gives this model’s position among all models scored on the same benchmark and metric; the number after the slash counts the competitors. A rank of #1 marks the current SOTA. Rows are sorted by rank, then by newest result.
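The ranking and sort rules can be sketched in Python. This is a minimal sketch under stated assumptions, not Codesota's implementation: it assumes higher values are better, that tied models share the better rank (standard competition ranking), and that the number after the slash is the total number of models in the group. `rank_in_group`, `sort_rows` and the `model-a` to `model-d` scores are hypothetical.

```python
from datetime import date
from typing import Optional


def rank_in_group(model: str, scores: dict[str, float]) -> str:
    """Rank label such as '#2/4' for `model` within one benchmark + metric group.

    Assumptions (not confirmed by Codesota): higher values are better, tied
    models share the better rank, and the denominator is the group size.
    """
    mine = scores[model]
    # Count models with a strictly higher score; ties do not push the rank down.
    better = sum(1 for s in scores.values() if s > mine)
    return f"#{better + 1}/{len(scores)}"


def sort_rows(rows: list[tuple[int, Optional[date]]]) -> list[tuple[int, Optional[date]]]:
    """Sort (rank, date) pairs by rank ascending, then newest date first.

    Undated rows sort after dated rows that share the same rank.
    """
    return sorted(rows, key=lambda r: (r[0], -r[1].toordinal() if r[1] else 0))


# Hypothetical group of four models scored on one metric.
scores = {"model-a": 92.9, "model-b": 92.6, "model-c": 92.6, "model-d": 80.0}
print(rank_in_group("model-b", scores))  # #2/4
print(rank_in_group("model-d", scores))  # #4/4
print(sort_rows([(22, date(2024, 12, 1)), (4, None), (22, date(2025, 1, 1))]))
```

If the page instead excludes the model itself from the count after the slash, the denominator would be `len(scores) - 1`; the source does not say which.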
§ 02 · Strengths by area

How Llama-3.3-70B-Instruct ranks in each area (a lower average rank is better).

Computer Code · 1 benchmark · avg rank #22.0
Natural Language Processing · 3 benchmarks · avg rank #27.1
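The per-area figures can be reproduced from the § 01 table. The sketch below assumes that "avg rank" is the plain arithmetic mean of every per-metric rank in the area and that "benchmarks" counts distinct benchmark names; under those assumptions it returns 1 benchmark at #22.0 for Computer Code and 3 benchmarks at #27.1 for Natural Language Processing (14 ranks summing to 379). `ROWS` and `area_summary` are hypothetical names.

```python
from collections import defaultdict

# (area, benchmark, rank) for the 15 rows of the § 01 table, in table order.
ROWS = [
    ("Natural Language Processing", "Open PL LLM Leaderboard", 4),   # belebele
    ("Natural Language Processing", "Open PL LLM Leaderboard", 8),   # average
    ("Natural Language Processing", "Open PL LLM Leaderboard", 8),   # dyk
    ("Natural Language Processing", "Open PL LLM Leaderboard", 12),  # polemo2-in
    ("Natural Language Processing", "CPTU-Bench", 13),               # sentiment
    ("Natural Language Processing", "Polish EQ-Bench", 18),          # eq-score
    ("Natural Language Processing", "Open PL LLM Leaderboard", 19),  # ppc
    ("Natural Language Processing", "Open PL LLM Leaderboard", 21),  # eq-bench
    ("Computer Code", "HumanEval", 22),                              # pass@1
    ("Natural Language Processing", "CPTU-Bench", 28),               # language-understanding
    ("Natural Language Processing", "CPTU-Bench", 30),               # tricky-questions
    ("Natural Language Processing", "CPTU-Bench", 31),               # average
    ("Natural Language Processing", "Open PL LLM Leaderboard", 33),  # cbd
    ("Natural Language Processing", "CPTU-Bench", 62),               # phraseology
    ("Natural Language Processing", "Open PL LLM Leaderboard", 92),  # polqa-open-book
]


def area_summary(rows):
    """Map each area to (distinct benchmarks, mean rank rounded to 0.1).

    Assumes 'avg rank' is the arithmetic mean of every per-metric rank in the area.
    """
    benchmarks = defaultdict(set)
    ranks = defaultdict(list)
    for area, benchmark, rank in rows:
        benchmarks[area].add(benchmark)
        ranks[area].append(rank)
    return {area: (len(benchmarks[area]), round(sum(r) / len(r), 1)) for area, r in ranks.items()}


print(area_summary(ROWS))
# {'Natural Language Processing': (3, 27.1), 'Computer Code': (1, 22.0)}
```

A plain mean treats #13/93 and #13/490 alike; dividing each rank by its field size would be one way to compare across leaderboards of different sizes.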
§ 04 · Related models

Other meta-llama models listed on Codesota.

Llama-2-7b-chat-hf · 0 results
Llama-2-7b-hf · 0 results
Llama-3.2-1B · 0 results
Llama-3.2-1B-Instruct · 1.24B params · 0 results
Llama-3.2-3B · 0 results
Llama-3.2-3B-Instruct · 3.21B params · 0 results
Llama-4-Scout-17B-16E · 0 results
Llama-4-Scout-17B-16E-Instruct · 109B params · 0 results
§ 05 · Sources & freshness

Where these numbers come from.

speakleash/open_pl_llm_leaderboard · 8 results
SpeakLeash/CPTU-Bench · 5 results
SpeakLeash/Polish-EQ-Bench · 1 result
shadow-page-humaneval · 1 result
15 of 15 rows marked verified.