Codesota · Models · phi-4 · microsoft · 13 results · 3 benchmarks
Model card

phi-4

microsoft · open-source · 14.7B params
§ 01 · Benchmarks

Every benchmark phi-4 has a recorded score for.

| # | Benchmark | Area · Task | Metric | Value | Rank | Date | Source |
|---|---|---|---|---|---|---|---|
| 01 | Open PL LLM Leaderboard | Natural Language Processing · Polish LLM General | belebele | 91.6% | #17/490 | | source ↗ |
| 02 | Open PL LLM Leaderboard | Natural Language Processing · Polish LLM General | cbd | 37.3% | #29/490 | | source ↗ |
| 03 | Open PL LLM Leaderboard | Natural Language Processing · Polish LLM General | polemo2-in | 86.0% | #34/490 | | source ↗ |
| 04 | Polish EQ-Bench | Natural Language Processing · Polish Emotional Intelligence | eq-score | 59.1% | #34/101 | | source ↗ |
| 05 | CPTU-Bench | Natural Language Processing · Polish Text Understanding | average | 3.3% | #47/93 | | source ↗ |
| 06 | CPTU-Bench | Natural Language Processing · Polish Text Understanding | sentiment | 3.7% | #48/93 | | source ↗ |
| 07 | CPTU-Bench | Natural Language Processing · Polish Text Understanding | tricky-questions | 2.7% | #48/93 | | source ↗ |
| 08 | CPTU-Bench | Natural Language Processing · Polish Text Understanding | language-understanding | 3.5% | #48/93 | | source ↗ |
| 09 | CPTU-Bench | Natural Language Processing · Polish Text Understanding | phraseology | 3.2% | #52/93 | | source ↗ |
| 10 | Open PL LLM Leaderboard | Natural Language Processing · Polish LLM General | average | 62.6% | #55/491 | | source ↗ |
| 11 | Open PL LLM Leaderboard | Natural Language Processing · Polish LLM General | dyk | 66.1% | #66/489 | | source ↗ |
| 12 | Open PL LLM Leaderboard | Natural Language Processing · Polish LLM General | ppc | 76.3% | #91/490 | | source ↗ |
| 13 | Open PL LLM Leaderboard | Natural Language Processing · Polish LLM General | polqa-open-book | 84.5% | #217/489 | | source ↗ |
Rank column shows this model’s position vs all other models scored on the same benchmark + metric (competitors after the slash). #1 in red means current SOTA. Sorted by rank, then newest result.
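The rank cells above follow the "#position/competitors" convention described in the note. A minimal sketch of reading one such cell (the `parse_rank` helper is hypothetical, not part of the site):

```python
# Hypothetical helper: split a rank cell like "#17/490" into
# (position, total competitor count).
def parse_rank(cell: str) -> tuple[int, int]:
    pos, total = cell.lstrip("#").split("/")
    return int(pos), int(total)

print(parse_rank("#17/490"))   # → (17, 490)
print(parse_rank("#217/489"))  # → (217, 489)
```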
§ 02 · Strengths by area

Where phi-4 actually performs.

Natural Language Processing · 3 benchmarks · avg rank #60.5
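The avg rank shown appears to be the mean of this model's rank positions across all 13 results in the benchmark table above (an assumption about how Codesota computes it, not documented on the page); a minimal sketch:

```python
# Rank positions taken from the benchmark table above (one per result row).
ranks = [17, 29, 34, 34, 47, 48, 48, 48, 52, 55, 66, 91, 217]

# Mean rank, rounded to one decimal place — matches the displayed #60.5.
avg_rank = round(sum(ranks) / len(ranks), 1)
print(f"avg rank #{avg_rank}")  # → avg rank #60.5
```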
§ 04 · Related models

Other microsoft models scored on Codesota.

Phi-3-medium-4k-instruct · 0 results
Phi-3-mini-4k-instruct · 0 results
Phi-3-small-8k-instruct · 0 results
Phi-3.5-MoE-instruct · 0 results
Phi-3.5-mini-instruct · 3.82B params · 0 results
Phi-4-mini-instruct · 3.84B params · 0 results
WizardLM-2-7B · 0 results
§ 05 · Sources & freshness

Where these numbers come from.

speakleash/open_pl_llm_leaderboard · 7 results
SpeakLeash/CPTU-Bench · 5 results
SpeakLeash/Polish-EQ-Bench · 1 result
13 of 13 rows marked verified.