5 Benchmarks Tracked

Polish LLM Benchmarks
Real Performance Data

Compare language models on Polish benchmarks: PLCC, CPTU, MT-Bench-PL, EQ-Bench-PL, and Open PL LLM Leaderboard. All data live from the database.

Published Jan 1, 2025 · Updated Mar 30, 2026

Polish SOTA

PLCC
PLCC
SOTA: Gemini-3.1-Pro-Preview · 97 (average)
CPTU-Bench
SOTA: Qwen/Qwen3.5-27B thinking (API) · 4.34 (average)
Open PL LLM
SOTA: Mistral-Large-Instruct-2411 · 69.84 (average)
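
Each SOTA entry above is just the top row of the corresponding leaderboard. As a minimal sketch of how that selection works (the flat record schema is an assumption, not the site's actual database layout; the model names and scores are taken from the leaderboards on this page), picking the best model per benchmark looks like:

```python
# Sketch: select the SOTA (highest-scoring) model per benchmark from flat
# (benchmark, model, score) records. Schema is assumed; values are sample
# rows copied from the leaderboards below.
rows = [
    ("PLCC", "Gemini-3.1-Pro-Preview", 97.0),
    ("PLCC", "Gemini-3.0-Pro-Preview", 95.83),
    ("CPTU-Bench", "Qwen/Qwen3.5-27B thinking (API)", 4.34),
    ("CPTU-Bench", "gemini-2.0-flash-001", 4.29),
    ("Open PL LLM", "Mistral-Large-Instruct-2411", 69.84),
    ("Open PL LLM", "Meta-Llama-3.1-405B-Instruct-FP8", 69.44),
]

def sota_per_benchmark(rows):
    """Return {benchmark: (model, score)} keeping the highest score per benchmark."""
    best = {}
    for bench, model, score in rows:
        if bench not in best or score > best[bench][1]:
            best[bench] = (model, score)
    return best

print(sota_per_benchmark(rows)["PLCC"])  # ('Gemini-3.1-Pro-Preview', 97.0)
```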

Polish Benchmarks

Five benchmarks covering cultural knowledge, text understanding, general evaluation, conversation quality, and emotional intelligence in Polish.

Leaderboards

Top models on each Polish benchmark. Live from the database.

PLCC

Leaderboard — average · Full →

| # | Model | Score |
|---|-------|-------|
| 1 | Gemini-3.1-Pro-Preview | 97 |
| 2 | Gemini-3.0-Pro-Preview | 95.83 |
| 3 | GPT-5.4-2026-03-05 (high reasoning) | 92.17 |
| 4 | Gemini-2.5-Pro-Preview-06-05 | 92.17 |
| 5 | Gemini-3-Flash-Preview | 91.67 |
| 6 | GPT-5-Pro-2025-10-06 (high reasoning) | 91 |
| 7 | GPT-5.4-2026-03-05 (low reasoning) | 90.50 |
| 8 | Grok-4 | 90.50 |
| 9 | Gemini-2.5-Pro-Exp-03-25 | 89.50 |
| 10 | GPT-5-2025-08-07 | 89.50 |

CPTU-Bench

Leaderboard — average · Full →

| # | Model | Score |
|---|-------|-------|
| 1 | Qwen/Qwen3.5-27B thinking (API) | 4.34 |
| 2 | gemini-2.0-flash-001 | 4.29 |
| 3 | Qwen/Qwen3.5-27B non-thinking (API) | 4.27 |
| 4 | Qwen/Qwen3.5-35B-A3B thinking (API) | 4.22 |
| 5 | Qwen/Qwen3.5-35B-A3B non-thinking (API) | 4.18 |
| 6 | deepseek-ai/DeepSeek-V3.2 (API) | 4.14 |
| 7 | deepseek-ai/DeepSeek-R1 (API) | 4.14 |
| 8 | gemini-2.0-flash-lite-001 | 4.09 |
| 9 | 🚧 DeepSeek-V3-0324 | 4.03 |
| 10 | deepseek-ai/DeepSeek-V3.1 (API) | 4.03 |

Open PL LLM Leaderboard

Leaderboard — average · Full →

| # | Model | Score |
|---|-------|-------|
| 1 | Mistral-Large-Instruct-2411 | 69.84 |
| 2 | Meta-Llama-3.1-405B-Instruct-FP8 | 69.44 |
| 3 | Mistral-Large-Instruct-2407 | 69.11 |
| 4 | Qwen2.5-72B-Instruct | 67.92 |
| 5 | Qwen2.5-72B | 67.38 |
| 6 | QwQ-32B-Preview | 67.01 |
| 7 | Qwen2.5-32B | 66.73 |
| 8 | Llama-3.3-70B-Instruct | 66.40 |
| 9 | Qwen2-72B | 66.02 |
| 10 | remek/v3/rl-instruct/110k | 65.99 |

Polish MT-Bench

Leaderboard — pl-score · Full →

| # | Model | Score |
|---|-------|-------|
| 1 | gemma-3-27b-it | 9.28 |
| 2 | Mistral-Small-3.1-24B-Instruct-2503 | 9.18 |
| 3 | Phi-4 | 9.07 |
| 4 | gemma-3-12b-it | 8.97 |
| 5 | Qwen2.5-32B-Instruct | 8.86 |
| 6 | Qwen2-72B-Instruct | 8.78 |
| 7 | Mistral-Small-24B-Instruct-2501 | 8.72 |
| 8 | Mistral-Large-Instruct-2407 | 8.66 |
| 9 | Gemma-2-27b-it | 8.62 |
| 10 | aya-expanse-32b | 8.62 |

Polish EQ-Bench

Leaderboard — eq-score · Full →

| # | Model | Score |
|---|-------|-------|
| 1 | Mistral-Large-Instruct-2407 | 78.07 |
| 2 | Mistral-Large-Instruct-2411 | 77.29 |
| 3 | Meta-Llama-3.1-405B-Instruct-FP8 | 77.23 |
| 4 | GPT-4o-2024-08-06 | 75.15 |
| 5 | gpt-4-turbo-2024-04-09 | 74.59 |
| 6 | Bielik-11B-v2.6-Instruct | 73.70 |
| 7 | 🚧 DeepSeek-V3-0324 | 73.46 |
| 8 | Mistral-Small-Instruct-2409 | 72.85 |
| 9 | Llama-PLLuM-70B-chat | 72.56 |
| 10 | Meta-Llama-3.1-70B-Instruct | 72.53 |

Bielik Cross-Benchmark Tracker

All Bielik versions across 5 Polish benchmarks. Data live from the database.

| Model | PLCC | CPTU-Bench | Open PL LLM Leaderboard | Polish MT-Bench | Polish EQ-Bench |
|-------|------|------------|-------------------------|-----------------|-----------------|
| Bielik-0.1 | 46.67 | - | - | - | - |
| Bielik-1.5B-v1.0-DPO-001-L2 | - | - | 80.48 | - | - |
| Bielik-1.5B-v1.0-DPO-001-L3 | - | - | 16.38 | - | - |
| Bielik-1.5B-v1.0-DPO-001-L3-copy | - | - | 74.70 | - | - |
| Bielik-1.5B-v1.0-m3 | - | - | 16.84 | - | - |
| Bielik-1.5B-v1.0-m3b | - | - | 74 | - | - |
| Bielik-1.5B-v1.0-m4 | - | - | 74.50 | - | - |
| Bielik-1.5B-v3 | - | - | 35.78 | - | - |
| Bielik-1.5B-v3.0-Instruct | 27 | 2.38 | 41.36 | - | - |
| Bielik-1.5B-v3.0-Instruct-RC042025 | - | - | 20.50 | - | - |
| Bielik-1.5B-v3.0-Instruct-SFT-RC042025 | - | - | 71.11 | - | - |
| Bielik-11B-v2 | - | - | 34.37 | - | - |
| Bielik-11B-v2.0-Instruct | - | 2.20 | 50.18 | 7.56 | 68.24 |
| Bielik-11B-v2.1-Instruct | - | 3.92 | 83.64 | 9.50 | 60.07 |
| Bielik-11B-v2.2-Instruct | - | 3.72 | 83.80 | 9.35 | 69.05 |
| Bielik-11B-v2.2-M-1.2 | - | - | 92.32 | - | - |
| Bielik-11B-v2.3-Instruct | - | 3.97 | 86.29 | 8.97 | 70.86 |
| Bielik-11B-v2.3-Instruct-AWQ | - | - | 69.58 | - | - |
| Bielik-11B-v2.3-Instruct-GPTQ | - | - | 32.85 | - | - |
| Bielik-11B-v2.3-Instruct.IQ1_M.gguf.IQ | - | - | 26.85 | - | - |
| Bielik-11B-v2.3-Instruct.IQ2_XXS.gguf.IQ | - | - | 43.53 | - | - |
| Bielik-11B-v2.3-Instruct.IQ3_XXS.gguf.IQ | - | - | 84.35 | - | - |
| Bielik-11B-v2.3-Instruct.Q4_K_M.gguf | - | - | 86.57 | - | - |
| Bielik-11B-v2.3-Instruct.Q4_K_M.gguf.IQ | - | - | 91.13 | - | - |
| Bielik-11B-v2.3-Instruct.Q6_K.gguf | - | - | 86.57 | - | - |
| Bielik-11B-v2.3-Instruct.Q8_0.gguf | - | - | 51.82 | - | - |
| Bielik-11B-v2.4-Instruct-MS | - | - | 65.51 | - | - |
| Bielik-11B-v2.4-Instruct-SL | - | - | 65.87 | - | - |
| Bielik-11B-v2.4-Instruct-TI | - | - | 37.45 | - | - |
| Bielik-11B-v2.5-Instruct | - | 2.91 | 63.95 | - | 72.00 |
| Bielik-11B-v2.5-Instruct-D-GRPO_H_070 | - | - | 64.57 | - | - |
| Bielik-11B-v2.5-Instruct-GRPO_010 | - | - | 67.61 | - | - |
| Bielik-11B-v2.5-Instruct-GRPO_020 | - | - | 67.76 | - | - |
| Bielik-11B-v2.5-Instruct-GRPO_030 | - | - | 84.35 | - | - |
| Bielik-11B-v2.5-Instruct-GRPO_040 | - | - | 64.19 | - | - |
| Bielik-11B-v2.5-Instruct-GRPO_050 | - | - | 61.50 | - | - |
| Bielik-11B-v2.5-Instruct-GRPO_060 | - | - | 91.04 | - | - |
| Bielik-11B-v2.5-Instruct-GRPO_H_010 | - | - | 35.35 | - | - |
| Bielik-11B-v2.5-Instruct-GRPO_H_030 | - | - | 68.21 | - | - |
| Bielik-11B-v2.6-Instruct | - | 3.41 | 91.16 | - | 73.70 |
| Bielik-11B-v3-Base-20250730 | - | - | 77.56 | - | - |
| Bielik-11B-v3.0-Instruct | 78 | 3.73 | 69.48 | - | 71.20 |
| Bielik-11B-v3.0-Instruct-FP8-Dynamic | - | - | 80 | - | - |
| Bielik-11B-v3.0-Instruct.Q4_K_M.gguf | - | - | 88.44 | - | - |
| Bielik-11B-v3.0-Instruct.Q6_K.gguf | - | - | 88.44 | - | - |
| Bielik-11B-v3.0-Instruct.Q8_0.gguf | - | - | 65.82 | - | - |
| Bielik-2.1 | 61 | - | - | - | - |
| Bielik-2.2 | 62 | - | - | - | - |
| Bielik-2.3 | 62.17 | - | - | - | - |
| Bielik-2.5 | 75 | - | - | - | - |
| Bielik-2.6 | 72 | - | - | - | - |
| Bielik-4.5B-v3 | - | - | 87.08 | - | - |
| Bielik-4.5B-v3.0-Instruct | 35 | 2.46 | 56.13 | - | 53.58 |
| Bielik-4.5B-v3.0-Instruct-SFT-RC042025 | - | - | 54.84 | - | - |
| Bielik-7B-Instruct-v0.1 | - | 2.16 | 30.43 | 6.15 | 31.26 |
| Bielik-7B-Instruct-v0.1-GPTQ | - | - | 66.44 | - | - |
| Bielik-7B-v0.1 | - | - | 20.85 | - | - |
| Bielik-Minitron-7B-v3.0-Instruct | 64 | - | - | - | - |
| Bielik-PL-11B-v3.0-Instruct | - | - | 66.24 | - | - |
| Bielik-PL-Minitron-7B-v3.0-Instruct | - | - | 81.99 | - | - |
| Bielik-SOLAR-LIKE-10.7B-Instruct-v0.1 | - | - | 31.87 | - | 34.17 |
| minitron-Bielik-7B-v3.0-Instruct-GGUF.Q4_K_M.gguf | - | - | 42.09 | - | - |
| minitron-Bielik-7B-v3.0-Instruct-GGUF.Q6_K.gguf | - | - | 87.33 | - | - |
| minitron-Bielik-7B-v3.0-Instruct-GGUF.Q8_0.gguf | - | - | 44 | - | - |
| MSH-Lite-7B-v1-Bielik-v2.3-Instruct-Llama-Prune | - | - | 19.53 | - | - |
| MSH-v1-Bielik-v2.3-Instruct-MedIT-merge | - | - | 50.62 | - | - |
| speakleash/Bielik-Minitron-7B-v3.0-Instruct | - | 3.38 | - | - | - |
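
The quantized Bielik-11B-v2.3-Instruct entries in the tracker make the cost of aggressive quantization visible. A small sketch (scores copied from the tracker; the comparison logic is illustrative, not the site's own code) that computes each quant's Open PL LLM delta against the full-precision score:

```python
# Illustrative sketch: Open PL LLM Leaderboard scores for quantized
# Bielik-11B-v2.3-Instruct variants, copied from the tracker above.
FULL_PRECISION = 86.29  # Bielik-11B-v2.3-Instruct (unquantized)

quant_scores = {
    "Q4_K_M.gguf": 86.57,
    "Q6_K.gguf": 86.57,
    "IQ3_XXS.gguf": 84.35,
    "Q8_0.gguf": 51.82,
    "IQ2_XXS.gguf": 43.53,
    "IQ1_M.gguf": 26.85,
}

def deltas(scores, baseline):
    """Score change of each quant vs the full-precision baseline,
    sorted from least to most degradation."""
    return sorted(((name, round(s - baseline, 2)) for name, s in scores.items()),
                  key=lambda kv: kv[1], reverse=True)

for name, d in deltas(quant_scores, FULL_PRECISION):
    print(f"{name:>14} {d:+.2f}")
```

Note the anomaly the numbers expose: the 1- and 2-bit IQ quants lose 40+ points, while the Q8_0 score (51.82) sits far below Q4_K_M (86.57) despite being the less aggressive quantization, which suggests a measurement or configuration issue rather than a real quality ordering.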

About Bielik

Bielik (Polish for "White-tailed Eagle") is developed by SpeakLeash. The models are optimized specifically for Polish language tasks, use a custom APT4 tokenizer, and are trained on 292B+ tokens of Polish text. Apache 2.0 licensed.

PLLuM Cross-Benchmark Tracker

All PLLuM versions across 5 Polish benchmarks. Data live from the database.

| Model | PLCC | CPTU-Bench | Open PL LLM Leaderboard | Polish MT-Bench | Polish EQ-Bench |
|-------|------|------------|-------------------------|-----------------|-----------------|
| CYFRAGOVPL/Llama-PLLuM-8B-instruct | - | 3.46 | - | - | - |
| CYFRAGOVPL/PLLuM-12B-nc-chat | - | 2.62 | - | - | - |
| CYFRAGOVPL/PLLuM-12B-nc-instruct | - | 3.31 | - | - | - |
| CYFRAGOVPL/pllum-12b-nc-instruct-250715 | - | 3.29 | - | - | - |
| Llama-PLLuM-70B-chat | 46 | 3.61 | - | 4.80 | 72.56 |
| Llama-PLLuM-70B-chat-250801 | 62 | - | - | - | - |
| Llama-PLLuM-70B-instruct | - | 3.33 | - | - | 69.99 |
| Llama-PLLuM-8B-chat | 34 | 2.25 | - | 6.05 | 46.20 |
| PLLuM-12B-chat | 33 | 3.14 | - | 9.30 | 52.26 |
| PLLuM-12B-instruct | - | 3.09 | - | - | 36.21 |
| PLLuM-12B-nc-chat | 70 | - | - | 7.55 | - |
| pllum-12b-nc-chat-250715 | - | 3.46 | - | - | 55.17 |
| PLLuM-12B-nc-chat-250715 | 75 | - | - | - | - |
| PLLuM-8x7B-chat | 44 | 3.44 | - | 7.10 | 45.22 |
| PLLuM-8x7B-instruct | - | 3.01 | - | - | 39.55 |
| PLLuM-8x7B-nc-chat | 73 | 3.08 | - | 3.35 | 47.29 |
| PLLuM-8x7B-nc-instruct | - | 3.22 | - | - | 41.75 |
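
One pattern in the PLLuM tracker: on Polish EQ-Bench, each chat variant outscores its instruct sibling. A quick sketch (scores copied from the tracker; the chat/instruct pairing is illustrative and assumes the variants share a base model) quantifying that gap:

```python
# Sketch: chat-vs-instruct gap on Polish EQ-Bench for PLLuM variant pairs.
# Scores are copied from the tracker above; pairing by shared base model
# is an assumption made for illustration.
pairs = {
    "PLLuM-12B": {"chat": 52.26, "instruct": 36.21},
    "PLLuM-8x7B": {"chat": 45.22, "instruct": 39.55},
    "Llama-PLLuM-70B": {"chat": 72.56, "instruct": 69.99},
}

def chat_advantage(pairs):
    """EQ-Bench score gain of each chat variant over its instruct sibling."""
    return {base: round(v["chat"] - v["instruct"], 2) for base, v in pairs.items()}

print(chat_advantage(pairs))
```

The gap is largest for PLLuM-12B (over 16 points), consistent with chat tuning mattering most for the emotional-intelligence-style prompts EQ-Bench uses.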

About PLLuM

PLLuM (Polish Large Language Universal Model) is developed by OPI (National Information Processing Institute) as part of a government-backed initiative to build open Polish AI infrastructure. Models range from 8B to 70B parameters.

PLLuM Project Page →

Resources & Links

Explore More Benchmarks

See how Polish OCR models compare, or explore our broader LLM benchmark tracking.

Get notified when these results update

New models drop weekly. We track them so you don't have to.