Polish LLM Benchmarks
Real Performance Data
Compare language models on five Polish benchmarks: PLCC, CPTU-Bench, the Open PL LLM Leaderboard, Polish MT-Bench, and Polish EQ-Bench.
Published Jan 1, 2025 · Updated Mar 30, 2026
Polish Benchmarks
Five benchmarks covering cultural knowledge, text understanding, general evaluation, conversation quality, and emotional intelligence in Polish.
PLCC
Polish Linguistic and Cultural Competency — tests grammar, idioms, and cultural references.
CPTU-Bench
Complex Polish Text Understanding — measures comprehension of nuanced, multi-layered Polish texts.
Open PL LLM Leaderboard
Multi-task Polish evaluation covering reasoning, knowledge, and language understanding.
Polish MT-Bench
Multi-turn conversation quality — tests dialogue coherence and context retention in Polish.
Polish EQ-Bench
Emotional intelligence in Polish — evaluates understanding of emotions and social nuances.
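Note that the five benchmarks report scores on different scales: PLCC, the Open PL LLM Leaderboard, and Polish EQ-Bench run roughly 0–100, CPTU-Bench uses a judge scale topping out around 5, and Polish MT-Bench tops out at 10. Raw scores are therefore not comparable across tables. A minimal sketch of rescaling everything to a common 0–100 range (the per-benchmark maxima here are assumptions inferred from the score ranges in the tables below, not official bounds):

```python
# Rescale benchmark scores to a common 0-100 range.
# ASSUMED_MAX values are inferred from the visible score ranges,
# not taken from official benchmark documentation.
ASSUMED_MAX = {
    "PLCC": 100.0,
    "CPTU-Bench": 5.0,
    "Open PL LLM Leaderboard": 100.0,
    "Polish MT-Bench": 10.0,
    "Polish EQ-Bench": 100.0,
}

def to_common_scale(benchmark: str, score: float) -> float:
    """Map a raw score onto 0-100 using the assumed per-benchmark maximum."""
    return 100.0 * score / ASSUMED_MAX[benchmark]

# Example: Bielik-11B-v2.3-Instruct's scores from the tracker table below.
scores = {
    "CPTU-Bench": 3.97,
    "Open PL LLM Leaderboard": 86.29,
    "Polish MT-Bench": 8.97,
    "Polish EQ-Bench": 70.86,
}
rescaled = {b: round(to_common_scale(b, s), 1) for b, s in scores.items()}
```

Under these assumed maxima, a CPTU-Bench 3.97 and an MT-Bench 8.97 land in the same high-70s-to-high-80s band as the model's percentage-scale scores.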
Leaderboards
The top ten models on each Polish benchmark.
PLCC
| # | Model | Score |
|---|---|---|
| 1 | Gemini-3.1-Pro-Preview | 97 |
| 2 | Gemini-3.0-Pro-Preview | 95.83 |
| 3 | GPT-5.4-2026-03-05 (high reasoning) | 92.17 |
| 4 | Gemini-2.5-Pro-Preview-06-05 | 92.17 |
| 5 | Gemini-3-Flash-Preview | 91.67 |
| 6 | GPT-5-Pro-2025-10-06 (high reasoning) | 91 |
| 7 | GPT-5.4-2026-03-05 (low reasoning) | 90.50 |
| 8 | Grok-4 | 90.50 |
| 9 | Gemini-2.5-Pro-Exp-03-25 | 89.50 |
| 10 | GPT-5-2025-08-07 | 89.50 |
CPTU-Bench
| # | Model | Score |
|---|---|---|
| 1 | Qwen/Qwen3.5-27B thinking (API) | 4.34 |
| 2 | gemini-2.0-flash-001 | 4.29 |
| 3 | Qwen/Qwen3.5-27B non-thinking (API) | 4.27 |
| 4 | Qwen/Qwen3.5-35B-A3B thinking (API) | 4.22 |
| 5 | Qwen/Qwen3.5-35B-A3B non-thinking (API) | 4.18 |
| 6 | deepseek-ai/DeepSeek-V3.2 (API) | 4.14 |
| 7 | deepseek-ai/DeepSeek-R1 (API) | 4.14 |
| 8 | gemini-2.0-flash-lite-001 | 4.09 |
| 9 | DeepSeek-V3-0324 | 4.03 |
| 10 | deepseek-ai/DeepSeek-V3.1 (API) | 4.03 |
Open PL LLM Leaderboard
| # | Model | Score |
|---|---|---|
| 1 | Mistral-Large-Instruct-2411 | 69.84 |
| 2 | Meta-Llama-3.1-405B-Instruct-FP8 | 69.44 |
| 3 | Mistral-Large-Instruct-2407 | 69.11 |
| 4 | Qwen2.5-72B-Instruct | 67.92 |
| 5 | Qwen2.5-72B | 67.38 |
| 6 | QwQ-32B-Preview | 67.01 |
| 7 | Qwen2.5-32B | 66.73 |
| 8 | Llama-3.3-70B-Instruct | 66.40 |
| 9 | Qwen2-72B | 66.02 |
| 10 | remek/v3/rl-instruct/110k | 65.99 |
Polish MT-Bench
| # | Model | Score |
|---|---|---|
| 1 | gemma-3-27b-it | 9.28 |
| 2 | Mistral-Small-3.1-24B-Instruct-2503 | 9.18 |
| 3 | Phi-4 | 9.07 |
| 4 | gemma-3-12b-it | 8.97 |
| 5 | Qwen2.5-32B-Instruct | 8.86 |
| 6 | Qwen2-72B-Instruct | 8.78 |
| 7 | Mistral-Small-24B-Instruct-2501 | 8.72 |
| 8 | Mistral-Large-Instruct-2407 | 8.66 |
| 9 | Gemma-2-27b-it | 8.62 |
| 10 | aya-expanse-32b | 8.62 |
Polish EQ-Bench
| # | Model | Score |
|---|---|---|
| 1 | Mistral-Large-Instruct-2407 | 78.07 |
| 2 | Mistral-Large-Instruct-2411 | 77.29 |
| 3 | Meta-Llama-3.1-405B-Instruct-FP8 | 77.23 |
| 4 | GPT-4o-2024-08-06 | 75.15 |
| 5 | gpt-4-turbo-2024-04-09 | 74.59 |
| 6 | Bielik-11B-v2.6-Instruct | 73.70 |
| 7 | DeepSeek-V3-0324 | 73.46 |
| 8 | Mistral-Small-Instruct-2409 | 72.85 |
| 9 | Llama-PLLuM-70B-chat | 72.56 |
| 10 | Meta-Llama-3.1-70B-Instruct | 72.53 |
Bielik Cross-Benchmark Tracker
All tracked Bielik versions across the five Polish benchmarks.
| Model | PLCC | CPTU-Bench | Open PL LLM Leaderboard | Polish MT-Bench | Polish EQ-Bench |
|---|---|---|---|---|---|
| Bielik-0.1 | 46.67 | - | - | - | - |
| Bielik-1.5B-v1.0-DPO-001-L2 | - | - | 80.48 | - | - |
| Bielik-1.5B-v1.0-DPO-001-L3 | - | - | 16.38 | - | - |
| Bielik-1.5B-v1.0-DPO-001-L3-copy | - | - | 74.70 | - | - |
| Bielik-1.5B-v1.0-m3 | - | - | 16.84 | - | - |
| Bielik-1.5B-v1.0-m3b | - | - | 74 | - | - |
| Bielik-1.5B-v1.0-m4 | - | - | 74.50 | - | - |
| Bielik-1.5B-v3 | - | - | 35.78 | - | - |
| Bielik-1.5B-v3.0-Instruct | 27 | 2.38 | 41.36 | - | - |
| Bielik-1.5B-v3.0-Instruct-RC042025 | - | - | 20.50 | - | - |
| Bielik-1.5B-v3.0-Instruct-SFT-RC042025 | - | - | 71.11 | - | - |
| Bielik-11B-v2 | - | - | 34.37 | - | - |
| Bielik-11B-v2.0-Instruct | - | 2.20 | 50.18 | 7.56 | 68.24 |
| Bielik-11B-v2.1-Instruct | - | 3.92 | 83.64 | 9.50 | 60.07 |
| Bielik-11B-v2.2-Instruct | - | 3.72 | 83.80 | 9.35 | 69.05 |
| Bielik-11B-v2.2-M-1.2 | - | - | 92.32 | - | - |
| Bielik-11B-v2.3-Instruct | - | 3.97 | 86.29 | 8.97 | 70.86 |
| Bielik-11B-v2.3-Instruct-AWQ | - | - | 69.58 | - | - |
| Bielik-11B-v2.3-Instruct-GPTQ | - | - | 32.85 | - | - |
| Bielik-11B-v2.3-Instruct.IQ1_M.gguf.IQ | - | - | 26.85 | - | - |
| Bielik-11B-v2.3-Instruct.IQ2_XXS.gguf.IQ | - | - | 43.53 | - | - |
| Bielik-11B-v2.3-Instruct.IQ3_XXS.gguf.IQ | - | - | 84.35 | - | - |
| Bielik-11B-v2.3-Instruct.Q4_K_M.gguf | - | - | 86.57 | - | - |
| Bielik-11B-v2.3-Instruct.Q4_K_M.gguf.IQ | - | - | 91.13 | - | - |
| Bielik-11B-v2.3-Instruct.Q6_K.gguf | - | - | 86.57 | - | - |
| Bielik-11B-v2.3-Instruct.Q8_0.gguf | - | - | 51.82 | - | - |
| Bielik-11B-v2.4-Instruct-MS | - | - | 65.51 | - | - |
| Bielik-11B-v2.4-Instruct-SL | - | - | 65.87 | - | - |
| Bielik-11B-v2.4-Instruct-TI | - | - | 37.45 | - | - |
| Bielik-11B-v2.5-Instruct | - | 2.91 | 63.95 | - | 72.00 |
| Bielik-11B-v2.5-Instruct-D-GRPO_H_070 | - | - | 64.57 | - | - |
| Bielik-11B-v2.5-Instruct-GRPO_010 | - | - | 67.61 | - | - |
| Bielik-11B-v2.5-Instruct-GRPO_020 | - | - | 67.76 | - | - |
| Bielik-11B-v2.5-Instruct-GRPO_030 | - | - | 84.35 | - | - |
| Bielik-11B-v2.5-Instruct-GRPO_040 | - | - | 64.19 | - | - |
| Bielik-11B-v2.5-Instruct-GRPO_050 | - | - | 61.50 | - | - |
| Bielik-11B-v2.5-Instruct-GRPO_060 | - | - | 91.04 | - | - |
| Bielik-11B-v2.5-Instruct-GRPO_H_010 | - | - | 35.35 | - | - |
| Bielik-11B-v2.5-Instruct-GRPO_H_030 | - | - | 68.21 | - | - |
| Bielik-11B-v2.6-Instruct | - | 3.41 | 91.16 | - | 73.70 |
| Bielik-11B-v3-Base-20250730 | - | - | 77.56 | - | - |
| Bielik-11B-v3.0-Instruct | 78 | 3.73 | 69.48 | - | 71.20 |
| Bielik-11B-v3.0-Instruct-FP8-Dynamic | - | - | 80 | - | - |
| Bielik-11B-v3.0-Instruct.Q4_K_M.gguf | - | - | 88.44 | - | - |
| Bielik-11B-v3.0-Instruct.Q6_K.gguf | - | - | 88.44 | - | - |
| Bielik-11B-v3.0-Instruct.Q8_0.gguf | - | - | 65.82 | - | - |
| Bielik-2.1 | 61 | - | - | - | - |
| Bielik-2.2 | 62 | - | - | - | - |
| Bielik-2.3 | 62.17 | - | - | - | - |
| Bielik-2.5 | 75 | - | - | - | - |
| Bielik-2.6 | 72 | - | - | - | - |
| Bielik-4.5B-v3 | - | - | 87.08 | - | - |
| Bielik-4.5B-v3.0-Instruct | 35 | 2.46 | 56.13 | - | 53.58 |
| Bielik-4.5B-v3.0-Instruct-SFT-RC042025 | - | - | 54.84 | - | - |
| Bielik-7B-Instruct-v0.1 | - | 2.16 | 30.43 | 6.15 | 31.26 |
| Bielik-7B-Instruct-v0.1-GPTQ | - | - | 66.44 | - | - |
| Bielik-7B-v0.1 | - | - | 20.85 | - | - |
| Bielik-Minitron-7B-v3.0-Instruct | 64 | - | - | - | - |
| Bielik-PL-11B-v3.0-Instruct | - | - | 66.24 | - | - |
| Bielik-PL-Minitron-7B-v3.0-Instruct | - | - | 81.99 | - | - |
| Bielik-SOLAR-LIKE-10.7B-Instruct-v0.1 | - | - | 31.87 | - | 34.17 |
| minitron-Bielik-7B-v3.0-Instruct-GGUF.Q4_K_M.gguf | - | - | 42.09 | - | - |
| minitron-Bielik-7B-v3.0-Instruct-GGUF.Q6_K.gguf | - | - | 87.33 | - | - |
| minitron-Bielik-7B-v3.0-Instruct-GGUF.Q8_0.gguf | - | - | 44 | - | - |
| MSH-Lite-7B-v1-Bielik-v2.3-Instruct-Llama-Prune | - | - | 19.53 | - | - |
| MSH-v1-Bielik-v2.3-Instruct-MedIT-merge | - | - | 50.62 | - | - |
| speakleash/Bielik-Minitron-7B-v3.0-Instruct | - | 3.38 | - | - | - |
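A cross-benchmark tracker like the one above can be rebuilt from flat (model, benchmark, score) records with a simple pivot. A minimal sketch in plain Python, assuming whatever storage backs the page returns tuples in that shape (the sample records are copied from the rows above):

```python
from collections import defaultdict

BENCHMARKS = ["PLCC", "CPTU-Bench", "Open PL LLM Leaderboard",
              "Polish MT-Bench", "Polish EQ-Bench"]

def pivot(records):
    """Turn flat (model, benchmark, score) rows into a model -> {benchmark: score} grid."""
    grid = defaultdict(dict)
    for model, benchmark, score in records:
        grid[model][benchmark] = score
    return grid

def render_row(model, grid):
    """Render one markdown table row; benchmarks a model was not run on become '-'."""
    cells = [str(grid[model].get(b, "-")) for b in BENCHMARKS]
    return "| " + " | ".join([model] + cells) + " |"

# Sample records copied from the tracker above.
records = [
    ("Bielik-11B-v2.6-Instruct", "CPTU-Bench", 3.41),
    ("Bielik-11B-v2.6-Instruct", "Open PL LLM Leaderboard", 91.16),
    ("Bielik-11B-v2.6-Instruct", "Polish EQ-Bench", 73.70),
]
grid = pivot(records)
row = render_row("Bielik-11B-v2.6-Instruct", grid)
```

The dash-for-missing convention matches the tables on this page; note that `str()` drops trailing zeros (73.70 renders as 73.7), so a real renderer would want fixed-width number formatting.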
About Bielik
Bielik (Polish for "white-tailed eagle") is developed by SpeakLeash. The models are optimized specifically for Polish-language tasks, use a custom APT4 tokenizer, are trained on 292B+ tokens of Polish text, and are released under the Apache 2.0 license.
PLLuM Cross-Benchmark Tracker
All tracked PLLuM versions across the five Polish benchmarks.
| Model | PLCC | CPTU-Bench | Open PL LLM Leaderboard | Polish MT-Bench | Polish EQ-Bench |
|---|---|---|---|---|---|
| CYFRAGOVPL/Llama-PLLuM-8B-instruct | - | 3.46 | - | - | - |
| CYFRAGOVPL/PLLuM-12B-nc-chat | - | 2.62 | - | - | - |
| CYFRAGOVPL/PLLuM-12B-nc-instruct | - | 3.31 | - | - | - |
| CYFRAGOVPL/pllum-12b-nc-instruct-250715 | - | 3.29 | - | - | - |
| Llama-PLLuM-70B-chat | 46 | 3.61 | - | 4.80 | 72.56 |
| Llama-PLLuM-70B-chat-250801 | 62 | - | - | - | - |
| Llama-PLLuM-70B-instruct | - | 3.33 | - | - | 69.99 |
| Llama-PLLuM-8B-chat | 34 | 2.25 | - | 6.05 | 46.20 |
| PLLuM-12B-chat | 33 | 3.14 | - | 9.30 | 52.26 |
| PLLuM-12B-instruct | - | 3.09 | - | - | 36.21 |
| PLLuM-12B-nc-chat | 70 | - | - | 7.55 | - |
| pllum-12b-nc-chat-250715 | - | 3.46 | - | - | 55.17 |
| PLLuM-12B-nc-chat-250715 | 75 | - | - | - | - |
| PLLuM-8x7B-chat | 44 | 3.44 | - | 7.10 | 45.22 |
| PLLuM-8x7B-instruct | - | 3.01 | - | - | 39.55 |
| PLLuM-8x7B-nc-chat | 73 | 3.08 | - | 3.35 | 47.29 |
| PLLuM-8x7B-nc-instruct | - | 3.22 | - | - | 41.75 |
About PLLuM
PLLuM (Polish Large Language Universal Model) is developed by OPI (National Information Processing Institute) as part of a government-backed initiative to build open Polish AI infrastructure. Models range from 8B to 70B parameters.
PLLuM Project Page →
Resources & Links
Open PL LLM Leaderboard
Official leaderboard on HuggingFace Spaces
Bielik 11B v2 Technical Report
Full methodology and benchmark details
Bielik v3 Small Technical Report
APT4 tokenizer & efficiency innovations
SpeakLeash on HuggingFace
All Bielik model weights and documentation
PLCC Benchmark
Polish Linguistic and Cultural Competency
CPTU-Bench
Complex Polish Text Understanding — full benchmark breakdown on CodeSOTA
PLLuM Project
Polish Large Language Model by OPI
Explore More Benchmarks
See how Polish OCR models compare, or explore our broader LLM benchmark tracking.