Complex Polish Text Understanding Benchmark
Evaluates LLMs on understanding Polish text across four dimensions: sentiment analysis, language understanding (implicatures, author intent), phraseology (idioms, phraseological compounds), and tricky questions (logic, ambiguity, hallucination resistance). Each category is scored on a 0-5 scale. The benchmark comprises 378 hand-written examples and was created by SpeakLeash/Spichlerz.
Top model by average score: Qwen/Qwen3.5-27B thinking (API) (Qwen), 4.336359
CPTU-Bench — average
93 results · 11 SOTA advances · higher is better
[Figure: Model Size vs Score, Pareto frontier. 91 models, log scale.]
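A size-versus-score Pareto frontier like the one named in the chart title can be sketched as follows. This is a minimal illustration, not the leaderboard's own method: the frontier definition (no other model is at most as large and scores strictly higher) is an assumption, the parameter counts are read off the model names in billions, and only a small subset of the listed models is used, with average scores copied from the table in this document.

```python
# Illustrative subset: (model name, size in billions of parameters, average score).
# Sizes are inferred from the model names (an assumption); scores come from the
# "average" table in this document.
MODELS = [
    ("Qwen2.5-0.5B-Instruct", 0.5, 1.401057),
    ("Qwen2.5-1.5B-Instruct", 1.5, 1.758198),
    ("Bielik-1.5B-v3.0-Instruct", 1.5, 2.363686),
    ("Qwen2.5-3B-Instruct", 3.0, 2.503177),
    ("Llama-3.2-3B-Instruct", 3.0, 1.997628),
    ("Bielik-4.5B-v3.0-Instruct", 4.5, 3.375719),
    ("Qwen2.5-7B-Instruct", 7.0, 3.06549),
    ("Bielik-11B-v3.0-Instruct", 11.0, 3.73465),
    ("Qwen2.5-14B-Instruct", 14.0, 3.545584),
    ("Qwen2.5-72B-Instruct", 72.0, 3.946478),
]


def pareto_frontier(models):
    """Return names of models not beaten by any model of equal or smaller size.

    Hypothetical helper: sweeps models from smallest to largest and keeps a
    model only if it scores strictly higher than everything seen so far.
    """
    frontier = []
    best_score = float("-inf")
    # Sort by size ascending; within the same size, best score first.
    for name, size, score in sorted(models, key=lambda m: (m[1], -m[2])):
        if score > best_score:
            frontier.append(name)
            best_score = score
    return frontier


frontier = pareto_frontier(MODELS)
for name in frontier:
    print(name)
```

On this subset, the smaller Bielik v3.0 models displace the next-larger Qwen2.5 models from the frontier, which is the kind of size-efficiency comparison such a chart is meant to surface.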
[Figure: average score progress over time, showing 14 breakthroughs from Nov 2023 to Jul 2025.]
[Figure: Top Models Performance Comparison, top 10 models ranked by average.]
average (primary metric)
| # | Model (with license badge and organization) | Score | Date | Paper / Code |
|---|---|---|---|---|
| 1 | Qwen/Qwen3.5-27B thinking (API)Open Source Qwen | 4.336359 | Jul 2025 | |
| 2 | gemini-2.0-flash-001Open Source Google | 4.291999 | Feb 2025 | |
| 3 | Qwen/Qwen3.5-27B non-thinking (API)Open Source Qwen | 4.27171 | Jul 2025 | |
| 4 | Qwen/Qwen3.5-35B-A3B thinking (API)Open Source Qwen | 4.223703 | Jul 2025 | |
| 5 | Qwen/Qwen3.5-35B-A3B non-thinking (API)Open Source Qwen | 4.175445 | Jul 2025 | |
| 6 | deepseek-ai/DeepSeek-V3.2 (API)Open Source deepseek-ai | 4.139189 | Jul 2025 | |
| 7 | deepseek-ai/DeepSeek-R1 (API)Open Source deepseek-ai | 4.137539 | Jan 2025 | |
| 8 | gemini-2.0-flash-lite-001Open Source Google | 4.093675 | Feb 2025 | |
| 9 | 🚧DeepSeek-V3-0324Open Source deepseek-ai | 4.029112 | Mar 2025 | |
| 10 | deepseek-ai/DeepSeek-V3.1 (API)Open Source deepseek-ai | 4.025966 | May 2025 | |
| 11 | deepseek-ai/DeepSeek-V3 (API)Open Source deepseek-ai | 4.023185 | Dec 2024 | |
| 12 | Mistral-Large-Instruct-2411Open Source mistralai | 4.004161 | Nov 2024 | |
| 13 | moonshotai/Kimi-K2-Instruct-0905 (API)Open Source moonshotai | 3.983402 | Sep 2025 | |
| 14 | Qwen2.5-72B-InstructOpen Source Qwen | 3.946478 | Sep 2024 | |
| 15 | Mistral-Large-Instruct-2407Open Source mistralai | 3.934209 | Jul 2024 | |
| 16 | meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 (API)Open Source meta-llama | 3.933613 | Apr 2025 | |
| 17 | Qwen/Qwen3-235B-A22B non-thinking (API)Open Source Qwen | 3.910936 | Apr 2025 | |
| 18 | mistralai/Mistral-Small-3.1-24B-Instruct-2503 (API FP8)Open Source mistralai | 3.897741 | Mar 2025 | |
| 19 | mistralai/Mistral-Small-3.2-24B-Instruct-2506 (API FP8)Open Source mistralai | 3.827445 | Jun 2025 | |
| 20 | openai/gpt-oss-120b (API)Open Source openai | 3.822487 | Jun 2025 | |
| 21 | gemma-3-27b-itOpen Source google | 3.805478 | Mar 2025 | |
| 22 | Meta-Llama-3-70B-InstructOpen Source meta-llama | 3.78187 | Apr 2024 | |
| 23 | Qwen2.5-32B-InstructOpen Source Qwen | 3.750998 | Sep 2024 | |
| 24 | Llama-4-Scout-17B-16E-InstructOpen Source meta-llama | 3.749644 | Apr 2025 | |
| 25 | Bielik-11B-v3.0-InstructOpen Source speakleash | 3.73465 | Jun 2025 | |
| 26 | Qwen/Qwen3-32B non-thinking (API)Open Source Qwen | 3.710353 | Apr 2025 | |
| 27 | Mistral-Small-24B-Instruct-2501Open Source mistralai | 3.708674 | Jan 2025 | |
| 28 | WizardLM-2-8x22BOpen Source alpindale | 3.699077 | Apr 2024 | |
| 29 | pllum-12b-nc-chat-250715Open Source CYFRAGOVPL | 3.666963 | Jul 2025 | |
| 30 | Qwen2-72B-InstructOpen Source Qwen | 3.653149 | Jun 2024 | |
| 31 | Llama-3.3-70B-InstructOpen Source meta-llama | 3.644069 | Dec 2024 | |
| 32 | Bielik-11B-v2.6-InstructOpen Source speakleash | 3.637017 | Feb 2025 | |
| 33 | Bielik-11B-v2.3-InstructOpen Source speakleash | 3.632115 | Nov 2024 | |
| 34 | Meta-Llama-3.1-70B-InstructOpen Source meta-llama | 3.62454 | Jul 2024 | |
| 35 | Bielik-11B-v2.1-InstructOpen Source speakleash | 3.61176 | Sep 2024 | |
| 36 | Mixtral-8x22B-Instruct-v0.1Open Source mistralai | 3.560752 | Apr 2024 | |
| 37 | Qwen2.5-14B-InstructOpen Source Qwen | 3.545584 | Sep 2024 | |
| 38 | Qwen/Qwen3-30B-A3B non-thinking (API)Open Source Qwen | 3.536973 | Apr 2025 | |
| 39 | Llama-PLLuM-70B-chatOpen Source CYFRAGOVPL | 3.528371 | Mar 2025 | |
| 40 | Qwen/Qwen3-14B non-thinking (API)Open Source Qwen | 3.511679 | Apr 2025 | |
| 41 | Bielik-11B-v2.5-InstructOpen Source speakleash | 3.476631 | Jan 2025 | |
| 42 | Bielik-11B-v2.2-InstructOpen Source speakleash | 3.455386 | Oct 2024 | |
| 43 | speakleash/Bielik-Minitron-7B-v3.0-InstructOpen Source speakleash | 3.378476 | Jul 2025 | |
| 44 | Bielik-4.5B-v3.0-InstructOpen Source speakleash | 3.375719 | Jun 2025 | |
| 45 | Llama-PLLuM-70B-instructOpen Source CYFRAGOVPL | 3.326208 | Mar 2025 | |
| 46 | CYFRAGOVPL/pllum-12b-nc-instruct-250715Open Source CYFRAGOVPL | 3.325261 | Jul 2025 | |
| 47 | phi-4Open Source microsoft | 3.304417 | Jan 2025 | |
| 48 | Qwen/Qwen3.5-9B non-thinking (API, FP8)Open Source Qwen | 3.275817 | Jul 2025 | |
| 49 | Bielik-11B-v2.0-InstructOpen Source speakleash | 3.260247 | Aug 2024 | |
| 50 | NVIDIA-Nemotron-3-Nano-30B-A3B-BF16Open Source nvidia | 3.247056 | Jun 2025 | |
| 51 | Qwen1.5-72B-ChatOpen Source Qwen | 3.158225 | Feb 2024 | |
| 52 | CYFRAGOVPL/PLLuM-12B-nc-chatOpen Source CYFRAGOVPL | 3.153399 | Apr 2025 | |
| 53 | EuroLLM-9B-InstructOpen Source utter-project | 3.145644 | Mar 2025 | |
| 54 | PLLuM-12B-chatOpen Source CYFRAGOVPL | 3.137472 | Apr 2025 | |
| 55 | PLLuM-8x7B-nc-instructOpen Source CYFRAGOVPL | 3.113511 | Feb 2025 | |
| 56 | PLLuM-12B-instructOpen Source CYFRAGOVPL | 3.093624 | Apr 2025 | |
| 57 | Qwen2.5-7B-InstructOpen Source Qwen | 3.06549 | Sep 2024 | |
| 58 | Qwen/Qwen3-8B non-thinking (API)Open Source Qwen | 3.061909 | Apr 2025 | |
| 59 | PLLuM-8x7B-nc-chatOpen Source CYFRAGOVPL | 3.029438 | Feb 2025 | |
| 60 | Meta-Llama-3.1-8B-InstructOpen Source meta-llama | 3.01168 | Jul 2024 | |
| 61 | PLLuM-8x7B-instructOpen Source CYFRAGOVPL | 3.006404 | Feb 2025 | |
| 62 | PLLuM-8x7B-chatOpen Source CYFRAGOVPL | 3.005225 | Feb 2025 | |
| 63 | Meta-Llama-3-8B-InstructOpen Source meta-llama | 2.998965 | Apr 2024 | |
| 64 | CYFRAGOVPL/PLLuM-12B-nc-instructOpen Source CYFRAGOVPL | 2.963287 | Apr 2025 | |
| 65 | glm-4-9b-chatOpen Source THUDM | 2.951972 | Jun 2024 | |
| 66 | Mistral-Nemo-Instruct-2407Open Source mistralai | 2.940228 | Jul 2024 | |
| 67 | Llama-PLLuM-8B-chatOpen Source CYFRAGOVPL | 2.918202 | Mar 2025 | |
| 68 | Bielik-7B-Instruct-v0.1Open Source speakleash | 2.884262 | Apr 2024 | |
| 69 | SOLAR-10.7B-Instruct-v1.0Open Source upstage | 2.881636 | Dec 2023 | |
| 70 | CYFRAGOVPL/Llama-PLLuM-8B-instructOpen Source CYFRAGOVPL | 2.81573 | Mar 2025 | |
| 71 | Mistral-7B-Instruct-v0.3Open Source mistralai | 2.763922 | May 2024 | |
| 72 | openchat-3.5-0106-gemmaOpen Source openchat | 2.733886 | Dec 2023 | |
| 73 | Mixtral-8x7B-Instruct-v0.1Open Source mistralai | 2.728861 | Dec 2023 | |
| 74 | gemma-2-2b-itOpen Source google | 2.65148 | Jun 2024 | |
| 75 | Starling-LM-7B-alphaOpen Source berkeley-nest | 2.629367 | Nov 2023 | |
| 76 | openchat-3.5-0106Open Source openchat | 2.627733 | Dec 2023 | |
| 77 | Qwen2.5-3B-InstructOpen Source Qwen | 2.503177 | Sep 2024 | |
| 78 | Bielik-1.5B-v3.0-InstructOpen Source speakleash | 2.363686 | Jun 2025 | |
| 79 | Yi-1.5-34B-ChatOpen Source 01-ai | 2.331731 | May 2024 | |
| 80 | trurl-2-13b-academicOpen Source Voicelab | 2.309534 | Jan 2024 | |
| 81 | NousResearch/Hermes-3-Llama-3.2-3BOpen Source NousResearch | 2.306459 | Oct 2024 | |
| 82 | Phi-4-mini-instructOpen Source microsoft | 2.16767 | Apr 2025 | |
| 83 | internlm2-chat-20bOpen Source internlm | 2.148719 | Jan 2024 | |
| 84 | Phi-3.5-mini-instructOpen Source microsoft | 2.01021 | Aug 2024 | |
| 85 | Llama-3.2-3B-InstructOpen Source meta-llama | 1.997628 | Sep 2024 | |
| 86 | granite-3.1-2b-instructOpen Source ibm-granite | 1.945453 | Jan 2025 | |
| 87 | Llama-3.2-1B-InstructOpen Source meta-llama | 1.918599 | Sep 2024 | |
| 88 | EuroLLM-1.7B-InstructOpen Source utter-project | 1.763004 | Jan 2025 | |
| 89 | Qwen2.5-1.5B-InstructOpen Source Qwen | 1.758198 | Sep 2024 | |
| 90 | LGAI-EXAONE/EXAONE-3.5-2.4B-InstructOpen Source LGAI-EXAONE | 1.669326 | Jan 2025 | |
| 91 | h2oai/h2o-danube2-1.8b-chatOpen Source h2oai | 1.641502 | Apr 2024 | |
| 92 | SmolLM2-1.7B-InstructOpen Source HuggingFaceTB | 1.495863 | Feb 2025 | |
| 93 | Qwen/Qwen2.5-0.5B-InstructOpen Source Qwen | 1.401057 | Sep 2024 |
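The average column appears to be the unweighted mean of the four per-category scores (language-understanding, phraseology, sentiment, tricky-questions). The page does not state this formula, so treat it as an assumption. The sketch below checks it against two models, using numbers copied from the tables in this document; the helper name is hypothetical.

```python
from statistics import mean

# Per-category scores copied from the category tables in this document.
CATEGORY_SCORES = {
    "Qwen/Qwen3.5-27B thinking (API)": {
        "language-understanding": 4.205,
        "phraseology": 4.105,
        "sentiment": 4.423077,
        "tricky-questions": 4.61236,
    },
    "gemini-2.0-flash-001": {
        "language-understanding": 4.32,
        "phraseology": 4.34,
        "sentiment": 4.519231,
        "tricky-questions": 3.988764,
    },
}

# Values copied from the "average" table.
REPORTED_AVERAGE = {
    "Qwen/Qwen3.5-27B thinking (API)": 4.336359,
    "gemini-2.0-flash-001": 4.291999,
}


def unweighted_average(scores):
    """Mean of the category scores, rounded to the table's six decimals.

    Assumes equal weight per category (not confirmed by the source).
    """
    return round(mean(scores.values()), 6)


for model, scores in CATEGORY_SCORES.items():
    computed = unweighted_average(scores)
    print(f"{model}: computed={computed} reported={REPORTED_AVERAGE[model]}")
```

Both models reproduce the reported average to six decimals, which is consistent with equal weighting across categories rather than weighting by the number of examples in each category.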
language-understanding
| # | Model (with license badge and organization) | Score | Date | Paper / Code |
|---|---|---|---|---|
| 1 | deepseek-ai/DeepSeek-V3.2 (API)Open Source deepseek-ai | 4.36 | Jul 2025 | |
| 2 | deepseek-ai/DeepSeek-R1 (API)Open Source deepseek-ai | 4.345 | Jan 2025 | |
| 3 | deepseek-ai/DeepSeek-V3.1 (API)Open Source deepseek-ai | 4.335 | May 2025 | |
| 4 | gemini-2.0-flash-001Open Source Google | 4.32 | Feb 2025 | |
| 5 | deepseek-ai/DeepSeek-V3 (API)Open Source deepseek-ai | 4.22 | Dec 2024 | |
| 6 | Qwen/Qwen3.5-27B thinking (API)Open Source Qwen | 4.205 | Jul 2025 | |
| 7 | 🚧DeepSeek-V3-0324Open Source deepseek-ai | 4.195 | Mar 2025 | |
| 8 | moonshotai/Kimi-K2-Instruct-0905 (API)Open Source moonshotai | 4.18 | Sep 2025 | |
| 9 | Qwen/Qwen3.5-27B non-thinking (API)Open Source Qwen | 4.17 | Jul 2025 | |
| 10 | Qwen/Qwen3-235B-A22B non-thinking (API)Open Source Qwen | 4.155 | Apr 2025 | |
| 11 | meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 (API)Open Source meta-llama | 4.11 | Apr 2025 | |
| 12 | gemini-2.0-flash-lite-001Open Source Google | 4.055 | Feb 2025 | |
| 13 | Qwen/Qwen3.5-35B-A3B non-thinking (API)Open Source Qwen | 4.05 | Jul 2025 | |
| 14 | mistralai/Mistral-Small-3.2-24B-Instruct-2506 (API FP8)Open Source mistralai | 4.005 | Jun 2025 | |
| 15 | Mistral-Large-Instruct-2407Open Source mistralai | 4 | Jul 2024 | |
| 16 | Mistral-Large-Instruct-2411Open Source mistralai | 3.975 | Nov 2024 | |
| 17 | openai/gpt-oss-120b (API)Open Source openai | 3.97 | Jun 2025 | |
| 18 | Qwen2.5-72B-InstructOpen Source Qwen | 3.97 | Sep 2024 | |
| 19 | pllum-12b-nc-chat-250715Open Source CYFRAGOVPL | 3.955 | Jul 2025 | |
| 20 | Bielik-11B-v2.6-InstructOpen Source speakleash | 3.94 | Feb 2025 | |
| 21 | Qwen/Qwen3.5-35B-A3B thinking (API)Open Source Qwen | 3.94 | Jul 2025 | |
| 22 | Bielik-11B-v2.1-InstructOpen Source speakleash | 3.915 | Sep 2024 | |
| 23 | Qwen/Qwen3-32B non-thinking (API)Open Source Qwen | 3.91 | Apr 2025 | |
| 24 | Bielik-11B-v3.0-InstructOpen Source speakleash | 3.91 | Jun 2025 | |
| 25 | Meta-Llama-3.1-70B-InstructOpen Source meta-llama | 3.91 | Jul 2024 | |
| 26 | Qwen2-72B-InstructOpen Source Qwen | 3.89 | Jun 2024 | |
| 27 | mistralai/Mistral-Small-3.1-24B-Instruct-2503 (API FP8)Open Source mistralai | 3.885 | Mar 2025 | |
| 28 | Llama-3.3-70B-InstructOpen Source meta-llama | 3.865 | Dec 2024 | |
| 29 | Bielik-11B-v2.5-InstructOpen Source speakleash | 3.86 | Jan 2025 | |
| 30 | speakleash/Bielik-Minitron-7B-v3.0-InstructOpen Source speakleash | 3.83 | Jul 2025 | |
| 31 | Meta-Llama-3-70B-InstructOpen Source meta-llama | 3.82 | Apr 2024 | |
| 32 | WizardLM-2-8x22BOpen Source alpindale | 3.815 | Apr 2024 | |
| 33 | Llama-4-Scout-17B-16E-InstructOpen Source meta-llama | 3.805 | Apr 2025 | |
| 34 | Bielik-11B-v2.3-InstructOpen Source speakleash | 3.785 | Nov 2024 | |
| 35 | gemma-3-27b-itOpen Source google | 3.785 | Mar 2025 | |
| 36 | Bielik-11B-v2.0-InstructOpen Source speakleash | 3.745 | Aug 2024 | |
| 37 | Bielik-11B-v2.2-InstructOpen Source speakleash | 3.73 | Oct 2024 | |
| 38 | CYFRAGOVPL/pllum-12b-nc-instruct-250715Open Source CYFRAGOVPL | 3.725 | Jul 2025 | |
| 39 | Mixtral-8x22B-Instruct-v0.1Open Source mistralai | 3.675 | Apr 2024 | |
| 40 | Llama-PLLuM-70B-instructOpen Source CYFRAGOVPL | 3.63 | Mar 2025 | |
| 41 | Llama-PLLuM-70B-chatOpen Source CYFRAGOVPL | 3.61 | Mar 2025 | |
| 42 | Bielik-4.5B-v3.0-InstructOpen Source speakleash | 3.61 | Jun 2025 | |
| 43 | Mistral-Small-24B-Instruct-2501Open Source mistralai | 3.6 | Jan 2025 | |
| 44 | PLLuM-8x7B-nc-instructOpen Source CYFRAGOVPL | 3.59 | Feb 2025 | |
| 45 | Qwen2.5-32B-InstructOpen Source Qwen | 3.565 | Sep 2024 | |
| 46 | Qwen2.5-14B-InstructOpen Source Qwen | 3.565 | Sep 2024 | |
| 47 | Qwen/Qwen3-14B non-thinking (API)Open Source Qwen | 3.56 | Apr 2025 | |
| 48 | phi-4Open Source microsoft | 3.54 | Jan 2025 | |
| 49 | Qwen1.5-72B-ChatOpen Source Qwen | 3.515 | Feb 2024 | |
| 50 | PLLuM-8x7B-nc-chatOpen Source CYFRAGOVPL | 3.48 | Feb 2025 | |
| 51 | Bielik-7B-Instruct-v0.1Open Source speakleash | 3.475 | Apr 2024 | |
| 52 | PLLuM-8x7B-instructOpen Source CYFRAGOVPL | 3.47 | Feb 2025 | |
| 53 | glm-4-9b-chatOpen Source THUDM | 3.455 | Jun 2024 | |
| 54 | PLLuM-8x7B-chatOpen Source CYFRAGOVPL | 3.45 | Feb 2025 | |
| 55 | Qwen/Qwen3-30B-A3B non-thinking (API)Open Source Qwen | 3.39 | Apr 2025 | |
| 56 | Meta-Llama-3.1-8B-InstructOpen Source meta-llama | 3.38 | Jul 2024 | |
| 57 | CYFRAGOVPL/PLLuM-12B-nc-instructOpen Source CYFRAGOVPL | 3.31 | Apr 2025 | |
| 58 | EuroLLM-9B-InstructOpen Source utter-project | 3.3 | Mar 2025 | |
| 59 | Mistral-Nemo-Instruct-2407Open Source mistralai | 3.29 | Jul 2024 | |
| 60 | NVIDIA-Nemotron-3-Nano-30B-A3B-BF16Open Source nvidia | 3.27 | Jun 2025 | |
| 61 | CYFRAGOVPL/PLLuM-12B-nc-chatOpen Source CYFRAGOVPL | 3.23 | Apr 2025 | |
| 62 | Qwen/Qwen3-8B non-thinking (API)Open Source Qwen | 3.225 | Apr 2025 | |
| 63 | PLLuM-12B-chatOpen Source CYFRAGOVPL | 3.21 | Apr 2025 | |
| 64 | SOLAR-10.7B-Instruct-v1.0Open Source upstage | 3.18 | Dec 2023 | |
| 65 | Mixtral-8x7B-Instruct-v0.1Open Source mistralai | 3.175 | Dec 2023 | |
| 66 | PLLuM-12B-instructOpen Source CYFRAGOVPL | 3.17 | Apr 2025 | |
| 67 | Meta-Llama-3-8B-InstructOpen Source meta-llama | 3.15 | Apr 2024 | |
| 68 | openchat-3.5-0106-gemmaOpen Source openchat | 3.08 | Dec 2023 | |
| 69 | Mistral-7B-Instruct-v0.3Open Source mistralai | 3.06 | May 2024 | |
| 70 | Qwen2.5-7B-InstructOpen Source Qwen | 3.025 | Sep 2024 | |
| 71 | Qwen/Qwen3.5-9B non-thinking (API, FP8)Open Source Qwen | 2.975 | Jul 2025 | |
| 72 | Llama-PLLuM-8B-chatOpen Source CYFRAGOVPL | 2.93 | Mar 2025 | |
| 73 | Starling-LM-7B-alphaOpen Source berkeley-nest | 2.925 | Nov 2023 | |
| 74 | gemma-2-2b-itOpen Source google | 2.9 | Jun 2024 | |
| 75 | CYFRAGOVPL/Llama-PLLuM-8B-instructOpen Source CYFRAGOVPL | 2.9 | Mar 2025 | |
| 76 | Yi-1.5-34B-ChatOpen Source 01-ai | 2.87 | May 2024 | |
| 77 | openchat-3.5-0106Open Source openchat | 2.835 | Dec 2023 | |
| 78 | internlm2-chat-20bOpen Source internlm | 2.785 | Jan 2024 | |
| 79 | trurl-2-13b-academicOpen Source Voicelab | 2.755 | Jan 2024 | |
| 80 | NousResearch/Hermes-3-Llama-3.2-3BOpen Source NousResearch | 2.705 | Oct 2024 | |
| 81 | Qwen2.5-3B-InstructOpen Source Qwen | 2.455 | Sep 2024 | |
| 82 | Phi-4-mini-instructOpen Source microsoft | 2.43 | Apr 2025 | |
| 83 | Bielik-1.5B-v3.0-InstructOpen Source speakleash | 2.33 | Jun 2025 | |
| 84 | Llama-3.2-3B-InstructOpen Source meta-llama | 2.295 | Sep 2024 | |
| 85 | granite-3.1-2b-instructOpen Source ibm-granite | 2.235 | Jan 2025 | |
| 86 | Phi-3.5-mini-instructOpen Source microsoft | 2.135 | Aug 2024 | |
| 87 | LGAI-EXAONE/EXAONE-3.5-2.4B-InstructOpen Source LGAI-EXAONE | 2.115578 | Jan 2025 | |
| 88 | EuroLLM-1.7B-InstructOpen Source utter-project | 1.79 | Jan 2025 | |
| 89 | Llama-3.2-1B-InstructOpen Source meta-llama | 1.735 | Sep 2024 | |
| 90 | h2oai/h2o-danube2-1.8b-chatOpen Source h2oai | 1.595 | Apr 2024 | |
| 91 | Qwen2.5-1.5B-InstructOpen Source Qwen | 1.35 | Sep 2024 | |
| 92 | SmolLM2-1.7B-InstructOpen Source HuggingFaceTB | 1.1 | Feb 2025 | |
| 93 | Qwen/Qwen2.5-0.5B-InstructOpen Source Qwen | 0.835 | Sep 2024 |
phraseology
| # | Model (with license badge and organization) | Score | Date | Paper / Code |
|---|---|---|---|---|
| 1 | gemini-2.0-flash-001Open Source Google | 4.34 | Feb 2025 | |
| 2 | gemini-2.0-flash-lite-001Open Source Google | 4.235 | Feb 2025 | |
| 3 | Qwen/Qwen3.5-35B-A3B non-thinking (API)Open Source Qwen | 4.23 | Jul 2025 | |
| 4 | WizardLM-2-8x22BOpen Source alpindale | 4.22 | Apr 2024 | |
| 5 | Qwen/Qwen3.5-27B non-thinking (API)Open Source Qwen | 4.195 | Jul 2025 | |
| 6 | Qwen/Qwen3.5-35B-A3B thinking (API)Open Source Qwen | 4.15 | Jul 2025 | |
| 7 | mistralai/Mistral-Small-3.1-24B-Instruct-2503 (API FP8)Open Source mistralai | 4.15 | Mar 2025 | |
| 8 | Qwen/Qwen3.5-27B thinking (API)Open Source Qwen | 4.105 | Jul 2025 | |
| 9 | Qwen2.5-32B-InstructOpen Source Qwen | 4.035 | Sep 2024 | |
| 10 | gemma-3-27b-itOpen Source google | 4.025 | Mar 2025 | |
| 11 | mistralai/Mistral-Small-3.2-24B-Instruct-2506 (API FP8)Open Source mistralai | 3.995 | Jun 2025 | |
| 12 | Mistral-Large-Instruct-2411Open Source mistralai | 3.99 | Nov 2024 | |
| 13 | Bielik-11B-v3.0-InstructOpen Source speakleash | 3.965 | Jun 2025 | |
| 14 | Qwen2.5-72B-InstructOpen Source Qwen | 3.93 | Sep 2024 | |
| 15 | Llama-4-Scout-17B-16E-InstructOpen Source meta-llama | 3.9 | Apr 2025 | |
| 16 | Mistral-Small-24B-Instruct-2501Open Source mistralai | 3.875 | Jan 2025 | |
| 17 | Mistral-Large-Instruct-2407Open Source mistralai | 3.86 | Jul 2024 | |
| 18 | Bielik-4.5B-v3.0-InstructOpen Source speakleash | 3.675 | Jun 2025 | |
| 19 | deepseek-ai/DeepSeek-R1 (API)Open Source deepseek-ai | 3.6 | Jan 2025 | |
| 20 | PLLuM-12B-instructOpen Source CYFRAGOVPL | 3.59 | Apr 2025 | |
| 21 | Mixtral-8x22B-Instruct-v0.1Open Source mistralai | 3.55 | Apr 2024 | |
| 22 | Bielik-11B-v2.3-InstructOpen Source speakleash | 3.55 | Nov 2024 | |
| 23 | deepseek-ai/DeepSeek-V3.2 (API)Open Source deepseek-ai | 3.545 | Jul 2025 | |
| 24 | 🚧DeepSeek-V3-0324Open Source deepseek-ai | 3.54 | Mar 2025 | |
| 25 | CYFRAGOVPL/PLLuM-12B-nc-chatOpen Source CYFRAGOVPL | 3.54 | Apr 2025 | |
| 26 | deepseek-ai/DeepSeek-V3 (API)Open Source deepseek-ai | 3.525 | Dec 2024 | |
| 27 | Qwen/Qwen3-30B-A3B non-thinking (API)Open Source Qwen | 3.495 | Apr 2025 | |
| 28 | openai/gpt-oss-120b (API)Open Source openai | 3.49 | Jun 2025 | |
| 29 | Qwen/Qwen3-235B-A22B non-thinking (API)Open Source Qwen | 3.485 | Apr 2025 | |
| 30 | meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 (API)Open Source meta-llama | 3.475 | Apr 2025 | |
| 31 | deepseek-ai/DeepSeek-V3.1 (API)Open Source deepseek-ai | 3.475 | May 2025 | |
| 32 | Qwen/Qwen3.5-9B non-thinking (API, FP8)Open Source Qwen | 3.475 | Jul 2025 | |
| 33 | Meta-Llama-3-70B-InstructOpen Source meta-llama | 3.465 | Apr 2024 | |
| 34 | CYFRAGOVPL/Llama-PLLuM-8B-instructOpen Source CYFRAGOVPL | 3.46 | Mar 2025 | |
| 35 | PLLuM-8x7B-instructOpen Source CYFRAGOVPL | 3.46 | Feb 2025 | |
| 36 | pllum-12b-nc-chat-250715Open Source CYFRAGOVPL | 3.455 | Jul 2025 | |
| 37 | moonshotai/Kimi-K2-Instruct-0905 (API)Open Source moonshotai | 3.43 | Sep 2025 | |
| 38 | PLLuM-12B-chatOpen Source CYFRAGOVPL | 3.43 | Apr 2025 | |
| 39 | Bielik-11B-v2.6-InstructOpen Source speakleash | 3.41 | Feb 2025 | |
| 40 | Qwen2.5-14B-InstructOpen Source Qwen | 3.37 | Sep 2024 | |
| 41 | Llama-PLLuM-8B-chatOpen Source CYFRAGOVPL | 3.36 | Mar 2025 | |
| 42 | Llama-PLLuM-70B-chatOpen Source CYFRAGOVPL | 3.35 | Mar 2025 | |
| 43 | PLLuM-8x7B-chatOpen Source CYFRAGOVPL | 3.35 | Feb 2025 | |
| 44 | CYFRAGOVPL/PLLuM-12B-nc-instructOpen Source CYFRAGOVPL | 3.32 | Apr 2025 | |
| 45 | CYFRAGOVPL/pllum-12b-nc-instruct-250715Open Source CYFRAGOVPL | 3.295 | Jul 2025 | |
| 46 | Qwen2-72B-InstructOpen Source Qwen | 3.28 | Jun 2024 | |
| 47 | Llama-PLLuM-70B-instructOpen Source CYFRAGOVPL | 3.26 | Mar 2025 | |
| 48 | SOLAR-10.7B-Instruct-v1.0Open Source upstage | 3.255 | Dec 2023 | |
| 49 | Bielik-11B-v2.2-InstructOpen Source speakleash | 3.25 | Oct 2024 | |
| 50 | Meta-Llama-3.1-70B-InstructOpen Source meta-llama | 3.25 | Jul 2024 | |
| 51 | Qwen/Qwen3-14B non-thinking (API)Open Source Qwen | 3.245 | Apr 2025 | |
| 52 | phi-4Open Source microsoft | 3.235 | Jan 2025 | |
| 53 | Qwen/Qwen3-32B non-thinking (API)Open Source Qwen | 3.235 | Apr 2025 | |
| 54 | speakleash/Bielik-Minitron-7B-v3.0-InstructOpen Source speakleash | 3.23 | Jul 2025 | |
| 55 | PLLuM-8x7B-nc-instructOpen Source CYFRAGOVPL | 3.22 | Feb 2025 | |
| 56 | EuroLLM-9B-InstructOpen Source utter-project | 3.17 | Mar 2025 | |
| 57 | Bielik-11B-v2.5-InstructOpen Source speakleash | 3.13 | Jan 2025 | |
| 58 | Bielik-11B-v2.0-InstructOpen Source speakleash | 3.125 | Aug 2024 | |
| 59 | Bielik-11B-v2.1-InstructOpen Source speakleash | 3.105 | Sep 2024 | |
| 60 | Qwen2.5-7B-InstructOpen Source Qwen | 3.095 | Sep 2024 | |
| 61 | PLLuM-8x7B-nc-chatOpen Source CYFRAGOVPL | 3.08 | Feb 2025 | |
| 62 | Llama-3.3-70B-InstructOpen Source meta-llama | 3.04 | Dec 2024 | |
| 63 | Meta-Llama-3-8B-InstructOpen Source meta-llama | 3.035 | Apr 2024 | |
| 64 | Qwen1.5-72B-ChatOpen Source Qwen | 2.975 | Feb 2024 | |
| 65 | Mixtral-8x7B-Instruct-v0.1Open Source mistralai | 2.885 | Dec 2023 | |
| 66 | Starling-LM-7B-alphaOpen Source berkeley-nest | 2.855 | Nov 2023 | |
| 67 | Qwen2.5-3B-InstructOpen Source Qwen | 2.8 | Sep 2024 | |
| 68 | glm-4-9b-chatOpen Source THUDM | 2.78 | Jun 2024 | |
| 69 | Qwen/Qwen3-8B non-thinking (API)Open Source Qwen | 2.765 | Apr 2025 | |
| 70 | NousResearch/Hermes-3-Llama-3.2-3BOpen Source NousResearch | 2.765 | Oct 2024 | |
| 71 | NVIDIA-Nemotron-3-Nano-30B-A3B-BF16Open Source nvidia | 2.76 | Jun 2025 | |
| 72 | Mistral-Nemo-Instruct-2407Open Source mistralai | 2.74 | Jul 2024 | |
| 73 | Mistral-7B-Instruct-v0.3Open Source mistralai | 2.68 | May 2024 | |
| 74 | Qwen/Qwen2.5-0.5B-InstructOpen Source Qwen | 2.595 | Sep 2024 | |
| 75 | Meta-Llama-3.1-8B-InstructOpen Source meta-llama | 2.58 | Jul 2024 | |
| 76 | openchat-3.5-0106Open Source openchat | 2.555 | Dec 2023 | |
| 77 | h2oai/h2o-danube2-1.8b-chatOpen Source h2oai | 2.47 | Apr 2024 | |
| 78 | openchat-3.5-0106-gemmaOpen Source openchat | 2.445 | Dec 2023 | |
| 79 | Phi-3.5-mini-instructOpen Source microsoft | 2.425 | Aug 2024 | |
| 80 | internlm2-chat-20bOpen Source internlm | 2.385 | Jan 2024 | |
| 81 | Bielik-1.5B-v3.0-InstructOpen Source speakleash | 2.38 | Jun 2025 | |
| 82 | Yi-1.5-34B-ChatOpen Source 01-ai | 2.38 | May 2024 | |
| 83 | SmolLM2-1.7B-InstructOpen Source HuggingFaceTB | 2.355 | Feb 2025 | |
| 84 | Llama-3.2-1B-InstructOpen Source meta-llama | 2.34 | Sep 2024 | |
| 85 | Bielik-7B-Instruct-v0.1Open Source speakleash | 2.315 | Apr 2024 | |
| 86 | EuroLLM-1.7B-InstructOpen Source utter-project | 2.26 | Jan 2025 | |
| 87 | Phi-4-mini-instructOpen Source microsoft | 2.245 | Apr 2025 | |
| 88 | Qwen2.5-1.5B-InstructOpen Source Qwen | 2.225 | Sep 2024 | |
| 89 | trurl-2-13b-academicOpen Source Voicelab | 2.165 | Jan 2024 | |
| 90 | LGAI-EXAONE/EXAONE-3.5-2.4B-InstructOpen Source LGAI-EXAONE | 2.130653 | Jan 2025 | |
| 91 | gemma-2-2b-itOpen Source google | 2.095 | Jun 2024 | |
| 92 | granite-3.1-2b-instructOpen Source ibm-granite | 1.88 | Jan 2025 | |
| 93 | Llama-3.2-3B-InstructOpen Source meta-llama | 1.72 | Sep 2024 |
sentiment
| # | Model (with license badge and organization) | Score | Date | Paper / Code |
|---|---|---|---|---|
| 1 | gemini-2.0-flash-001Open Source Google | 4.519231 | Feb 2025 | |
| 2 | deepseek-ai/DeepSeek-R1 (API)Open Source deepseek-ai | 4.487179 | Jan 2025 | |
| 3 | deepseek-ai/DeepSeek-V3.2 (API)Open Source deepseek-ai | 4.455128 | Jul 2025 | |
| 4 | deepseek-ai/DeepSeek-V3.1 (API)Open Source deepseek-ai | 4.423077 | May 2025 | |
| 5 | Qwen/Qwen3.5-27B thinking (API)Open Source Qwen | 4.423077 | Jul 2025 | |
| 6 | moonshotai/Kimi-K2-Instruct-0905 (API)Open Source moonshotai | 4.391026 | Sep 2025 | |
| 7 | meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 (API)Open Source meta-llama | 4.391026 | Apr 2025 | |
| 8 | deepseek-ai/DeepSeek-V3 (API)Open Source deepseek-ai | 4.358974 | Dec 2024 | |
| 9 | 🚧DeepSeek-V3-0324Open Source deepseek-ai | 4.358974 | Mar 2025 | |
| 10 | pllum-12b-nc-chat-250715Open Source CYFRAGOVPL | 4.358974 | Jul 2025 | |
| 11 | Mistral-Large-Instruct-2411Open Source mistralai | 4.326923 | Nov 2024 | |
| 12 | Meta-Llama-3.1-70B-InstructOpen Source meta-llama | 4.326923 | Jul 2024 | |
| 13 | Llama-3.3-70B-InstructOpen Source meta-llama | 4.294872 | Dec 2024 | |
| 14 | Qwen/Qwen3.5-27B non-thinking (API)Open Source Qwen | 4.294872 | Jul 2025 | |
| 15 | Mistral-Large-Instruct-2407Open Source mistralai | 4.230769 | Jul 2024 | |
| 16 | gemini-2.0-flash-lite-001Open Source Google | 4.230769 | Feb 2025 | |
| 17 | Qwen/Qwen3.5-35B-A3B non-thinking (API)Open Source Qwen | 4.230769 | Jul 2025 | |
| 18 | Qwen/Qwen3-235B-A22B non-thinking (API)Open Source Qwen | 4.166667 | Apr 2025 | |
| 19 | Meta-Llama-3-70B-InstructOpen Source meta-llama | 4.134615 | Apr 2024 | |
| 20 | Qwen/Qwen3-32B non-thinking (API)Open Source Qwen | 4.134615 | Apr 2025 | |
| 21 | mistralai/Mistral-Small-3.1-24B-Instruct-2503 (API FP8)Open Source mistralai | 4.134615 | Mar 2025 | |
| 22 | Llama-4-Scout-17B-16E-InstructOpen Source meta-llama | 4.102564 | Apr 2025 | |
| 23 | Qwen/Qwen3.5-35B-A3B thinking (API)Open Source Qwen | 4.102564 | Jul 2025 | |
| 24 | Bielik-11B-v2.6-InstructOpen Source speakleash | 4.102564 | Feb 2025 | |
| 25 | Qwen2.5-72B-InstructOpen Source Qwen | 4.076923 | Sep 2024 | |
| 26 | Bielik-11B-v2.5-InstructOpen Source speakleash | 4.00641 | Jan 2025 | |
| 27 | mistralai/Mistral-Small-3.2-24B-Instruct-2506 (API FP8)Open Source mistralai | 4.00641 | Jun 2025 | |
| 28 | Meta-Llama-3.1-8B-InstructOpen Source meta-llama | 3.974359 | Jul 2024 | |
| 29 | Bielik-11B-v2.0-InstructOpen Source speakleash | 3.974359 | Aug 2024 | |
| 30 | Bielik-11B-v2.3-InstructOpen Source speakleash | 3.974359 | Nov 2024 | |
| 31 | Bielik-11B-v2.1-InstructOpen Source speakleash | 3.955128 | Sep 2024 | |
| 32 | openai/gpt-oss-120b (API)Open Source openai | 3.942308 | Jun 2025 | |
| 33 | Llama-PLLuM-70B-chatOpen Source CYFRAGOVPL | 3.94 | Mar 2025 | |
| 34 | Qwen/Qwen3-14B non-thinking (API)Open Source Qwen | 3.910256 | Apr 2025 | |
| 35 | Qwen2.5-14B-InstructOpen Source Qwen | 3.910256 | Sep 2024 | |
| 36 | Mistral-Small-24B-Instruct-2501Open Source mistralai | 3.910256 | Jan 2025 | |
| 37 | CYFRAGOVPL/pllum-12b-nc-instruct-250715Open Source CYFRAGOVPL | 3.910256 | Jul 2025 | |
| 38 | PLLuM-8x7B-nc-instructOpen Source CYFRAGOVPL | 3.88 | Feb 2025 | |
| 39 | Bielik-11B-v3.0-InstructOpen Source speakleash | 3.878205 | Jun 2025 | |
| 40 | gemma-3-27b-itOpen Source google | 3.878205 | Mar 2025 | |
| 41 | Qwen2.5-32B-InstructOpen Source Qwen | 3.814103 | Sep 2024 | |
| 42 | Mixtral-8x22B-Instruct-v0.1Open Source mistralai | 3.782051 | Apr 2024 | |
| 43 | Llama-PLLuM-70B-instructOpen Source CYFRAGOVPL | 3.78 | Mar 2025 | |
| 44 | Bielik-4.5B-v3.0-InstructOpen Source speakleash | 3.762821 | Jun 2025 | |
| 45 | Qwen2-72B-InstructOpen Source Qwen | 3.762821 | Jun 2024 | |
| 46 | PLLuM-8x7B-nc-chatOpen Source CYFRAGOVPL | 3.76 | Feb 2025 | |
| 47 | openchat-3.5-0106-gemmaOpen Source openchat | 3.730769 | Dec 2023 | |
| 48 | Qwen/Qwen3-30B-A3B non-thinking (API)Open Source Qwen | 3.717949 | Apr 2025 | |
| 49 | speakleash/Bielik-Minitron-7B-v3.0-InstructOpen Source speakleash | 3.717949 | Jul 2025 | |
| 50 | phi-4Open Source microsoft | 3.717949 | Jan 2025 | |
| 51 | Bielik-11B-v2.2-InstructOpen Source speakleash | 3.717949 | Oct 2024 | |
| 52 | PLLuM-12B-instructOpen Source CYFRAGOVPL | 3.71 | Apr 2025 | |
| 53 | WizardLM-2-8x22BOpen Source alpindale | 3.705128 | Apr 2024 | |
| 54 | Mistral-Nemo-Instruct-2407Open Source mistralai | 3.641026 | Jul 2024 | |
| 55 | PLLuM-8x7B-instructOpen Source CYFRAGOVPL | 3.59 | Feb 2025 | |
| 56 | glm-4-9b-chatOpen Source THUDM | 3.589744 | Jun 2024 | |
| 57 | Bielik-7B-Instruct-v0.1Open Source speakleash | 3.589744 | Apr 2024 | |
| 58 | Qwen2.5-7B-InstructOpen Source Qwen | 3.557692 | Sep 2024 | |
| 59 | NVIDIA-Nemotron-3-Nano-30B-A3B-BF16Open Source nvidia | 3.525641 | Jun 2025 | |
| 60 | Bielik-1.5B-v3.0-InstructOpen Source speakleash | 3.525641 | Jun 2025 | |
| 61 | Qwen/Qwen3-8B non-thinking (API)Open Source Qwen | 3.49359 | Apr 2025 | |
| 62 | Qwen1.5-72B-ChatOpen Source Qwen | 3.474359 | Feb 2024 | |
| 63 | PLLuM-8x7B-chatOpen Source CYFRAGOVPL | 3.44 | Feb 2025 | |
| 64 | gemma-2-2b-itOpen Source google | 3.397436 | Jun 2024 | |
| 65 | EuroLLM-9B-InstructOpen Source utter-project | 3.365385 | Mar 2025 | |
| 66 | Meta-Llama-3-8B-InstructOpen Source meta-llama | 3.333333 | Apr 2024 | |
| 67 | Mistral-7B-Instruct-v0.3Open Source mistralai | 3.326923 | May 2024 | |
| 68 | PLLuM-12B-chatOpen Source CYFRAGOVPL | 3.32 | Apr 2025 | |
| 69 | internlm2-chat-20bOpen Source internlm | 3.301282 | Jan 2024 | |
| 70 | trurl-2-13b-academicOpen Source Voicelab | 3.301282 | Jan 2024 | |
| 71 | CYFRAGOVPL/Llama-PLLuM-8B-instructOpen Source CYFRAGOVPL | 3.24 | Mar 2025 | |
| 72 | CYFRAGOVPL/PLLuM-12B-nc-instructOpen Source CYFRAGOVPL | 3.24 | Apr 2025 | |
| 73 | CYFRAGOVPL/PLLuM-12B-nc-chatOpen Source CYFRAGOVPL | 3.22 | Apr 2025 | |
| 74 | openchat-3.5-0106Open Source openchat | 3.160256 | Dec 2023 | |
| 75 | Llama-PLLuM-8B-chatOpen Source CYFRAGOVPL | 3.13 | Mar 2025 | |
| 76 | Llama-3.2-1B-InstructOpen Source meta-llama | 3.076923 | Sep 2024 | |
| 77 | Yi-1.5-34B-ChatOpen Source 01-ai | 3.076923 | May 2024 | |
| 78 | granite-3.1-2b-instructOpen Source ibm-granite | 3.076923 | Jan 2025 | |
| 79 | Starling-LM-7B-alphaOpen Source berkeley-nest | 3.057692 | Nov 2023 | |
| 80 | Mixtral-8x7B-Instruct-v0.1Open Source mistralai | 3.057692 | Dec 2023 | |
| 81 | Qwen/Qwen3.5-9B non-thinking (API, FP8)Open Source Qwen | 3.012821 | Jul 2025 | |
| 82 | SOLAR-10.7B-Instruct-v1.0Open Source upstage | 2.967949 | Dec 2023 | |
| 83 | Qwen2.5-3B-InstructOpen Source Qwen | 2.948718 | Sep 2024 | |
| 84 | Qwen2.5-1.5B-InstructOpen Source Qwen | 2.794872 | Sep 2024 | |
| 85 | Llama-3.2-3B-InstructOpen Source meta-llama | 2.75641 | Sep 2024 | |
| 86 | Phi-4-mini-instructOpen Source microsoft | 2.692308 | Apr 2025 | |
| 87 | NousResearch/Hermes-3-Llama-3.2-3BOpen Source NousResearch | 2.615385 | Oct 2024 | |
| 88 | Phi-3.5-mini-instructOpen Source microsoft | 2.435897 | Aug 2024 | |
| 89 | h2oai/h2o-danube2-1.8b-chatOpen Source h2oai | 2.371795 | Apr 2024 | |
| 90 | SmolLM2-1.7B-InstructOpen Source HuggingFaceTB | 2.275641 | Feb 2025 | |
| 91 | EuroLLM-1.7B-InstructOpen Source utter-project | 2.24359 | Jan 2025 | |
| 92 | Qwen/Qwen2.5-0.5B-InstructOpen Source Qwen | 1.955128 | Sep 2024 | |
| 93 | LGAI-EXAONE/EXAONE-3.5-2.4B-InstructOpen Source LGAI-EXAONE | 1.942308 | Jan 2025 |
tricky-questions
| # | Model (with license badge and organization) | Score | Date | Paper / Code |
|---|---|---|---|---|
| 1 | Qwen/Qwen3.5-35B-A3B thinking (API)Open Source Qwen | 4.702247 | Jul 2025 | |
| 2 | Qwen/Qwen3.5-27B thinking (API)Open Source Qwen | 4.61236 | Jul 2025 | |
| 3 | Qwen/Qwen3.5-27B non-thinking (API)Open Source Qwen | 4.426966 | Jul 2025 | |
| 4 | deepseek-ai/DeepSeek-V3.2 (API)Open Source deepseek-ai | 4.196629 | Jul 2025 | |
| 5 | Qwen/Qwen3.5-35B-A3B non-thinking (API)Open Source Qwen | 4.191011 | Jul 2025 | |
| 6 | deepseek-ai/DeepSeek-R1 (API)Open Source deepseek-ai | 4.117978 | Jan 2025 | |
| 7 | 🚧DeepSeek-V3-0324Open Source deepseek-ai | 4.022472 | Mar 2025 | |
| 8 | deepseek-ai/DeepSeek-V3 (API)Open Source deepseek-ai | 3.988764 | Dec 2024 | |
| 9 | gemini-2.0-flash-001Open Source Google | 3.988764 | Feb 2025 | |
| 10 | moonshotai/Kimi-K2-Instruct-0905 (API)Open Source moonshotai | 3.932584 | Sep 2025 | |
| 11 | openai/gpt-oss-120b (API)Open Source openai | 3.88764 | Jun 2025 | |
| 12 | deepseek-ai/DeepSeek-V3.1 (API)Open Source deepseek-ai | 3.870787 | May 2025 | |
| 13 | gemini-2.0-flash-lite-001Open Source Google | 3.853933 | Feb 2025 | |
| 14 | Qwen/Qwen3-235B-A22B non-thinking (API)Open Source Qwen | 3.837079 | Apr 2025 | |
| 15 | Qwen2.5-72B-InstructOpen Source Qwen | 3.808989 | Sep 2024 | |
| 16 | meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 (API)Open Source meta-llama | 3.758427 | Apr 2025 | |
| 17 | Mistral-Large-Instruct-2411Open Source mistralai | 3.724719 | Nov 2024 | |
| 18 | Meta-Llama-3-70B-InstructOpen Source meta-llama | 3.707865 | Apr 2024 | |
| 19 | Qwen2-72B-InstructOpen Source Qwen | 3.679775 | Jun 2024 | |
| 20 | Mistral-Large-Instruct-2407Open Source mistralai | 3.646067 | Jul 2024 | |
| 21 | Qwen/Qwen3.5-9B non-thinking (API, FP8)Open Source Qwen | 3.640449 | Jul 2025 | |
| 22 | Qwen2.5-32B-InstructOpen Source Qwen | 3.589888 | Sep 2024 | |
| 23 | Qwen/Qwen3-32B non-thinking (API)Open Source Qwen | 3.561798 | Apr 2025 | |
| 24 | Qwen/Qwen3-30B-A3B non-thinking (API)Open Source Qwen | 3.544944 | Apr 2025 | |
| 25 | gemma-3-27b-itOpen Source google | 3.533708 | Mar 2025 | |
| 26 | Bielik-11B-v2.1-InstructOpen Source speakleash | 3.47191 | Sep 2024 | |
| 27 | Mistral-Small-24B-Instruct-2501Open Source mistralai | 3.449438 | Jan 2025 | |
| 28 | NVIDIA-Nemotron-3-Nano-30B-A3B-BF16Open Source nvidia | 3.432584 | Jun 2025 | |
| 29 | mistralai/Mistral-Small-3.1-24B-Instruct-2503 (API FP8)Open Source mistralai | 3.421348 | Mar 2025 | |
| 30 | Llama-3.3-70B-InstructOpen Source meta-llama | 3.376404 | Dec 2024 | |
| 31 | Qwen2.5-14B-InstructOpen Source Qwen | 3.337079 | Sep 2024 | |
| 32 | Qwen/Qwen3-14B non-thinking (API)Open Source Qwen | 3.331461 | Apr 2025 | |
| 33 | mistralai/Mistral-Small-3.2-24B-Instruct-2506 (API FP8)Open Source mistralai | 3.303371 | Jun 2025 | |
| 34 | Mixtral-8x22B-Instruct-v0.1Open Source mistralai | 3.235955 | Apr 2024 | |
| 35 | Bielik-11B-v2.3-InstructOpen Source speakleash | 3.219101 | Nov 2024 | |
| 36 | Llama-PLLuM-70B-chatOpen Source CYFRAGOVPL | 3.213483 | Mar 2025 | |
| 37 | Llama-4-Scout-17B-16E-InstructOpen Source meta-llama | 3.191011 | Apr 2025 | |
| 38 | Bielik-11B-v3.0-InstructOpen Source speakleash | 3.185393 | Jun 2025 | |
| 39 | Bielik-11B-v2.2-InstructOpen Source speakleash | 3.123596 | Oct 2024 | |
| 40 | Bielik-11B-v2.6-InstructOpen Source speakleash | 3.095506 | Feb 2025 | |
| 41 | WizardLM-2-8x22BOpen Source alpindale | 3.05618 | Apr 2024 | |
| 42 | Meta-Llama-3.1-70B-InstructOpen Source meta-llama | 3.011236 | Jul 2024 | |
| 43 | Bielik-11B-v2.5-InstructOpen Source speakleash | 2.910112 | Jan 2025 | |
| 44 | pllum-12b-nc-chat-250715Open Source CYFRAGOVPL | 2.898876 | Jul 2025 | |
| 45 | Qwen/Qwen3-8B non-thinking (API)Open Source Qwen | 2.764045 | Apr 2025 | |
| 46 | EuroLLM-9B-InstructOpen Source utter-project | 2.747191 | Mar 2025 | |
| 47 | speakleash/Bielik-Minitron-7B-v3.0-InstructOpen Source speakleash | 2.735955 | Jul 2025 | |
| 48 | phi-4Open Source microsoft | 2.724719 | Jan 2025 | |
| 49 | Qwen1.5-72B-ChatOpen Source Qwen | 2.668539 | Feb 2024 | |
| 50 | Llama-PLLuM-70B-instructOpen Source CYFRAGOVPL | 2.634831 | Mar 2025 | |
| 51 | CYFRAGOVPL/PLLuM-12B-nc-chatOpen Source CYFRAGOVPL | 2.623596 | Apr 2025 | |
| 52 | PLLuM-12B-chatOpen Source CYFRAGOVPL | 2.589888 | Apr 2025 | |
| 53 | Qwen2.5-7B-InstructOpen Source Qwen | 2.58427 | Sep 2024 | |
| 54 | Meta-Llama-3-8B-InstructOpen Source meta-llama | 2.477528 | Apr 2024 | |
| 55 | Bielik-4.5B-v3.0-InstructOpen Source speakleash | 2.455056 | Jun 2025 | |
| 56 | CYFRAGOVPL/pllum-12b-nc-instruct-250715Open Source CYFRAGOVPL | 2.370787 | Jul 2025 | |
| 57 | Llama-PLLuM-8B-chatOpen Source CYFRAGOVPL | 2.252809 | Mar 2025 | |
| 58 | gemma-2-2b-itOpen Source google | 2.213483 | Jun 2024 | |
| 59 | Bielik-11B-v2.0-InstructOpen Source speakleash | 2.196629 | Aug 2024 | |
| 60 | Bielik-7B-Instruct-v0.1Open Source speakleash | 2.157303 | Apr 2024 | |
| 61 | SOLAR-10.7B-Instruct-v1.0Open Source upstage | 2.123596 | Dec 2023 | |
| 62 | Meta-Llama-3.1-8B-InstructOpen Source meta-llama | 2.11236 | Jul 2024 | |
| 63 | Mistral-Nemo-Instruct-2407Open Source mistralai | 2.089888 | Jul 2024 | |
| 64 | Mistral-7B-Instruct-v0.3Open Source mistralai | 1.988764 | May 2024 | |
| 65 | glm-4-9b-chatOpen Source THUDM | 1.983146 | Jun 2024 | |
| 66 | CYFRAGOVPL/PLLuM-12B-nc-instructOpen Source CYFRAGOVPL | 1.983146 | Apr 2025 | |
| 67 | openchat-3.5-0106Open Source openchat | 1.960674 | Dec 2023 | |
| 68 | PLLuM-12B-instructOpen Source CYFRAGOVPL | 1.904494 | Apr 2025 | |
| 69 | Qwen2.5-3B-InstructOpen Source Qwen | 1.808989 | Sep 2024 | |
| 70 | PLLuM-8x7B-nc-chatOpen Source CYFRAGOVPL | 1.797753 | Feb 2025 | |
| 71 | Mixtral-8x7B-Instruct-v0.1Open Source mistralai | 1.797753 | Dec 2023 | |
| 72 | PLLuM-8x7B-chatOpen Source CYFRAGOVPL | 1.780899 | Feb 2025 | |
| 73 | PLLuM-8x7B-nc-instructOpen Source CYFRAGOVPL | 1.764045 | Feb 2025 | |
| 74 | openchat-3.5-0106-gemmaOpen Source openchat | 1.679775 | Dec 2023 | |
| 75 | Starling-LM-7B-alphaOpen Source berkeley-nest | 1.679775 | Nov 2023 | |
| 76 | CYFRAGOVPL/Llama-PLLuM-8B-instructOpen Source CYFRAGOVPL | 1.662921 | Mar 2025 | |
| 77 | PLLuM-8x7B-instructOpen Source CYFRAGOVPL | 1.505618 | Feb 2025 | |
| 78 | Phi-4-mini-instructOpen Source microsoft | 1.303371 | Apr 2025 | |
| 79 | Bielik-1.5B-v3.0-InstructOpen Source speakleash | 1.219101 | Jun 2025 | |
| 80 | Llama-3.2-3B-InstructOpen Source meta-llama | 1.219101 | Sep 2024 | |
| 81 | NousResearch/Hermes-3-Llama-3.2-3BOpen Source NousResearch | 1.140449 | Oct 2024 | |
| 82 | Phi-3.5-mini-instructOpen Source microsoft | 1.044944 | Aug 2024 | |
| 83 | trurl-2-13b-academicOpen Source Voicelab | 1.016854 | Jan 2024 | |
| 84 | Yi-1.5-34B-ChatOpen Source 01-ai | 1 | May 2024 | |
| 85 | EuroLLM-1.7B-InstructOpen Source utter-project | 0.758 | Jan 2025 | |
| 86 | Qwen2.5-1.5B-InstructOpen Source Qwen | 0.663 | Sep 2024 | |
| 87 | granite-3.1-2b-instructOpen Source ibm-granite | 0.590 | Jan 2025 | |
| 88 | Llama-3.2-1B-InstructOpen Source meta-llama | 0.522 | Sep 2024 | |
| 89 | LGAI-EXAONE/EXAONE-3.5-2.4B-InstructOpen Source LGAI-EXAONE | 0.489 | Jan 2025 | |
| 90 | SmolLM2-1.7B-InstructOpen Source HuggingFaceTB | 0.253 | Feb 2025 | |
| 91 | Qwen/Qwen2.5-0.5B-InstructOpen Source Qwen | 0.219 | Sep 2024 | |
| 92 | h2oai/h2o-danube2-1.8b-chatOpen Source h2oai | 0.129 | Apr 2024 | |
| 93 | internlm2-chat-20bOpen Source internlm | 0.124 | Jan 2024 |
Polish Models
Bielik (SpeakLeash) and PLLuM (CYFRA GOV PL): models built specifically for the Polish language, ranked by their average CPTU-Bench score.
| # | Model | Organization | Average |
|---|---|---|---|
| 1 | Bielik-11B-v3.0-Instruct | speakleash | 3.73 |
| 2 | pllum-12b-nc-chat-250715 | CYFRAGOVPL | 3.67 |
| 3 | Bielik-11B-v2.6-Instruct | speakleash | 3.64 |
| 4 | Bielik-11B-v2.3-Instruct | speakleash | 3.63 |
| 5 | Bielik-11B-v2.1-Instruct | speakleash | 3.61 |
| 6 | Llama-PLLuM-70B-chat | CYFRAGOVPL | 3.53 |
| 7 | Bielik-11B-v2.5-Instruct | speakleash | 3.48 |
| 8 | Bielik-11B-v2.2-Instruct | speakleash | 3.46 |
| 9 | speakleash/Bielik-Minitron-7B-v3.0-Instruct | speakleash | 3.38 |
| 10 | Bielik-4.5B-v3.0-Instruct | speakleash | 3.38 |
| 11 | Llama-PLLuM-70B-instruct | CYFRAGOVPL | 3.33 |
| 12 | CYFRAGOVPL/pllum-12b-nc-instruct-250715 | CYFRAGOVPL | 3.33 |
| 13 | Bielik-11B-v2.0-Instruct | speakleash | 3.26 |
| 14 | CYFRAGOVPL/PLLuM-12B-nc-chat | CYFRAGOVPL | 3.15 |
| 15 | PLLuM-12B-chat | CYFRAGOVPL | 3.14 |
| 16 | PLLuM-8x7B-nc-instruct | CYFRAGOVPL | 3.11 |
| 17 | PLLuM-12B-instruct | CYFRAGOVPL | 3.09 |
| 18 | PLLuM-8x7B-nc-chat | CYFRAGOVPL | 3.03 |
| 19 | PLLuM-8x7B-instruct | CYFRAGOVPL | 3.01 |
| 20 | PLLuM-8x7B-chat | CYFRAGOVPL | 3.01 |
| 21 | CYFRAGOVPL/PLLuM-12B-nc-instruct | CYFRAGOVPL | 2.96 |
| 22 | Llama-PLLuM-8B-chat | CYFRAGOVPL | 2.92 |
| 23 | Bielik-7B-Instruct-v0.1 | speakleash | 2.88 |
| 24 | CYFRAGOVPL/Llama-PLLuM-8B-instruct | CYFRAGOVPL | 2.82 |
| 25 | Bielik-1.5B-v3.0-Instruct | speakleash | 2.36 |
Legend: Bielik (SpeakLeash), PLLuM (CYFRA GOV PL). Global SOTA: 4.34
Polish Model Evolution
Tracking Bielik and PLLuM performance on CPTU-Bench over time
Average Score Over Versions
Bielik Versions
| Model | Release |
|---|---|
| Bielik-7B-Instruct-v0.1 | Apr 2024 |
| Bielik-11B-v2.0-Instruct | Aug 2024 |
| Bielik-11B-v2.1-Instruct | Sep 2024 |
| Bielik-11B-v2.2-Instruct | Oct 2024 |
| Bielik-11B-v2.3-Instruct | Nov 2024 |
| Bielik-11B-v2.5-Instruct | Jan 2025 |
| Bielik-11B-v2.6-Instruct | Feb 2025 |
| Bielik-1.5B-v3.0-Instruct | Jun 2025 |
| Bielik-11B-v3.0-Instruct | Jun 2025 |
| Bielik-4.5B-v3.0-Instruct | Jun 2025 |
| speakleash/Bielik-Minitron-7B-v3.0-Instruct | Jul 2025 |
PLLuM Versions
| Model | Release |
|---|---|
| PLLuM-8x7B-chat | Feb 2025 |
| PLLuM-8x7B-instruct | Feb 2025 |
| PLLuM-8x7B-nc-chat | Feb 2025 |
| PLLuM-8x7B-nc-instruct | Feb 2025 |
| CYFRAGOVPL/Llama-PLLuM-8B-instruct | Mar 2025 |
| Llama-PLLuM-70B-chat | Mar 2025 |
| Llama-PLLuM-70B-instruct | Mar 2025 |
| Llama-PLLuM-8B-chat | Mar 2025 |
| CYFRAGOVPL/PLLuM-12B-nc-chat | Apr 2025 |
| CYFRAGOVPL/PLLuM-12B-nc-instruct | Apr 2025 |
| PLLuM-12B-chat | Apr 2025 |
| PLLuM-12B-instruct | Apr 2025 |
| CYFRAGOVPL/pllum-12b-nc-instruct-250715 | Jul 2025 |
| pllum-12b-nc-chat-250715 | Jul 2025 |
Frontier Comparison
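The best Bielik model (Bielik-11B-v3.0-Instruct, average 3.73) is 13.9% behind the current SOTA (Qwen/Qwen3.5-27B thinking (API), average 4.34).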
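The page does not state how the gap is computed. A minimal sketch, assuming it is the shortfall relative to the SOTA average, is shown here. The function name `relative_gap_percent` is hypothetical, and the inputs are the values published on this page: the rounded Bielik average from the Polish Models table and the SOTA average from the main leaderboard.

```python
def relative_gap_percent(score: float, sota: float) -> float:
    """Return how far `score` falls below `sota`, as a percentage of `sota`."""
    if sota <= 0:
        raise ValueError("sota must be positive")
    return (sota - score) / sota * 100.0


# Assumed inputs, taken from the tables on this page.
best_bielik = 3.73   # Bielik-11B-v3.0-Instruct (average, rounded to two decimals)
sota = 4.336359      # Qwen/Qwen3.5-27B thinking (API), average

gap = relative_gap_percent(best_bielik, sota)
print(f"Best Bielik is {gap:.1f}% behind SOTA")
```

With the rounded Bielik average this comes out at roughly 14%, marginally above the 13.9% shown. The difference is presumably because the page works from the unrounded score, but that is an assumption.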