Polish Text Understanding | 2025 | en

Complex Polish Text Understanding Benchmark

Evaluates LLMs on understanding Polish text across 4 dimensions: sentiment analysis, language understanding (implicatures, author intent), phraseology (idioms, phraseological compounds), and tricky questions (logic, ambiguity, hallucination resistance). Score range 0-5 per category. 378 hand-written examples. Created by SpeakLeash/Spichlerz.

Samples: 93
Metrics: average, sentiment, language-understanding, phraseology, tricky-questions
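The `average` values listed on this page appear to match an unweighted mean of the four category scores. The sketch below checks that reading against two models, using category scores copied from the per-category tables; the aggregation rule itself is an assumption inferred from the numbers, not something the page states, and `average_score` is a hypothetical helper name.

```python
from statistics import mean

# Category scores copied from the per-category tables on this page.
CATEGORY_SCORES = {
    "Qwen/Qwen3.5-27B thinking (API)": {
        "sentiment": 4.423077,
        "language-understanding": 4.205,
        "phraseology": 4.105,
        "tricky-questions": 4.61236,
    },
    "gemini-2.0-flash-001": {
        "sentiment": 4.519231,
        "language-understanding": 4.32,
        "phraseology": 4.34,
        "tricky-questions": 3.988764,
    },
}


def average_score(categories):
    """Unweighted mean of the category scores (assumed aggregation rule)."""
    return mean(categories.values())


for model, categories in CATEGORY_SCORES.items():
    print(f"{model}: {average_score(categories):.6f}")
```

Both results agree with the listed averages (4.336359 and 4.291999) to six decimal places, which supports the equal-weight reading but does not prove it for every row.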
Current State of the Art

Qwen/Qwen3.5-27B thinking (API) (Qwen): 4.336359 (average)

CPTU-Bench — average

93 results · 11 SOTA advances · higher is better

[Chart: average score (y-axis, 1 to 5) by year (x-axis, 2023 to 2027), with "All results" and "SOTA frontier" views. Labeled points: Starling-LM-7B-alpha, Qwen2.5-72B-Instruct, Qwen/Qwen3.5-27B thinking (API).]

Model Size vs Score — Pareto Frontier

91 models · log scale · Pareto frontier shown

[Chart: average score (y-axis, 1.5 to 4.5) against parameter count (x-axis, 500M to 700B, log scale), with series Global, Bielik, PLLuM, and Pareto. Labeled points: Bielik-11B-v3.0, pllum-12b-nc-chat-250715, Bielik-11B-v2.6, Bielik-11B-v2.3, Bielik-11B-v2.1, Llama-PLLuM-70B-chat, Bielik-11B-v2.5, Bielik-11B-v2.2, Bielik-Minitron-7B-v3.0, Bielik-4.5B-v3.0, Llama-PLLuM-70B-instruct, pllum-12b-nc-instruct-250715, Bielik-11B-v2.0, PLLuM-12B-nc-chat, PLLuM-12B-chat, PLLuM-8x7B-nc-instruct, PLLuM-12B-instruct, PLLuM-8x7B-nc-chat, PLLuM-8x7B-instruct, PLLuM-8x7B-chat, PLLuM-12B-nc-instruct, Llama-PLLuM-8B-chat, Bielik-7B-Instruct-v0.1, Llama-PLLuM-8B-instruct, Bielik-1.5B-v3.0.]
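A Pareto frontier over size and score keeps only the models that no smaller-or-equal model outscores. The sketch below illustrates the idea on six Polish models from this page. The averages are copied from the leaderboard, but the parameter counts are assumptions read off the model names (for example, 11 for Bielik-11B), and `pareto_frontier` is a hypothetical helper, not the page's own method.

```python
# (model, parameters in billions, average score)
# Parameter counts are ASSUMED from the model names; scores are from the leaderboard.
MODELS = [
    ("Bielik-1.5B-v3.0-Instruct", 1.5, 2.363686),
    ("Bielik-4.5B-v3.0-Instruct", 4.5, 3.375719),
    ("speakleash/Bielik-Minitron-7B-v3.0-Instruct", 7.0, 3.378476),
    ("Bielik-11B-v3.0-Instruct", 11.0, 3.73465),
    ("pllum-12b-nc-chat-250715", 12.0, 3.666963),
    ("Llama-PLLuM-70B-chat", 70.0, 3.528371),
]


def pareto_frontier(models):
    """Return names of models not outscored by any smaller-or-equal model."""
    frontier = []
    best_score = float("-inf")
    # Walk from smallest to largest; at equal size, the higher score comes first.
    for name, params, score in sorted(models, key=lambda m: (m[1], -m[2])):
        if score > best_score:
            frontier.append(name)
            best_score = score
    return frontier


for name in pareto_frontier(MODELS):
    print(name)
```

On this subset the 12B and 70B models fall off the frontier because the 11B model scores higher at a smaller size. A frontier computed over all 91 charted models could look different.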

average Progress Over Time

Showing 14 breakthroughs from Nov 2023 to Jul 2025

[Chart: average (y-axis, 2.5 to 4.5) by date (x-axis, Nov 2023 to Jul 2025).]

Key Milestones

Date | Model | Score | Gain over previous
Nov 2023 | Starling-LM-7B-alpha | 2.6 | -
Dec 2023 | Mixtral-8x7B-Instruct-v0.1 | 2.7 | +3.8%
Dec 2023 | openchat-3.5-0106-gemma | 2.7 | +0.2%
Dec 2023 | SOLAR-10.7B-Instruct-v1.0 | 2.9 | +5.4%
Feb 2024 | Qwen1.5-72B-Chat | 3.2 | +9.6%
Apr 2024 | WizardLM-2-8x22B | 3.7 | +17.1%
Apr 2024 | Meta-Llama-3-70B-Instruct | 3.8 | +2.2%
Jul 2024 | Mistral-Large-Instruct-2407 | 3.9 | +4.0%
Sep 2024 | Qwen2.5-72B-Instruct | 3.9 | +0.3%
Nov 2024 | Mistral-Large-Instruct-2411 | 4.0 | +1.5%
Dec 2024 | deepseek-ai/DeepSeek-V3 (API) | 4.0 | +0.5%
Jan 2025 | deepseek-ai/DeepSeek-R1 (API) | 4.1 | +2.8%
Feb 2025 | gemini-2.0-flash-001 | 4.3 | +3.7%
Jul 2025 | Qwen/Qwen3.5-27B thinking (API) (Current SOTA) | 4.3 | +1.0%

Total Improvement: 64.9%
Time Span: 1y 8m
Breakthroughs: 14
Current SOTA: 4.3
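The milestone percentages can be reproduced from the full-precision averages in the leaderboard table. The sketch below assumes each gain is the relative change over the previous milestone and that the total is the relative change from the first milestone to the last; the helper names are hypothetical.

```python
# ((year, month), model, full-precision average) for each milestone, in page order.
MILESTONES = [
    ((2023, 11), "Starling-LM-7B-alpha", 2.629367),
    ((2023, 12), "Mixtral-8x7B-Instruct-v0.1", 2.728861),
    ((2023, 12), "openchat-3.5-0106-gemma", 2.733886),
    ((2023, 12), "SOLAR-10.7B-Instruct-v1.0", 2.881636),
    ((2024, 2), "Qwen1.5-72B-Chat", 3.158225),
    ((2024, 4), "WizardLM-2-8x22B", 3.699077),
    ((2024, 4), "Meta-Llama-3-70B-Instruct", 3.78187),
    ((2024, 7), "Mistral-Large-Instruct-2407", 3.934209),
    ((2024, 9), "Qwen2.5-72B-Instruct", 3.946478),
    ((2024, 11), "Mistral-Large-Instruct-2411", 4.004161),
    ((2024, 12), "deepseek-ai/DeepSeek-V3 (API)", 4.023185),
    ((2025, 1), "deepseek-ai/DeepSeek-R1 (API)", 4.137539),
    ((2025, 2), "gemini-2.0-flash-001", 4.291999),
    ((2025, 7), "Qwen/Qwen3.5-27B thinking (API)", 4.336359),
]


def step_gains(milestones):
    """Percent gain of each milestone over the one before it."""
    pairs = zip(milestones, milestones[1:])
    return [(model, 100.0 * (cur - prev) / prev)
            for (_, _, prev), (_, model, cur) in pairs]


def total_improvement(milestones):
    """Percent gain from the first milestone to the last."""
    first, last = milestones[0][2], milestones[-1][2]
    return 100.0 * (last - first) / first


def time_span(milestones):
    """(years, months) between the first and last milestone dates."""
    (y0, m0), (y1, m1) = milestones[0][0], milestones[-1][0]
    return divmod((y1 - y0) * 12 + (m1 - m0), 12)


for model, gain in step_gains(MILESTONES):
    print(f"{model}: +{gain:.1f}%")
print(f"total: {total_improvement(MILESTONES):.1f}%")
print("span: {}y {}m".format(*time_span(MILESTONES)))
```

Under these assumptions the computed step gains, total, and span line up with the figures shown for the milestones.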

Top Models Performance Comparison

Top 10 models ranked by average

# | Model | average | % of best
1 | Qwen/Qwen3.5-27B thinking... | 4.3 | 100.0%
2 | gemini-2.0-flash-001 | 4.3 | 99.0%
3 | Qwen/Qwen3.5-27B non-thin... | 4.3 | 98.5%
4 | Qwen/Qwen3.5-35B-A3B thin... | 4.2 | 97.4%
5 | Qwen/Qwen3.5-35B-A3B non-... | 4.2 | 96.3%
6 | deepseek-ai/DeepSeek-V3.2... | 4.1 | 95.5%
7 | deepseek-ai/DeepSeek-R1 (... | 4.1 | 95.4%
8 | gemini-2.0-flash-lite-001 | 4.1 | 94.4%
9 | 🚧DeepSeek-V3-0324 | 4.0 | 92.9%
10 | deepseek-ai/DeepSeek-V3.1... | 4.0 | 92.8%

Best Score: 4.3
Top Model: Qwen/Qwen3.5-27B ...
Models Compared: 10
Score Range: 0.310
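The "% of best" column and the score range can be recomputed from the full-precision averages of the top 10 models. The sketch below assumes "% of best" is each score divided by the top score and that the score range is the gap between the best and the tenth score; the function names are hypothetical.

```python
# Top 10 averages copied from the leaderboard table on this page.
TOP10 = [
    ("Qwen/Qwen3.5-27B thinking (API)", 4.336359),
    ("gemini-2.0-flash-001", 4.291999),
    ("Qwen/Qwen3.5-27B non-thinking (API)", 4.27171),
    ("Qwen/Qwen3.5-35B-A3B thinking (API)", 4.223703),
    ("Qwen/Qwen3.5-35B-A3B non-thinking (API)", 4.175445),
    ("deepseek-ai/DeepSeek-V3.2 (API)", 4.139189),
    ("deepseek-ai/DeepSeek-R1 (API)", 4.137539),
    ("gemini-2.0-flash-lite-001", 4.093675),
    ("DeepSeek-V3-0324", 4.029112),
    ("deepseek-ai/DeepSeek-V3.1 (API)", 4.025966),
]


def percent_of_best(rows):
    """Each score as a percentage of the highest score in rows."""
    best = max(score for _, score in rows)
    return [(name, 100.0 * score / best) for name, score in rows]


def score_range(rows):
    """Gap between the highest and lowest score in rows."""
    scores = [score for _, score in rows]
    return max(scores) - min(scores)


for name, pct in percent_of_best(TOP10):
    print(f"{name}: {pct:.1f}%")
print(f"score range: {score_range(TOP10):.3f}")
```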

average (Primary)

All rows are tagged "Open Source" on the page.

# | Model | Organization | Score | Date
1 | Qwen/Qwen3.5-27B thinking (API) | Qwen | 4.336359 | Jul 2025
2 | gemini-2.0-flash-001 | Google | 4.291999 | Feb 2025
3 | Qwen/Qwen3.5-27B non-thinking (API) | Qwen | 4.27171 | Jul 2025
4 | Qwen/Qwen3.5-35B-A3B thinking (API) | Qwen | 4.223703 | Jul 2025
5 | Qwen/Qwen3.5-35B-A3B non-thinking (API) | Qwen | 4.175445 | Jul 2025
6 | deepseek-ai/DeepSeek-V3.2 (API) | deepseek-ai | 4.139189 | Jul 2025
7 | deepseek-ai/DeepSeek-R1 (API) | deepseek-ai | 4.137539 | Jan 2025
8 | gemini-2.0-flash-lite-001 | Google | 4.093675 | Feb 2025
9 | 🚧DeepSeek-V3-0324 | deepseek-ai | 4.029112 | Mar 2025
10 | deepseek-ai/DeepSeek-V3.1 (API) | deepseek-ai | 4.025966 | May 2025
11 | deepseek-ai/DeepSeek-V3 (API) | deepseek-ai | 4.023185 | Dec 2024
12 | Mistral-Large-Instruct-2411 | mistralai | 4.004161 | Nov 2024
13 | moonshotai/Kimi-K2-Instruct-0905 (API) | moonshotai | 3.983402 | Sep 2025
14 | Qwen2.5-72B-Instruct | Qwen | 3.946478 | Sep 2024
15 | Mistral-Large-Instruct-2407 | mistralai | 3.934209 | Jul 2024
16 | meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 (API) | meta-llama | 3.933613 | Apr 2025
17 | Qwen/Qwen3-235B-A22B non-thinking (API) | Qwen | 3.910936 | Apr 2025
18 | mistralai/Mistral-Small-3.1-24B-Instruct-2503 (API FP8) | mistralai | 3.897741 | Mar 2025
19 | mistralai/Mistral-Small-3.2-24B-Instruct-2506 (API FP8) | mistralai | 3.827445 | Jun 2025
20 | openai/gpt-oss-120b (API) | openai | 3.822487 | Jun 2025
21 | gemma-3-27b-it | google | 3.805478 | Mar 2025
22 | Meta-Llama-3-70B-Instruct | meta-llama | 3.78187 | Apr 2024
23 | Qwen2.5-32B-Instruct | Qwen | 3.750998 | Sep 2024
24 | Llama-4-Scout-17B-16E-Instruct | meta-llama | 3.749644 | Apr 2025
25 | Bielik-11B-v3.0-Instruct | speakleash | 3.73465 | Jun 2025
26 | Qwen/Qwen3-32B non-thinking (API) | Qwen | 3.710353 | Apr 2025
27 | Mistral-Small-24B-Instruct-2501 | mistralai | 3.708674 | Jan 2025
28 | WizardLM-2-8x22B | alpindale | 3.699077 | Apr 2024
29 | pllum-12b-nc-chat-250715 | CYFRAGOVPL | 3.666963 | Jul 2025
30 | Qwen2-72B-Instruct | Qwen | 3.653149 | Jun 2024
31 | Llama-3.3-70B-Instruct | meta-llama | 3.644069 | Dec 2024
32 | Bielik-11B-v2.6-Instruct | speakleash | 3.637017 | Feb 2025
33 | Bielik-11B-v2.3-Instruct | speakleash | 3.632115 | Nov 2024
34 | Meta-Llama-3.1-70B-Instruct | meta-llama | 3.62454 | Jul 2024
35 | Bielik-11B-v2.1-Instruct | speakleash | 3.61176 | Sep 2024
36 | Mixtral-8x22B-Instruct-v0.1 | mistralai | 3.560752 | Apr 2024
37 | Qwen2.5-14B-Instruct | Qwen | 3.545584 | Sep 2024
38 | Qwen/Qwen3-30B-A3B non-thinking (API) | Qwen | 3.536973 | Apr 2025
39 | Llama-PLLuM-70B-chat | CYFRAGOVPL | 3.528371 | Mar 2025
40 | Qwen/Qwen3-14B non-thinking (API) | Qwen | 3.511679 | Apr 2025
41 | Bielik-11B-v2.5-Instruct | speakleash | 3.476631 | Jan 2025
42 | Bielik-11B-v2.2-Instruct | speakleash | 3.455386 | Oct 2024
43 | speakleash/Bielik-Minitron-7B-v3.0-Instruct | speakleash | 3.378476 | Jul 2025
44 | Bielik-4.5B-v3.0-Instruct | speakleash | 3.375719 | Jun 2025
45 | Llama-PLLuM-70B-instruct | CYFRAGOVPL | 3.326208 | Mar 2025
46 | CYFRAGOVPL/pllum-12b-nc-instruct-250715 | CYFRAGOVPL | 3.325261 | Jul 2025
47 | phi-4 | microsoft | 3.304417 | Jan 2025
48 | Qwen/Qwen3.5-9B non-thinking (API, FP8) | Qwen | 3.275817 | Jul 2025
49 | Bielik-11B-v2.0-Instruct | speakleash | 3.260247 | Aug 2024
50 | NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 | nvidia | 3.247056 | Jun 2025
51 | Qwen1.5-72B-Chat | Qwen | 3.158225 | Feb 2024
52 | CYFRAGOVPL/PLLuM-12B-nc-chat | CYFRAGOVPL | 3.153399 | Apr 2025
53 | EuroLLM-9B-Instruct | utter-project | 3.145644 | Mar 2025
54 | PLLuM-12B-chat | CYFRAGOVPL | 3.137472 | Apr 2025
55 | PLLuM-8x7B-nc-instruct | CYFRAGOVPL | 3.113511 | Feb 2025
56 | PLLuM-12B-instruct | CYFRAGOVPL | 3.093624 | Apr 2025
57 | Qwen2.5-7B-Instruct | Qwen | 3.06549 | Sep 2024
58 | Qwen/Qwen3-8B non-thinking (API) | Qwen | 3.061909 | Apr 2025
59 | PLLuM-8x7B-nc-chat | CYFRAGOVPL | 3.029438 | Feb 2025
60 | Meta-Llama-3.1-8B-Instruct | meta-llama | 3.01168 | Jul 2024
61 | PLLuM-8x7B-instruct | CYFRAGOVPL | 3.006404 | Feb 2025
62 | PLLuM-8x7B-chat | CYFRAGOVPL | 3.005225 | Feb 2025
63 | Meta-Llama-3-8B-Instruct | meta-llama | 2.998965 | Apr 2024
64 | CYFRAGOVPL/PLLuM-12B-nc-instruct | CYFRAGOVPL | 2.963287 | Apr 2025
65 | glm-4-9b-chat | THUDM | 2.951972 | Jun 2024
66 | Mistral-Nemo-Instruct-2407 | mistralai | 2.940228 | Jul 2024
67 | Llama-PLLuM-8B-chat | CYFRAGOVPL | 2.918202 | Mar 2025
68 | Bielik-7B-Instruct-v0.1 | speakleash | 2.884262 | Apr 2024
69 | SOLAR-10.7B-Instruct-v1.0 | upstage | 2.881636 | Dec 2023
70 | CYFRAGOVPL/Llama-PLLuM-8B-instruct | CYFRAGOVPL | 2.81573 | Mar 2025
71 | Mistral-7B-Instruct-v0.3 | mistralai | 2.763922 | May 2024
72 | openchat-3.5-0106-gemma | openchat | 2.733886 | Dec 2023
73 | Mixtral-8x7B-Instruct-v0.1 | mistralai | 2.728861 | Dec 2023
74 | gemma-2-2b-it | google | 2.65148 | Jun 2024
75 | Starling-LM-7B-alpha | berkeley-nest | 2.629367 | Nov 2023
76 | openchat-3.5-0106 | openchat | 2.627733 | Dec 2023
77 | Qwen2.5-3B-Instruct | Qwen | 2.503177 | Sep 2024
78 | Bielik-1.5B-v3.0-Instruct | speakleash | 2.363686 | Jun 2025
79 | Yi-1.5-34B-Chat | 01-ai | 2.331731 | May 2024
80 | trurl-2-13b-academic | Voicelab | 2.309534 | Jan 2024
81 | NousResearch/Hermes-3-Llama-3.2-3B | NousResearch | 2.306459 | Oct 2024
82 | Phi-4-mini-instruct | microsoft | 2.16767 | Apr 2025
83 | internlm2-chat-20b | internlm | 2.148719 | Jan 2024
84 | Phi-3.5-mini-instruct | microsoft | 2.01021 | Aug 2024
85 | Llama-3.2-3B-Instruct | meta-llama | 1.997628 | Sep 2024
86 | granite-3.1-2b-instruct | ibm-granite | 1.945453 | Jan 2025
87 | Llama-3.2-1B-Instruct | meta-llama | 1.918599 | Sep 2024
88 | EuroLLM-1.7B-Instruct | utter-project | 1.763004 | Jan 2025
89 | Qwen2.5-1.5B-Instruct | Qwen | 1.758198 | Sep 2024
90 | LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct | LGAI-EXAONE | 1.669326 | Jan 2025
91 | h2oai/h2o-danube2-1.8b-chat | h2oai | 1.641502 | Apr 2024
92 | SmolLM2-1.7B-Instruct | HuggingFaceTB | 1.495863 | Feb 2025
93 | Qwen/Qwen2.5-0.5B-Instruct | Qwen | 1.401057 | Sep 2024

language-understanding

All rows are tagged "Open Source" on the page.

# | Model | Organization | Score | Date
1 | deepseek-ai/DeepSeek-V3.2 (API) | deepseek-ai | 4.36 | Jul 2025
2 | deepseek-ai/DeepSeek-R1 (API) | deepseek-ai | 4.345 | Jan 2025
3 | deepseek-ai/DeepSeek-V3.1 (API) | deepseek-ai | 4.335 | May 2025
4 | gemini-2.0-flash-001 | Google | 4.32 | Feb 2025
5 | deepseek-ai/DeepSeek-V3 (API) | deepseek-ai | 4.22 | Dec 2024
6 | Qwen/Qwen3.5-27B thinking (API) | Qwen | 4.205 | Jul 2025
7 | 🚧DeepSeek-V3-0324 | deepseek-ai | 4.195 | Mar 2025
8 | moonshotai/Kimi-K2-Instruct-0905 (API) | moonshotai | 4.18 | Sep 2025
9 | Qwen/Qwen3.5-27B non-thinking (API) | Qwen | 4.17 | Jul 2025
10 | Qwen/Qwen3-235B-A22B non-thinking (API) | Qwen | 4.155 | Apr 2025
11 | meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 (API) | meta-llama | 4.11 | Apr 2025
12 | gemini-2.0-flash-lite-001 | Google | 4.055 | Feb 2025
13 | Qwen/Qwen3.5-35B-A3B non-thinking (API) | Qwen | 4.05 | Jul 2025
14 | mistralai/Mistral-Small-3.2-24B-Instruct-2506 (API FP8) | mistralai | 4.005 | Jun 2025
15 | Mistral-Large-Instruct-2407 | mistralai | 4 | Jul 2024
16 | Mistral-Large-Instruct-2411 | mistralai | 3.975 | Nov 2024
17 | openai/gpt-oss-120b (API) | openai | 3.97 | Jun 2025
18 | Qwen2.5-72B-Instruct | Qwen | 3.97 | Sep 2024
19 | pllum-12b-nc-chat-250715 | CYFRAGOVPL | 3.955 | Jul 2025
20 | Bielik-11B-v2.6-Instruct | speakleash | 3.94 | Feb 2025
21 | Qwen/Qwen3.5-35B-A3B thinking (API) | Qwen | 3.94 | Jul 2025
22 | Bielik-11B-v2.1-Instruct | speakleash | 3.915 | Sep 2024
23 | Qwen/Qwen3-32B non-thinking (API) | Qwen | 3.91 | Apr 2025
24 | Bielik-11B-v3.0-Instruct | speakleash | 3.91 | Jun 2025
25 | Meta-Llama-3.1-70B-Instruct | meta-llama | 3.91 | Jul 2024
26 | Qwen2-72B-Instruct | Qwen | 3.89 | Jun 2024
27 | mistralai/Mistral-Small-3.1-24B-Instruct-2503 (API FP8) | mistralai | 3.885 | Mar 2025
28 | Llama-3.3-70B-Instruct | meta-llama | 3.865 | Dec 2024
29 | Bielik-11B-v2.5-Instruct | speakleash | 3.86 | Jan 2025
30 | speakleash/Bielik-Minitron-7B-v3.0-Instruct | speakleash | 3.83 | Jul 2025
31 | Meta-Llama-3-70B-Instruct | meta-llama | 3.82 | Apr 2024
32 | WizardLM-2-8x22B | alpindale | 3.815 | Apr 2024
33 | Llama-4-Scout-17B-16E-Instruct | meta-llama | 3.805 | Apr 2025
34 | Bielik-11B-v2.3-Instruct | speakleash | 3.785 | Nov 2024
35 | gemma-3-27b-it | google | 3.785 | Mar 2025
36 | Bielik-11B-v2.0-Instruct | speakleash | 3.745 | Aug 2024
37 | Bielik-11B-v2.2-Instruct | speakleash | 3.73 | Oct 2024
38 | CYFRAGOVPL/pllum-12b-nc-instruct-250715 | CYFRAGOVPL | 3.725 | Jul 2025
39 | Mixtral-8x22B-Instruct-v0.1 | mistralai | 3.675 | Apr 2024
40 | Llama-PLLuM-70B-instruct | CYFRAGOVPL | 3.63 | Mar 2025
41 | Llama-PLLuM-70B-chat | CYFRAGOVPL | 3.61 | Mar 2025
42 | Bielik-4.5B-v3.0-Instruct | speakleash | 3.61 | Jun 2025
43 | Mistral-Small-24B-Instruct-2501 | mistralai | 3.6 | Jan 2025
44 | PLLuM-8x7B-nc-instruct | CYFRAGOVPL | 3.59 | Feb 2025
45 | Qwen2.5-32B-Instruct | Qwen | 3.565 | Sep 2024
46 | Qwen2.5-14B-Instruct | Qwen | 3.565 | Sep 2024
47 | Qwen/Qwen3-14B non-thinking (API) | Qwen | 3.56 | Apr 2025
48 | phi-4 | microsoft | 3.54 | Jan 2025
49 | Qwen1.5-72B-Chat | Qwen | 3.515 | Feb 2024
50 | PLLuM-8x7B-nc-chat | CYFRAGOVPL | 3.48 | Feb 2025
51 | Bielik-7B-Instruct-v0.1 | speakleash | 3.475 | Apr 2024
52 | PLLuM-8x7B-instruct | CYFRAGOVPL | 3.47 | Feb 2025
53 | glm-4-9b-chat | THUDM | 3.455 | Jun 2024
54 | PLLuM-8x7B-chat | CYFRAGOVPL | 3.45 | Feb 2025
55 | Qwen/Qwen3-30B-A3B non-thinking (API) | Qwen | 3.39 | Apr 2025
56 | Meta-Llama-3.1-8B-Instruct | meta-llama | 3.38 | Jul 2024
57 | CYFRAGOVPL/PLLuM-12B-nc-instruct | CYFRAGOVPL | 3.31 | Apr 2025
58 | EuroLLM-9B-Instruct | utter-project | 3.3 | Mar 2025
59 | Mistral-Nemo-Instruct-2407 | mistralai | 3.29 | Jul 2024
60 | NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 | nvidia | 3.27 | Jun 2025
61 | CYFRAGOVPL/PLLuM-12B-nc-chat | CYFRAGOVPL | 3.23 | Apr 2025
62 | Qwen/Qwen3-8B non-thinking (API) | Qwen | 3.225 | Apr 2025
63 | PLLuM-12B-chat | CYFRAGOVPL | 3.21 | Apr 2025
64 | SOLAR-10.7B-Instruct-v1.0 | upstage | 3.18 | Dec 2023
65 | Mixtral-8x7B-Instruct-v0.1 | mistralai | 3.175 | Dec 2023
66 | PLLuM-12B-instruct | CYFRAGOVPL | 3.17 | Apr 2025
67 | Meta-Llama-3-8B-Instruct | meta-llama | 3.15 | Apr 2024
68 | openchat-3.5-0106-gemma | openchat | 3.08 | Dec 2023
69 | Mistral-7B-Instruct-v0.3 | mistralai | 3.06 | May 2024
70 | Qwen2.5-7B-Instruct | Qwen | 3.025 | Sep 2024
71 | Qwen/Qwen3.5-9B non-thinking (API, FP8) | Qwen | 2.975 | Jul 2025
72 | Llama-PLLuM-8B-chat | CYFRAGOVPL | 2.93 | Mar 2025
73 | Starling-LM-7B-alpha | berkeley-nest | 2.925 | Nov 2023
74 | gemma-2-2b-it | google | 2.9 | Jun 2024
75 | CYFRAGOVPL/Llama-PLLuM-8B-instruct | CYFRAGOVPL | 2.9 | Mar 2025
76 | Yi-1.5-34B-Chat | 01-ai | 2.87 | May 2024
77 | openchat-3.5-0106 | openchat | 2.835 | Dec 2023
78 | internlm2-chat-20b | internlm | 2.785 | Jan 2024
79 | trurl-2-13b-academic | Voicelab | 2.755 | Jan 2024
80 | NousResearch/Hermes-3-Llama-3.2-3B | NousResearch | 2.705 | Oct 2024
81 | Qwen2.5-3B-Instruct | Qwen | 2.455 | Sep 2024
82 | Phi-4-mini-instruct | microsoft | 2.43 | Apr 2025
83 | Bielik-1.5B-v3.0-Instruct | speakleash | 2.33 | Jun 2025
84 | Llama-3.2-3B-Instruct | meta-llama | 2.295 | Sep 2024
85 | granite-3.1-2b-instruct | ibm-granite | 2.235 | Jan 2025
86 | Phi-3.5-mini-instruct | microsoft | 2.135 | Aug 2024
87 | LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct | LGAI-EXAONE | 2.115578 | Jan 2025
88 | EuroLLM-1.7B-Instruct | utter-project | 1.79 | Jan 2025
89 | Llama-3.2-1B-Instruct | meta-llama | 1.735 | Sep 2024
90 | h2oai/h2o-danube2-1.8b-chat | h2oai | 1.595 | Apr 2024
91 | Qwen2.5-1.5B-Instruct | Qwen | 1.35 | Sep 2024
92 | SmolLM2-1.7B-Instruct | HuggingFaceTB | 1.1 | Feb 2025
93 | Qwen/Qwen2.5-0.5B-Instruct | Qwen | 0.835 | Sep 2024

phraseology

All rows are tagged "Open Source" on the page.

# | Model | Organization | Score | Date
1 | gemini-2.0-flash-001 | Google | 4.34 | Feb 2025
2 | gemini-2.0-flash-lite-001 | Google | 4.235 | Feb 2025
3 | Qwen/Qwen3.5-35B-A3B non-thinking (API) | Qwen | 4.23 | Jul 2025
4 | WizardLM-2-8x22B | alpindale | 4.22 | Apr 2024
5 | Qwen/Qwen3.5-27B non-thinking (API) | Qwen | 4.195 | Jul 2025
6 | Qwen/Qwen3.5-35B-A3B thinking (API) | Qwen | 4.15 | Jul 2025
7 | mistralai/Mistral-Small-3.1-24B-Instruct-2503 (API FP8) | mistralai | 4.15 | Mar 2025
8 | Qwen/Qwen3.5-27B thinking (API) | Qwen | 4.105 | Jul 2025
9 | Qwen2.5-32B-Instruct | Qwen | 4.035 | Sep 2024
10 | gemma-3-27b-it | google | 4.025 | Mar 2025
11 | mistralai/Mistral-Small-3.2-24B-Instruct-2506 (API FP8) | mistralai | 3.995 | Jun 2025
12 | Mistral-Large-Instruct-2411 | mistralai | 3.99 | Nov 2024
13 | Bielik-11B-v3.0-Instruct | speakleash | 3.965 | Jun 2025
14 | Qwen2.5-72B-Instruct | Qwen | 3.93 | Sep 2024
15 | Llama-4-Scout-17B-16E-Instruct | meta-llama | 3.9 | Apr 2025
16 | Mistral-Small-24B-Instruct-2501 | mistralai | 3.875 | Jan 2025
17 | Mistral-Large-Instruct-2407 | mistralai | 3.86 | Jul 2024
18 | Bielik-4.5B-v3.0-Instruct | speakleash | 3.675 | Jun 2025
19 | deepseek-ai/DeepSeek-R1 (API) | deepseek-ai | 3.6 | Jan 2025
20 | PLLuM-12B-instruct | CYFRAGOVPL | 3.59 | Apr 2025
21 | Mixtral-8x22B-Instruct-v0.1 | mistralai | 3.55 | Apr 2024
22 | Bielik-11B-v2.3-Instruct | speakleash | 3.55 | Nov 2024
23 | deepseek-ai/DeepSeek-V3.2 (API) | deepseek-ai | 3.545 | Jul 2025
24 | 🚧DeepSeek-V3-0324 | deepseek-ai | 3.54 | Mar 2025
25 | CYFRAGOVPL/PLLuM-12B-nc-chat | CYFRAGOVPL | 3.54 | Apr 2025
26 | deepseek-ai/DeepSeek-V3 (API) | deepseek-ai | 3.525 | Dec 2024
27 | Qwen/Qwen3-30B-A3B non-thinking (API) | Qwen | 3.495 | Apr 2025
28 | openai/gpt-oss-120b (API) | openai | 3.49 | Jun 2025
29 | Qwen/Qwen3-235B-A22B non-thinking (API) | Qwen | 3.485 | Apr 2025
30 | meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 (API) | meta-llama | 3.475 | Apr 2025
31 | deepseek-ai/DeepSeek-V3.1 (API) | deepseek-ai | 3.475 | May 2025
32 | Qwen/Qwen3.5-9B non-thinking (API, FP8) | Qwen | 3.475 | Jul 2025
33 | Meta-Llama-3-70B-Instruct | meta-llama | 3.465 | Apr 2024
34 | CYFRAGOVPL/Llama-PLLuM-8B-instruct | CYFRAGOVPL | 3.46 | Mar 2025
35 | PLLuM-8x7B-instruct | CYFRAGOVPL | 3.46 | Feb 2025
36 | pllum-12b-nc-chat-250715 | CYFRAGOVPL | 3.455 | Jul 2025
37 | moonshotai/Kimi-K2-Instruct-0905 (API) | moonshotai | 3.43 | Sep 2025
38 | PLLuM-12B-chat | CYFRAGOVPL | 3.43 | Apr 2025
39 | Bielik-11B-v2.6-Instruct | speakleash | 3.41 | Feb 2025
40 | Qwen2.5-14B-Instruct | Qwen | 3.37 | Sep 2024
41 | Llama-PLLuM-8B-chat | CYFRAGOVPL | 3.36 | Mar 2025
42 | Llama-PLLuM-70B-chat | CYFRAGOVPL | 3.35 | Mar 2025
43 | PLLuM-8x7B-chat | CYFRAGOVPL | 3.35 | Feb 2025
44 | CYFRAGOVPL/PLLuM-12B-nc-instruct | CYFRAGOVPL | 3.32 | Apr 2025
45 | CYFRAGOVPL/pllum-12b-nc-instruct-250715 | CYFRAGOVPL | 3.295 | Jul 2025
46 | Qwen2-72B-Instruct | Qwen | 3.28 | Jun 2024
47 | Llama-PLLuM-70B-instruct | CYFRAGOVPL | 3.26 | Mar 2025
48 | SOLAR-10.7B-Instruct-v1.0 | upstage | 3.255 | Dec 2023
49 | Bielik-11B-v2.2-Instruct | speakleash | 3.25 | Oct 2024
50 | Meta-Llama-3.1-70B-Instruct | meta-llama | 3.25 | Jul 2024
51 | Qwen/Qwen3-14B non-thinking (API) | Qwen | 3.245 | Apr 2025
52 | phi-4 | microsoft | 3.235 | Jan 2025
53 | Qwen/Qwen3-32B non-thinking (API) | Qwen | 3.235 | Apr 2025
54 | speakleash/Bielik-Minitron-7B-v3.0-Instruct | speakleash | 3.23 | Jul 2025
55 | PLLuM-8x7B-nc-instruct | CYFRAGOVPL | 3.22 | Feb 2025
56 | EuroLLM-9B-Instruct | utter-project | 3.17 | Mar 2025
57 | Bielik-11B-v2.5-Instruct | speakleash | 3.13 | Jan 2025
58 | Bielik-11B-v2.0-Instruct | speakleash | 3.125 | Aug 2024
59 | Bielik-11B-v2.1-Instruct | speakleash | 3.105 | Sep 2024
60 | Qwen2.5-7B-Instruct | Qwen | 3.095 | Sep 2024
61 | PLLuM-8x7B-nc-chat | CYFRAGOVPL | 3.08 | Feb 2025
62 | Llama-3.3-70B-Instruct | meta-llama | 3.04 | Dec 2024
63 | Meta-Llama-3-8B-Instruct | meta-llama | 3.035 | Apr 2024
64 | Qwen1.5-72B-Chat | Qwen | 2.975 | Feb 2024
65 | Mixtral-8x7B-Instruct-v0.1 | mistralai | 2.885 | Dec 2023
66 | Starling-LM-7B-alpha | berkeley-nest | 2.855 | Nov 2023
67 | Qwen2.5-3B-Instruct | Qwen | 2.8 | Sep 2024
68 | glm-4-9b-chat | THUDM | 2.78 | Jun 2024
69 | Qwen/Qwen3-8B non-thinking (API) | Qwen | 2.765 | Apr 2025
70 | NousResearch/Hermes-3-Llama-3.2-3B | NousResearch | 2.765 | Oct 2024
71 | NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 | nvidia | 2.76 | Jun 2025
72 | Mistral-Nemo-Instruct-2407 | mistralai | 2.74 | Jul 2024
73 | Mistral-7B-Instruct-v0.3 | mistralai | 2.68 | May 2024
74 | Qwen/Qwen2.5-0.5B-Instruct | Qwen | 2.595 | Sep 2024
75 | Meta-Llama-3.1-8B-Instruct | meta-llama | 2.58 | Jul 2024
76 | openchat-3.5-0106 | openchat | 2.555 | Dec 2023
77 | h2oai/h2o-danube2-1.8b-chat | h2oai | 2.47 | Apr 2024
78 | openchat-3.5-0106-gemma | openchat | 2.445 | Dec 2023
79 | Phi-3.5-mini-instruct | microsoft | 2.425 | Aug 2024
80 | internlm2-chat-20b | internlm | 2.385 | Jan 2024
81 | Bielik-1.5B-v3.0-Instruct | speakleash | 2.38 | Jun 2025
82 | Yi-1.5-34B-Chat | 01-ai | 2.38 | May 2024
83 | SmolLM2-1.7B-Instruct | HuggingFaceTB | 2.355 | Feb 2025
84 | Llama-3.2-1B-Instruct | meta-llama | 2.34 | Sep 2024
85 | Bielik-7B-Instruct-v0.1 | speakleash | 2.315 | Apr 2024
86 | EuroLLM-1.7B-Instruct | utter-project | 2.26 | Jan 2025
87 | Phi-4-mini-instruct | microsoft | 2.245 | Apr 2025
88 | Qwen2.5-1.5B-Instruct | Qwen | 2.225 | Sep 2024
89 | trurl-2-13b-academic | Voicelab | 2.165 | Jan 2024
90 | LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct | LGAI-EXAONE | 2.130653 | Jan 2025
91 | gemma-2-2b-it | google | 2.095 | Jun 2024
92 | granite-3.1-2b-instruct | ibm-granite | 1.88 | Jan 2025
93 | Llama-3.2-3B-Instruct | meta-llama | 1.72 | Sep 2024

sentiment

All rows are tagged "Open Source" on the page.

# | Model | Organization | Score | Date
1 | gemini-2.0-flash-001 | Google | 4.519231 | Feb 2025
2 | deepseek-ai/DeepSeek-R1 (API) | deepseek-ai | 4.487179 | Jan 2025
3 | deepseek-ai/DeepSeek-V3.2 (API) | deepseek-ai | 4.455128 | Jul 2025
4 | deepseek-ai/DeepSeek-V3.1 (API) | deepseek-ai | 4.423077 | May 2025
5 | Qwen/Qwen3.5-27B thinking (API) | Qwen | 4.423077 | Jul 2025
6 | moonshotai/Kimi-K2-Instruct-0905 (API) | moonshotai | 4.391026 | Sep 2025
7 | meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 (API) | meta-llama | 4.391026 | Apr 2025
8 | deepseek-ai/DeepSeek-V3 (API) | deepseek-ai | 4.358974 | Dec 2024
9 | 🚧DeepSeek-V3-0324 | deepseek-ai | 4.358974 | Mar 2025
10 | pllum-12b-nc-chat-250715 | CYFRAGOVPL | 4.358974 | Jul 2025
11 | Mistral-Large-Instruct-2411 | mistralai | 4.326923 | Nov 2024
12 | Meta-Llama-3.1-70B-Instruct | meta-llama | 4.326923 | Jul 2024
13 | Llama-3.3-70B-Instruct | meta-llama | 4.294872 | Dec 2024
14 | Qwen/Qwen3.5-27B non-thinking (API) | Qwen | 4.294872 | Jul 2025
15 | Mistral-Large-Instruct-2407 | mistralai | 4.230769 | Jul 2024
16 | gemini-2.0-flash-lite-001 | Google | 4.230769 | Feb 2025
17 | Qwen/Qwen3.5-35B-A3B non-thinking (API) | Qwen | 4.230769 | Jul 2025
18 | Qwen/Qwen3-235B-A22B non-thinking (API) | Qwen | 4.166667 | Apr 2025
19 | Meta-Llama-3-70B-Instruct | meta-llama | 4.134615 | Apr 2024
20 | Qwen/Qwen3-32B non-thinking (API) | Qwen | 4.134615 | Apr 2025
21 | mistralai/Mistral-Small-3.1-24B-Instruct-2503 (API FP8) | mistralai | 4.134615 | Mar 2025
22 | Llama-4-Scout-17B-16E-Instruct | meta-llama | 4.102564 | Apr 2025
23 | Qwen/Qwen3.5-35B-A3B thinking (API) | Qwen | 4.102564 | Jul 2025
24 | Bielik-11B-v2.6-Instruct | speakleash | 4.102564 | Feb 2025
25 | Qwen2.5-72B-Instruct | Qwen | 4.076923 | Sep 2024
26 | Bielik-11B-v2.5-Instruct | speakleash | 4.00641 | Jan 2025
27 | mistralai/Mistral-Small-3.2-24B-Instruct-2506 (API FP8) | mistralai | 4.00641 | Jun 2025
28 | Meta-Llama-3.1-8B-Instruct | meta-llama | 3.974359 | Jul 2024
29 | Bielik-11B-v2.0-Instruct | speakleash | 3.974359 | Aug 2024
30 | Bielik-11B-v2.3-Instruct | speakleash | 3.974359 | Nov 2024
31 | Bielik-11B-v2.1-Instruct | speakleash | 3.955128 | Sep 2024
32 | openai/gpt-oss-120b (API) | openai | 3.942308 | Jun 2025
33 | Llama-PLLuM-70B-chat | CYFRAGOVPL | 3.94 | Mar 2025
34 | Qwen/Qwen3-14B non-thinking (API) | Qwen | 3.910256 | Apr 2025
35 | Qwen2.5-14B-Instruct | Qwen | 3.910256 | Sep 2024
36 | Mistral-Small-24B-Instruct-2501 | mistralai | 3.910256 | Jan 2025
37 | CYFRAGOVPL/pllum-12b-nc-instruct-250715 | CYFRAGOVPL | 3.910256 | Jul 2025
38 | PLLuM-8x7B-nc-instruct | CYFRAGOVPL | 3.88 | Feb 2025
39 | Bielik-11B-v3.0-Instruct | speakleash | 3.878205 | Jun 2025
40 | gemma-3-27b-it | google | 3.878205 | Mar 2025
41 | Qwen2.5-32B-Instruct | Qwen | 3.814103 | Sep 2024
42 | Mixtral-8x22B-Instruct-v0.1 | mistralai | 3.782051 | Apr 2024
43 | Llama-PLLuM-70B-instruct | CYFRAGOVPL | 3.78 | Mar 2025
44 | Bielik-4.5B-v3.0-Instruct | speakleash | 3.762821 | Jun 2025
45 | Qwen2-72B-Instruct | Qwen | 3.762821 | Jun 2024
46 | PLLuM-8x7B-nc-chat | CYFRAGOVPL | 3.76 | Feb 2025
47 | openchat-3.5-0106-gemma | openchat | 3.730769 | Dec 2023
48 | Qwen/Qwen3-30B-A3B non-thinking (API) | Qwen | 3.717949 | Apr 2025
49 | speakleash/Bielik-Minitron-7B-v3.0-Instruct | speakleash | 3.717949 | Jul 2025
50 | phi-4 | microsoft | 3.717949 | Jan 2025
51 | Bielik-11B-v2.2-Instruct | speakleash | 3.717949 | Oct 2024
52 | PLLuM-12B-instruct | CYFRAGOVPL | 3.71 | Apr 2025
53 | WizardLM-2-8x22B | alpindale | 3.705128 | Apr 2024
54 | Mistral-Nemo-Instruct-2407 | mistralai | 3.641026 | Jul 2024
55 | PLLuM-8x7B-instruct | CYFRAGOVPL | 3.59 | Feb 2025
56 | glm-4-9b-chat | THUDM | 3.589744 | Jun 2024
57 | Bielik-7B-Instruct-v0.1 | speakleash | 3.589744 | Apr 2024
58 | Qwen2.5-7B-Instruct | Qwen | 3.557692 | Sep 2024
59 | NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 | nvidia | 3.525641 | Jun 2025
60 | Bielik-1.5B-v3.0-Instruct | speakleash | 3.525641 | Jun 2025
61 | Qwen/Qwen3-8B non-thinking (API) | Qwen | 3.49359 | Apr 2025
62 | Qwen1.5-72B-Chat | Qwen | 3.474359 | Feb 2024
63 | PLLuM-8x7B-chat | CYFRAGOVPL | 3.44 | Feb 2025
64 | gemma-2-2b-it | google | 3.397436 | Jun 2024
65 | EuroLLM-9B-Instruct | utter-project | 3.365385 | Mar 2025
66 | Meta-Llama-3-8B-Instruct | meta-llama | 3.333333 | Apr 2024
67 | Mistral-7B-Instruct-v0.3 | mistralai | 3.326923 | May 2024
68 | PLLuM-12B-chat | CYFRAGOVPL | 3.32 | Apr 2025
69 | internlm2-chat-20b | internlm | 3.301282 | Jan 2024
70 | trurl-2-13b-academic | Voicelab | 3.301282 | Jan 2024
71 | CYFRAGOVPL/Llama-PLLuM-8B-instruct | CYFRAGOVPL | 3.24 | Mar 2025
72 | CYFRAGOVPL/PLLuM-12B-nc-instruct | CYFRAGOVPL | 3.24 | Apr 2025
73 | CYFRAGOVPL/PLLuM-12B-nc-chat | CYFRAGOVPL | 3.22 | Apr 2025
74 | openchat-3.5-0106 | openchat | 3.160256 | Dec 2023
75 | Llama-PLLuM-8B-chat | CYFRAGOVPL | 3.13 | Mar 2025
76 | Llama-3.2-1B-Instruct | meta-llama | 3.076923 | Sep 2024
77 | Yi-1.5-34B-Chat | 01-ai | 3.076923 | May 2024
78 | granite-3.1-2b-instruct | ibm-granite | 3.076923 | Jan 2025
79 | Starling-LM-7B-alpha | berkeley-nest | 3.057692 | Nov 2023
80 | Mixtral-8x7B-Instruct-v0.1 | mistralai | 3.057692 | Dec 2023
81 | Qwen/Qwen3.5-9B non-thinking (API, FP8) | Qwen | 3.012821 | Jul 2025
82 | SOLAR-10.7B-Instruct-v1.0 | upstage | 2.967949 | Dec 2023
83 | Qwen2.5-3B-Instruct | Qwen | 2.948718 | Sep 2024
84 | Qwen2.5-1.5B-Instruct | Qwen | 2.794872 | Sep 2024
85 | Llama-3.2-3B-Instruct | meta-llama | 2.75641 | Sep 2024
86 | Phi-4-mini-instruct | microsoft | 2.692308 | Apr 2025
87 | NousResearch/Hermes-3-Llama-3.2-3B | NousResearch | 2.615385 | Oct 2024
88 | Phi-3.5-mini-instruct | microsoft | 2.435897 | Aug 2024
89 | h2oai/h2o-danube2-1.8b-chat | h2oai | 2.371795 | Apr 2024
90 | SmolLM2-1.7B-Instruct | HuggingFaceTB | 2.275641 | Feb 2025
91 | EuroLLM-1.7B-Instruct | utter-project | 2.24359 | Jan 2025
92 | Qwen/Qwen2.5-0.5B-Instruct | Qwen | 1.955128 | Sep 2024
93 | LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct | LGAI-EXAONE | 1.942308 | Jan 2025

tricky-questions

All rows are tagged "Open Source" on the page.

# | Model | Organization | Score | Date
1 | Qwen/Qwen3.5-35B-A3B thinking (API) | Qwen | 4.702247 | Jul 2025
2 | Qwen/Qwen3.5-27B thinking (API) | Qwen | 4.61236 | Jul 2025
3 | Qwen/Qwen3.5-27B non-thinking (API) | Qwen | 4.426966 | Jul 2025
4 | deepseek-ai/DeepSeek-V3.2 (API) | deepseek-ai | 4.196629 | Jul 2025
5 | Qwen/Qwen3.5-35B-A3B non-thinking (API) | Qwen | 4.191011 | Jul 2025
6 | deepseek-ai/DeepSeek-R1 (API) | deepseek-ai | 4.117978 | Jan 2025
7 | 🚧DeepSeek-V3-0324 | deepseek-ai | 4.022472 | Mar 2025
8 | deepseek-ai/DeepSeek-V3 (API) | deepseek-ai | 3.988764 | Dec 2024
9 | gemini-2.0-flash-001 | Google | 3.988764 | Feb 2025
10 | moonshotai/Kimi-K2-Instruct-0905 (API) | moonshotai | 3.932584 | Sep 2025
11 | openai/gpt-oss-120b (API) | openai | 3.88764 | Jun 2025
12 | deepseek-ai/DeepSeek-V3.1 (API) | deepseek-ai | 3.870787 | May 2025
13 | gemini-2.0-flash-lite-001 | Google | 3.853933 | Feb 2025
14 | Qwen/Qwen3-235B-A22B non-thinking (API) | Qwen | 3.837079 | Apr 2025
15 | Qwen2.5-72B-Instruct | Qwen | 3.808989 | Sep 2024
16 | meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 (API) | meta-llama | 3.758427 | Apr 2025
17 | Mistral-Large-Instruct-2411 | mistralai | 3.724719 | Nov 2024
18 | Meta-Llama-3-70B-Instruct | meta-llama | 3.707865 | Apr 2024
19 | Qwen2-72B-Instruct | Qwen | 3.679775 | Jun 2024
20 | Mistral-Large-Instruct-2407 | mistralai | 3.646067 | Jul 2024
21 | Qwen/Qwen3.5-9B non-thinking (API, FP8) | Qwen | 3.640449 | Jul 2025
22 | Qwen2.5-32B-Instruct | Qwen | 3.589888 | Sep 2024
23 | Qwen/Qwen3-32B non-thinking (API) | Qwen | 3.561798 | Apr 2025
24 | Qwen/Qwen3-30B-A3B non-thinking (API) | Qwen | 3.544944 | Apr 2025
25 | gemma-3-27b-it | google | 3.533708 | Mar 2025
26 | Bielik-11B-v2.1-Instruct | speakleash | 3.47191 | Sep 2024
27 | Mistral-Small-24B-Instruct-2501 | mistralai | 3.449438 | Jan 2025
28 | NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 | nvidia | 3.432584 | Jun 2025
29 | mistralai/Mistral-Small-3.1-24B-Instruct-2503 (API FP8) | mistralai | 3.421348 | Mar 2025
30 | Llama-3.3-70B-Instruct | meta-llama | 3.376404 | Dec 2024
31 | Qwen2.5-14B-Instruct | Qwen | 3.337079 | Sep 2024
32 | Qwen/Qwen3-14B non-thinking (API) | Qwen | 3.331461 | Apr 2025
33 | mistralai/Mistral-Small-3.2-24B-Instruct-2506 (API FP8) | mistralai | 3.303371 | Jun 2025
34 | Mixtral-8x22B-Instruct-v0.1 | mistralai | 3.235955 | Apr 2024
35 | Bielik-11B-v2.3-Instruct | speakleash | 3.219101 | Nov 2024
36 | Llama-PLLuM-70B-chat | CYFRAGOVPL | 3.213483 | Mar 2025
37 | Llama-4-Scout-17B-16E-Instruct | meta-llama | 3.191011 | Apr 2025
38 | Bielik-11B-v3.0-Instruct | speakleash | 3.185393 | Jun 2025
39 | Bielik-11B-v2.2-Instruct | speakleash | 3.123596 | Oct 2024
40 | Bielik-11B-v2.6-Instruct | speakleash | 3.095506 | Feb 2025
41 | WizardLM-2-8x22B | alpindale | 3.05618 | Apr 2024
42 | Meta-Llama-3.1-70B-Instruct | meta-llama | 3.011236 | Jul 2024
43 | Bielik-11B-v2.5-Instruct | speakleash | 2.910112 | Jan 2025
44 | pllum-12b-nc-chat-250715 | CYFRAGOVPL | 2.898876 | Jul 2025
45 | Qwen/Qwen3-8B non-thinking (API) | Qwen | 2.764045 | Apr 2025
46 | EuroLLM-9B-Instruct | utter-project | 2.747191 | Mar 2025
47 | speakleash/Bielik-Minitron-7B-v3.0-Instruct | speakleash | 2.735955 | Jul 2025
48 | phi-4 | microsoft | 2.724719 | Jan 2025
49 | Qwen1.5-72B-Chat | Qwen | 2.668539 | Feb 2024
50 | Llama-PLLuM-70B-instruct | CYFRAGOVPL | 2.634831 | Mar 2025
51 | CYFRAGOVPL/PLLuM-12B-nc-chat | CYFRAGOVPL | 2.623596 | Apr 2025
52 | PLLuM-12B-chat | CYFRAGOVPL | 2.589888 | Apr 2025
53 | Qwen2.5-7B-Instruct | Qwen | 2.58427 | Sep 2024
54 | Meta-Llama-3-8B-Instruct | meta-llama | 2.477528 | Apr 2024
55 | Bielik-4.5B-v3.0-Instruct | speakleash | 2.455056 | Jun 2025
56 | CYFRAGOVPL/pllum-12b-nc-instruct-250715 | CYFRAGOVPL | 2.370787 | Jul 2025
57 | Llama-PLLuM-8B-chat | CYFRAGOVPL | 2.252809 | Mar 2025
58 | gemma-2-2b-it | google | 2.213483 | Jun 2024
59 | Bielik-11B-v2.0-Instruct | speakleash | 2.196629 | Aug 2024
60 | Bielik-7B-Instruct-v0.1 | speakleash | 2.157303 | Apr 2024
61 | SOLAR-10.7B-Instruct-v1.0 | upstage | 2.123596 | Dec 2023
62 | Meta-Llama-3.1-8B-Instruct | meta-llama | 2.11236 | Jul 2024
63 | Mistral-Nemo-Instruct-2407 | mistralai | 2.089888 | Jul 2024
64 | Mistral-7B-Instruct-v0.3 | mistralai | 1.988764 | May 2024
65 | glm-4-9b-chat | THUDM | 1.983146 | Jun 2024
66 | CYFRAGOVPL/PLLuM-12B-nc-instruct | CYFRAGOVPL | 1.983146 | Apr 2025
67 | openchat-3.5-0106 | openchat | 1.960674 | Dec 2023
68 | PLLuM-12B-instruct | CYFRAGOVPL | 1.904494 | Apr 2025
69 | Qwen2.5-3B-Instruct | Qwen | 1.808989 | Sep 2024
70 | PLLuM-8x7B-nc-chat | CYFRAGOVPL | 1.797753 | Feb 2025
71 | Mixtral-8x7B-Instruct-v0.1 | mistralai | 1.797753 | Dec 2023
72 | PLLuM-8x7B-chat | CYFRAGOVPL | 1.780899 | Feb 2025
73 | PLLuM-8x7B-nc-instruct | CYFRAGOVPL | 1.764045 | Feb 2025
74 | openchat-3.5-0106-gemma | openchat | 1.679775 | Dec 2023
75 | Starling-LM-7B-alpha | berkeley-nest | 1.679775 | Nov 2023
76 | CYFRAGOVPL/Llama-PLLuM-8B-instruct | CYFRAGOVPL | 1.662921 | Mar 2025
77 | PLLuM-8x7B-instruct | CYFRAGOVPL | 1.505618 | Feb 2025
78 | Phi-4-mini-instruct | microsoft | 1.303371 | Apr 2025
79 | Bielik-1.5B-v3.0-Instruct | speakleash | 1.219101 | Jun 2025
80 | Llama-3.2-3B-Instruct | meta-llama | 1.219101 | Sep 2024
81 | NousResearch/Hermes-3-Llama-3.2-3B | NousResearch | 1.140449 | Oct 2024
82 | Phi-3.5-mini-instruct | microsoft | 1.044944 | Aug 2024
83 | trurl-2-13b-academic | Voicelab | 1.016854 | Jan 2024
84 | Yi-1.5-34B-Chat | 01-ai | 1 | May 2024
85 | EuroLLM-1.7B-Instruct | utter-project | 0.758 | Jan 2025
86 | Qwen2.5-1.5B-Instruct | Qwen | 0.663 | Sep 2024
87 | granite-3.1-2b-instruct | ibm-granite | 0.590 | Jan 2025
88 | Llama-3.2-1B-Instruct | meta-llama | 0.522 | Sep 2024
89 | LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct | LGAI-EXAONE | 0.489 | Jan 2025
90 | SmolLM2-1.7B-Instruct | HuggingFaceTB | 0.253 | Feb 2025
91 | Qwen/Qwen2.5-0.5B-Instruct | Qwen | 0.219 | Sep 2024
92 | h2oai/h2o-danube2-1.8b-chat | h2oai | 0.129 | Apr 2024
93 | internlm2-chat-20b | internlm | 0.124 | Jan 2024

Polish Models

Bielik (SpeakLeash) and PLLuM (CYFRA GOV PL): models built specifically for the Polish language.

# | Model (organization) | Average
1 | Bielik-11B-v3.0-Instruct (speakleash) | 3.73
2 | pllum-12b-nc-chat-250715 (CYFRAGOVPL) | 3.67
3 | Bielik-11B-v2.6-Instruct (speakleash) | 3.64
4 | Bielik-11B-v2.3-Instruct (speakleash) | 3.63
5 | Bielik-11B-v2.1-Instruct (speakleash) | 3.61
6 | Llama-PLLuM-70B-chat (CYFRAGOVPL) | 3.53
7 | Bielik-11B-v2.5-Instruct (speakleash) | 3.48
8 | Bielik-11B-v2.2-Instruct (speakleash) | 3.46
9 | speakleash/Bielik-Minitron-7B-v3.0-Instruct (speakleash) | 3.38
10 | Bielik-4.5B-v3.0-Instruct (speakleash) | 3.38
11 | Llama-PLLuM-70B-instruct (CYFRAGOVPL) | 3.33
12 | CYFRAGOVPL/pllum-12b-nc-instruct-250715 (CYFRAGOVPL) | 3.33
13 | Bielik-11B-v2.0-Instruct (speakleash) | 3.26
14 | CYFRAGOVPL/PLLuM-12B-nc-chat (CYFRAGOVPL) | 3.15
15 | PLLuM-12B-chat (CYFRAGOVPL) | 3.14
16 | PLLuM-8x7B-nc-instruct (CYFRAGOVPL) | 3.11
17 | PLLuM-12B-instruct (CYFRAGOVPL) | 3.09
18 | PLLuM-8x7B-nc-chat (CYFRAGOVPL) | 3.03
19 | PLLuM-8x7B-instruct (CYFRAGOVPL) | 3.01
20 | PLLuM-8x7B-chat (CYFRAGOVPL) | 3.01
21 | CYFRAGOVPL/PLLuM-12B-nc-instruct (CYFRAGOVPL) | 2.96
22 | Llama-PLLuM-8B-chat (CYFRAGOVPL) | 2.92
23 | Bielik-7B-Instruct-v0.1 (speakleash) | 2.88
24 | CYFRAGOVPL/Llama-PLLuM-8B-instruct (CYFRAGOVPL) | 2.82
25 | Bielik-1.5B-v3.0-Instruct (speakleash) | 2.36

Legend: Bielik (SpeakLeash), PLLuM (CYFRA GOV PL). Global SOTA: 4.34

Polish Model Evolution

Tracking Bielik and PLLuM performance on CPTU-Bench over time

[Chart: Average Score Over Versions. Series: Bielik, PLLuM, SOTA (Qwen3.5-27B)]

Bielik Versions

Bielik-7B-Instruct-v0.1 (7B, v0.1), Apr 2024
Average: 2.88
Sentiment: 3.59 | Language Understanding: 3.48 | Phraseology: 2.31 | Tricky Questions: 2.16

Bielik-11B-v2.0-Instruct (11B, v2.0), Aug 2024
Average: 3.26 (+0.38)
Sentiment: 3.97 | Language Understanding: 3.75 | Phraseology: 3.13 | Tricky Questions: 2.20

Bielik-11B-v2.1-Instruct (11B, v2.1), Sep 2024
Average: 3.61 (+0.35)
Sentiment: 3.96 | Language Understanding: 3.92 | Phraseology: 3.10 | Tricky Questions: 3.47

Bielik-11B-v2.2-Instruct (11B, v2.2), Oct 2024
Average: 3.46 (-0.16)
Sentiment: 3.72 | Language Understanding: 3.73 | Phraseology: 3.25 | Tricky Questions: 3.12

Bielik-11B-v2.3-Instruct (11B, v2.3), Nov 2024
Average: 3.63 (+0.18)
Sentiment: 3.97 | Language Understanding: 3.79 | Phraseology: 3.55 | Tricky Questions: 3.22

Bielik-11B-v2.5-Instruct (11B, v2.5), Jan 2025
Average: 3.48 (-0.16)
Sentiment: 4.01 | Language Understanding: 3.86 | Phraseology: 3.13 | Tricky Questions: 2.91

Bielik-11B-v2.6-Instruct (11B, v2.6), Feb 2025
Average: 3.64 (+0.16)
Sentiment: 4.10 | Language Understanding: 3.94 | Phraseology: 3.41 | Tricky Questions: 3.10

Bielik-1.5B-v3.0-Instruct (1.5B, v3.0), Jun 2025
Average: 2.36 (-1.27)
Sentiment: 3.53 | Language Understanding: 2.33 | Phraseology: 2.38 | Tricky Questions: 1.22

Bielik-11B-v3.0-Instruct (11B, v3.0), Jun 2025
Average: 3.73 (+1.37)
Sentiment: 3.88 | Language Understanding: 3.91 | Phraseology: 3.96 | Tricky Questions: 3.19

Bielik-4.5B-v3.0-Instruct (4.5B, v3.0), Jun 2025
Average: 3.38 (-0.36)
Sentiment: 3.76 | Language Understanding: 3.61 | Phraseology: 3.67 | Tricky Questions: 2.46

speakleash/Bielik-Minitron-7B-v3.0-Instruct (7B, v3.0), Jul 2025
Average: 3.38 (+0.00)
Sentiment: 3.72 | Language Understanding: 3.83 | Phraseology: 3.23 | Tricky Questions: 2.74
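
The signed figure next to each average appears to be the change from the previous entry in this list, regardless of model size; that would explain why the 1.5B v3.0 entry shows -1.27 right after the 11B v2.6 entry. A minimal sketch under that assumption, using the displayed (rounded) averages of the first three entries. The helper name `consecutive_deltas` is illustrative and not part of the benchmark code.

```python
# (model, displayed average) in the order the page lists them
entries = [
    ("Bielik-7B-Instruct-v0.1", 2.88),
    ("Bielik-11B-v2.0-Instruct", 3.26),
    ("Bielik-11B-v2.1-Instruct", 3.61),
]


def consecutive_deltas(rows):
    """Yield (model, average, delta); delta is None for the first row."""
    previous = None
    for name, avg in rows:
        delta = None if previous is None else round(avg - previous, 2)
        yield name, avg, delta
        previous = avg


deltas = [delta for _, _, delta in consecutive_deltas(entries)]

for name, avg, delta in consecutive_deltas(entries):
    shown = "" if delta is None else f" ({delta:+.2f})"
    print(f"{name}: {avg:.2f}{shown}")
```

Some displayed deltas differ by 0.01 from what the rounded averages give (for example, 3.46 minus 3.61 is -0.15, while the page shows -0.16), which suggests the page computes them from unrounded scores.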

PLLuM Versions

PLLuM-8x7B-chat (7B, ?), Feb 2025
Average: 3.01
Sentiment: 3.44 | Language Understanding: 3.45 | Phraseology: 3.35 | Tricky Questions: 1.78

PLLuM-8x7B-instruct (7B, ?), Feb 2025
Average: 3.01 (+0.00)
Sentiment: 3.59 | Language Understanding: 3.47 | Phraseology: 3.46 | Tricky Questions: 1.51

PLLuM-8x7B-nc-chat (7B, ?), Feb 2025
Average: 3.03 (+0.02)
Sentiment: 3.76 | Language Understanding: 3.48 | Phraseology: 3.08 | Tricky Questions: 1.80

PLLuM-8x7B-nc-instruct (7B, ?), Feb 2025
Average: 3.11 (+0.08)
Sentiment: 3.88 | Language Understanding: 3.59 | Phraseology: 3.22 | Tricky Questions: 1.76

CYFRAGOVPL/Llama-PLLuM-8B-instruct (8B, ?), Mar 2025
Average: 2.82 (-0.30)
Sentiment: 3.24 | Language Understanding: 2.90 | Phraseology: 3.46 | Tricky Questions: 1.66

Llama-PLLuM-70B-chat (70B, ?), Mar 2025
Average: 3.53 (+0.71)
Sentiment: 3.94 | Language Understanding: 3.61 | Phraseology: 3.35 | Tricky Questions: 3.21

Llama-PLLuM-70B-instruct (70B, ?), Mar 2025
Average: 3.33 (-0.20)
Sentiment: 3.78 | Language Understanding: 3.63 | Phraseology: 3.26 | Tricky Questions: 2.63

Llama-PLLuM-8B-chat (8B, ?), Mar 2025
Average: 2.92 (-0.41)
Sentiment: 3.13 | Language Understanding: 2.93 | Phraseology: 3.36 | Tricky Questions: 2.25

CYFRAGOVPL/PLLuM-12B-nc-chat (12B, ?), Apr 2025
Average: 3.15 (+0.24)
Sentiment: 3.22 | Language Understanding: 3.23 | Phraseology: 3.54 | Tricky Questions: 2.62

CYFRAGOVPL/PLLuM-12B-nc-instruct (12B, ?), Apr 2025
Average: 2.96 (-0.19)
Sentiment: 3.24 | Language Understanding: 3.31 | Phraseology: 3.32 | Tricky Questions: 1.98

PLLuM-12B-chat (12B, ?), Apr 2025
Average: 3.14 (+0.17)
Sentiment: 3.32 | Language Understanding: 3.21 | Phraseology: 3.43 | Tricky Questions: 2.59

PLLuM-12B-instruct (12B, ?), Apr 2025
Average: 3.09 (-0.04)
Sentiment: 3.71 | Language Understanding: 3.17 | Phraseology: 3.59 | Tricky Questions: 1.90

CYFRAGOVPL/pllum-12b-nc-instruct-250715 (?, ?), Jul 2025
Average: 3.33 (+0.23)
Sentiment: 3.91 | Language Understanding: 3.73 | Phraseology: 3.29 | Tricky Questions: 2.37

pllum-12b-nc-chat-250715 (?, ?), Jul 2025
Average: 3.67 (+0.34)
Sentiment: 4.36 | Language Understanding: 3.96 | Phraseology: 3.46 | Tricky Questions: 2.90
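
The displayed averages are consistent with an unweighted mean of the four category scores. This is an inference from the numbers on this page, not a documented formula, so the sketch below should be read as a plausibility check. The function name `unweighted_average` is illustrative.

```python
from statistics import mean

# Category scores copied from the entries above, in the order:
# sentiment, language understanding, phraseology, tricky questions
category_scores = {
    "pllum-12b-nc-chat-250715": (4.36, 3.96, 3.46, 2.90),
    "Llama-PLLuM-70B-chat": (3.94, 3.61, 3.35, 3.21),
}


def unweighted_average(scores):
    """Plain mean of the four category scores, rounded to two decimals."""
    return round(mean(scores), 2)


averages = {name: unweighted_average(s) for name, s in category_scores.items()}
print(averages)
```

Both results match the averages shown for these two models (3.67 and 3.53). Because the category scores are themselves rounded, other entries may be off by 0.01.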

Frontier Comparison

Bielik-11B-v3.0-Instruct (v3.0): 3.73
Qwen3.5-27B: 4.34

The best Bielik model is 13.9% behind the current SOTA.
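
The 13.9% figure looks like a relative gap to the SOTA score. A minimal sketch, assuming the gap is (SOTA - model) / SOTA computed on unrounded averages. The function name `gap_to_sota` is illustrative. The Bielik average is approximated as the mean of its four category scores listed earlier, and the SOTA value is the unrounded average shown at the top of the page.

```python
def gap_to_sota(model_score: float, sota_score: float) -> float:
    """Relative gap to SOTA, in percent of the SOTA score."""
    return (sota_score - model_score) / sota_score * 100


# Bielik-11B-v3.0-Instruct: mean of sentiment, language understanding,
# phraseology, and tricky questions (assumed to be how the average is formed)
bielik_avg = (3.88 + 3.91 + 3.96 + 3.19) / 4
sota_avg = 4.336359  # Qwen/Qwen3.5-27B thinking (API)

gap = gap_to_sota(bielik_avg, sota_avg)
print(f"{gap:.1f}%")

# The same formula on the rounded values shown in the comparison
gap_rounded = gap_to_sota(3.73, 4.34)
print(f"{gap_rounded:.1f}%")
```

With the rounded values 3.73 and 4.34 the same formula gives 14.1%, so the published 13.9% is presumably based on unrounded scores.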

Benchmark Authors

speakleash
Jan Sowa: Leadership, writing, code
Natalia Nadolna: Code, dataset cleaning
Anna Zielińska: Code, dataset analysis
Agnieszka Kosiak: Writing texts
Magdalena Krawczyk: Writing, labeling
Marta M. Kania: Prompt engineering
Wiktoria Wierzbińska: Writing texts
Remigiusz Kinas: Methodology
Krzysztof Wróbel: Engineering, methodology
AS
Maria Filipkowska: Writing, linguistics
MK
AG