Evaluating language models on emotional intelligence in Polish: understanding emotional states, predicting emotional responses, and nuanced sentiment analysis.
Polish EQ-Bench adapts the EQ-Bench v2 methodology to the Polish language. Models predict changes in emotional intensity across 171 questions. The final score is adjusted for parseability: Benchmark Score × (Parseable / 171), where Parseable is the number of questions whose answers could be parsed. Created by SpeakLeash.
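The parseability adjustment can be illustrated with a minimal sketch. The function name `adjusted_eq_score` and its parameter names are hypothetical and not taken from the SpeakLeash codebase. Only the formula, Benchmark Score × (Parseable / 171), comes from the description above.

```python
TOTAL_QUESTIONS = 171  # number of questions in the benchmark, per the description


def adjusted_eq_score(benchmark_score: float, parseable: int,
                      total: int = TOTAL_QUESTIONS) -> float:
    """Scale a raw benchmark score by the fraction of parseable answers.

    Hypothetical helper: implements Benchmark Score x (Parseable / total).
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= parseable <= total:
        raise ValueError("parseable must be between 0 and total")
    return benchmark_score * (parseable / total)


if __name__ == "__main__":
    # A model with all 171 answers parseable keeps its raw score.
    print(adjusted_eq_score(80.0, 171))
    # A model with only 114 parseable answers (two thirds) is penalized.
    print(round(adjusted_eq_score(60.0, 114), 1))
```

Under this formula, a model that answers well but often breaks the expected output format ranks lower than its raw score alone would suggest.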
Leading models on Polish EQ-Bench.
| # | Model | eq-score | Year | Source |
|---|---|---|---|---|
| 1 | Mistral-Large-Instruct-2407 | 78.1 | 2026 | paper |
| 2 | Mistral-Large-Instruct-2411 | 77.3 | 2026 | paper |
| 3 | Meta-Llama-3.1-405B-Instruct-FP8 | 77.2 | 2026 | paper |
| 4 | GPT-4o-2024-08-06 | 75.2 | 2026 | paper |
| 5 | gpt-4-turbo-2024-04-09 | 74.6 | 2026 | paper |
| 6 | Bielik-11B-v2.6-Instruct | 73.7 | 2026 | paper |
| 7 | 🚧DeepSeek-V3-0324 | 73.5 | 2026 | paper |
| 8 | Mistral-Small-Instruct-2409 | 72.8 | 2026 | paper |
| 9 | Llama-PLLuM-70B-chat | 72.6 | 2026 | paper |
| 10 | Meta-Llama-3.1-70B-Instruct | 72.5 | 2026 | paper |