Codesota · Models · GPT-3.5-turbo · OpenAI · 17 results · 3 benchmarks
Model card

GPT-3.5-turbo.

OpenAI · closed-source
§ 01 · Benchmarks

Every benchmark GPT-3.5-turbo has a recorded score for.

| # | Benchmark | Area · Task | Metric | Value | Rank | Date | Source |
|---|---|---|---|---|---|---|---|
| 01 | Polish MT-Bench | Natural Language Processing · Polish Conversation Quality | humanities | 9.8% | #9/50 | | source ↗ |
| 02 | Polish MT-Bench | Natural Language Processing · Polish Conversation Quality | writing | 9.1% | #14/50 | | source ↗ |
| 03 | Polish MT-Bench | Natural Language Processing · Polish Conversation Quality | math | 6.8% | #14/50 | | source ↗ |
| 04 | Polish MT-Bench | Natural Language Processing · Polish Conversation Quality | coding | 6.0% | #16/50 | | source ↗ |
| 05 | Polish MT-Bench | Natural Language Processing · Polish Conversation Quality | stem | 9.3% | #17/50 | | source ↗ |
| 06 | Polish MT-Bench | Natural Language Processing · Polish Conversation Quality | roleplay | 8.7% | #20/50 | | source ↗ |
| 07 | Polish MT-Bench | Natural Language Processing · Polish Conversation Quality | pl-score | 7.7% | #21/50 | | source ↗ |
| 08 | Polish MT-Bench | Natural Language Processing · Polish Conversation Quality | extraction | 8.2% | #26/50 | | source ↗ |
| 09 | Polish MT-Bench | Natural Language Processing · Polish Conversation Quality | reasoning | 5.2% | #27/50 | | source ↗ |
| 10 | Polish EQ-Bench | Natural Language Processing · Polish Emotional Intelligence | eq-score | 57.7% | #37/101 | | source ↗ |
| 11 | PLCC | Natural Language Processing · Polish Cultural Competency | art-and-entertainment | 39.0% | #117/165 | | source ↗ |
| 12 | PLCC | Natural Language Processing · Polish Cultural Competency | geography | 55.0% | #121/165 | | source ↗ |
| 13 | PLCC | Natural Language Processing · Polish Cultural Competency | average | 43.3% | #128/165 | | source ↗ |
| 14 | PLCC | Natural Language Processing · Polish Cultural Competency | vocabulary | 36.0% | #132/165 | | source ↗ |
| 15 | PLCC | Natural Language Processing · Polish Cultural Competency | culture-and-tradition | 38.0% | #133/165 | | source ↗ |
| 16 | PLCC | Natural Language Processing · Polish Cultural Competency | history | 51.0% | #137/165 | | source ↗ |
| 17 | PLCC | Natural Language Processing · Polish Cultural Competency | grammar | 41.0% | #144/165 | | source ↗ |
The Rank column shows this model’s position among all models scored on the same benchmark and metric; the number after the slash is the number of competitors. #1 in red marks the current SOTA. Rows are sorted by rank, then by newest result.
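The rank notation can be read mechanically. The snippet below is a minimal sketch, assuming every rank cell has the form `#9/50`; `parse_rank` and the `Rank` tuple are hypothetical helpers for illustration, not part of Codesota.

```python
import re
from typing import NamedTuple


class Rank(NamedTuple):
    position: int  # number before the slash
    out_of: int    # number after the slash


def parse_rank(cell: str) -> Rank:
    """Parse a rank cell such as '#9/50' into its two integers."""
    match = re.fullmatch(r"#(\d+)/(\d+)", cell.strip())
    if match is None:
        raise ValueError(f"unrecognised rank cell: {cell!r}")
    return Rank(int(match.group(1)), int(match.group(2)))


# Three (metric, rank cell) pairs copied from the benchmark table.
rows = [("grammar", "#144/165"), ("humanities", "#9/50"), ("eq-score", "#37/101")]

# Sort by position only; the table rows carry no dates, so the
# "newest result" tie-break is left out of this sketch.
ordered = sorted(rows, key=lambda row: parse_rank(row[1]).position)
print([metric for metric, _ in ordered])  # ['humanities', 'eq-score', 'grammar']
```

Positions from different benchmarks are measured against different numbers of competitors (50, 101, 165), so a raw sort mixes scales; keeping `out_of` alongside `position` makes that visible.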
§ 02 · Strengths by area

Where GPT-3.5-turbo actually performs.

Natural Language Processing · 3 benchmarks · avg rank #65.5
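The page does not say how the average rank is computed. As a check, the sketch below takes the plain mean of the 17 rank positions listed in § 01, which reproduces the displayed figure; that this is the method Codesota uses is an inference, not something the page states.

```python
from statistics import mean

# Rank positions copied from the § 01 table, grouped by benchmark.
ranks = {
    "Polish MT-Bench": [9, 14, 14, 16, 17, 20, 21, 26, 27],
    "Polish EQ-Bench": [37],
    "PLCC": [117, 121, 128, 132, 133, 137, 144],
}

# Flatten to one list: every benchmark + metric row counts once.
all_ranks = [r for per_benchmark in ranks.values() for r in per_benchmark]
avg_rank = mean(all_ranks)

print(len(ranks), len(all_ranks), round(avg_rank, 1))  # 3 17 65.5
```

Averaging the three per-benchmark means instead gives about 61.8, so the per-row mean is the reading that matches the page.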
§ 04 · Related models

Other OpenAI models scored on Codesota.

GPT-4o · Undisclosed params · 35 results · 9 SOTA
o3 · 16 results · 5 SOTA
o4-mini · 13 results · 3 SOTA
o3 (high) · 2 results · 1 SOTA
o4-mini (high) · 1 result · 1 SOTA
o1 · 11 results
GPT-5 · 8 results
o1-preview · Undisclosed params · 8 results
§ 05 · Sources & freshness

Where these numbers come from.

SpeakLeash/MT-Bench-PL · 9 results
sdadas/PLCC · 7 results
SpeakLeash/Polish-EQ-Bench · 1 result
17 of 17 rows marked verified.