Codesota · Language & Text · Which model, what task, at what cost · Issue: March 2026

§ 00 · Language & text

Text task router

Pick the text output you need: answer, vector, label, entities, translation, or summary. LLM leaderboards are only one slice of the language stack.

Use `/llm` for frontier reasoning, `/benchmarks/mteb` for embeddings, and the task rows below for specialised NLP work.
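As a rough illustration, the routing described above can be sketched as a lookup from output kind to task. The output kinds and the `/benchmarks/mteb` path come from the text; the pairing of each output with a § 01 task name, and the names `TASK_FOR_OUTPUT`, `PATH_FOR_TASK`, and `route`, are assumptions made for this sketch.

```python
from typing import Optional, Tuple

# Output kinds listed in the text, paired with the matching § 01 task row.
# The pairing itself is an assumption of this sketch.
TASK_FOR_OUTPUT = {
    "answer": "Question Answering",
    "vector": "Text Embeddings",
    "label": "Text Classification",
    "entities": "Named Entity Recognition",
    "translation": "Translation",
    "summary": "Summarization",
}

# Only paths the text ties to a specific task are listed. `/llm` is left out
# because the text links it to frontier reasoning, not to one output kind.
PATH_FOR_TASK = {"Text Embeddings": "/benchmarks/mteb"}


def route(output_kind: str) -> Tuple[str, Optional[str]]:
    """Return (task name, leaderboard path or None) for an output kind."""
    key = output_kind.strip().lower()
    if key not in TASK_FOR_OUTPUT:
        raise ValueError(f"unknown output kind: {output_kind!r}")
    task = TASK_FOR_OUTPUT[key]
    return task, PATH_FOR_TASK.get(task)
```

For example, `route("vector")` points at the embeddings task and its benchmark path, while `route("label")` returns the classification task with no path, since the text does not name one.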

Browse text tasks → · Frontier LLMs · MTEB embeddings

§ 01 · Text tasks

Not every task needs an LLM.

Six text tasks where specialised models still compete with LLMs, or beat them outright, on latency, cost, or accuracy at scale.

Text Embeddings →

Semantic search, RAG, clustering

KaLM-Gemma3-12B (72.3%)

Translation →

33+ languages, document-level

HY-MT1.5 (WMT2025 winner)

Question Answering →

Extractive, abstractive, multi-hop

SQuAD, TriviaQA

GPT-5 / Claude 4

Named Entity Recognition →

People, orgs, locations, custom

Fine-tuned DeBERTa v3

Text Classification →

Sentiment, intent, topic

GLUE, SuperGLUE

DeBERTa v3 (GLUE 91.3)

Summarization →

News, documents, conversations

Claude 4 / GPT-5

§ 02 · Decision

LLM, or specialised model?

Use an LLM when

· Few examples available (few-shot)
· Complex, nuanced task definitions
· You need to explain reasoning
· The task evolves frequently
· Low volume (< 10K requests/day)

Use a specialised model when

· High volume (> 100K requests/day)
· Latency-critical (< 100 ms)
· Cost-sensitive (pennies per 1K calls)
· Well-defined, stable task
· Training data available
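The two checklists above can be read as a simple vote count. The sketch below is one possible encoding, not a rule from the source: the thresholds (10K and 100K requests/day, 100 ms) are taken from the lists, while the `Workload` fields, the equal weighting of criteria, and the "either" tie result are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of a text workload (field names are assumed)."""
    requests_per_day: int
    latency_budget_ms: float
    has_training_data: bool
    task_is_stable: bool
    nuanced_definition: bool = False
    needs_explanations: bool = False
    cost_sensitive: bool = False


def recommend(w: Workload) -> str:
    """Count how many criteria from each checklist the workload meets."""
    llm_votes = sum([
        not w.has_training_data,      # few examples available
        w.nuanced_definition,         # complex, nuanced task definitions
        w.needs_explanations,         # you need to explain reasoning
        not w.task_is_stable,         # the task evolves frequently
        w.requests_per_day < 10_000,  # low volume
    ])
    specialised_votes = sum([
        w.requests_per_day > 100_000,  # high volume
        w.latency_budget_ms < 100,     # latency-critical
        w.cost_sensitive,              # cost-sensitive
        w.task_is_stable,              # well-defined, stable task
        w.has_training_data,           # training data available
    ])
    if llm_votes > specialised_votes:
        return "llm"
    if specialised_votes > llm_votes:
        return "specialised"
    return "either"  # assumed tie result; the lists do not cover this case
```

A workload between 10K and 100K requests/day gets no volume vote either way, which mirrors the gap the two lists leave between their thresholds. Equal weights are the simplest choice here; in practice a hard latency limit may outweigh everything else.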

§ 03 · Keep reading

Go deeper.

Verified benchmarks across every text task. Submit new SOTA results or suggest benchmarks we should be tracking.

Frontier leaderboard → · MTEB embedding benchmark · All NLP benchmarks