Text Ranking.

Text ranking orders candidate documents by relevance to a query, and it is the invisible backbone of search engines and RAG pipelines. The field was transformed first by ColBERT (2020), which introduced late interaction, and then by instruction-tuned embedding models such as E5-Mistral and GTE-Qwen, which turned general-purpose LLMs into retrieval engines. MS MARCO and BEIR remain the standard benchmarks, but the real test is zero-shot transfer: can a model trained on web search generalize to legal documents, scientific papers, and code? The gap between supervised and zero-shot performance has shrunk from 15+ points to under 3 in two years.
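To make "late interaction" concrete, here is a minimal sketch of MaxSim-style scoring in the spirit of ColBERT: each query token vector is matched to its most similar document token vector, and those maxima are summed. The function names, the toy 2-dimensional vectors, and the use of a plain dot product are assumptions for illustration only; this is not the ColBERT codebase, which works with learned, normalized token embeddings and indexed search.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_vecs, doc_vecs):
    """Sum, over query tokens, of the best similarity to any document token."""
    if not doc_vecs:
        return 0.0  # an empty document cannot match any query token
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def rank_documents(query_vecs, docs):
    """Return (doc_id, score) pairs sorted from most to least relevant.

    `docs` maps a hypothetical document id to its list of token vectors.
    """
    scored = [(doc_id, maxsim_score(query_vecs, vecs)) for doc_id, vecs in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy token embeddings (hypothetical values, not from a real model).
    query = [[1.0, 0.0], [0.0, 1.0]]
    docs = {
        "a": [[1.0, 0.0], [0.0, 1.0]],
        "b": [[1.0, 0.0], [0.5, 0.5]],
    }
    for doc_id, score in rank_documents(query, docs):
        print(doc_id, score)
```

Because documents are encoded independently of the query, their token vectors can be computed ahead of time; only the cheap max-and-sum step runs at query time.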

Datasets: 2 · Results: 9 · Canonical metric: ndcg
§ 02 · Canonical benchmark

The reference dataset.

BEIR

Heterogeneous information retrieval benchmark across 18 datasets

Primary metric: ndcg
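For readers unfamiliar with the metric, the sketch below shows one common way to compute ndcg@k for a single query: discounted cumulative gain of the system ranking, divided by the gain of the ideal ranking. It assumes linear gain (the raw relevance label); some implementations use 2^rel - 1 instead, so scores are only comparable under the same convention. The names `ndcg_at_k` and `qrels` and the document ids are hypothetical, and leaderboard figures are typically this value averaged over queries and datasets and scaled to 0-100.

```python
import math


def dcg_at_k(gains, k=10):
    """Discounted cumulative gain over the first k graded relevance labels."""
    return sum(gain / math.log2(rank + 1) for rank, gain in enumerate(gains[:k], start=1))


def ndcg_at_k(ranked_doc_ids, qrels, k=10):
    """nDCG@k for one query.

    ranked_doc_ids: document ids in the order the system returned them.
    qrels: dict mapping document id -> graded relevance (unjudged docs count as 0).
    """
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_doc_ids]
    ideal_gains = sorted(qrels.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal_gains, k)
    if ideal_dcg == 0:
        return 0.0  # no relevant documents judged for this query
    return dcg_at_k(gains, k) / ideal_dcg


if __name__ == "__main__":
    qrels = {"d1": 2, "d2": 1}       # hypothetical relevance judgments
    ranking = ["d3", "d1", "d2"]     # hypothetical system output
    print(round(ndcg_at_k(ranking, qrels), 4))
```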
§ 03 · Top 10

Leading models.

Leading models on BEIR.

#  Model                    ndcg@10  Year  Source
1  NV-Embed-v2              62.6     2024  paper
2  GTE-Qwen2-7B-instruct    60.3     2024  paper
3  E5-Mistral-7B-instruct   56.9     2024  paper
4  ColBERTv2                49.4     2022  paper
5  ModernBERT (large)       44.0     2024  paper

§ 04 · All datasets

Tracked datasets.

2 datasets tracked for this task.

BEIR (canonical) · 5 results · ndcg · Top: NV-Embed-v2 62.6
MS MARCO · 4 results · mrr · Top: RankLLaMA-7B 41.8
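The MS MARCO entry is scored with mrr (mean reciprocal rank) rather than ndcg. As a rough illustration, the sketch below averages 1/rank of the first relevant document per query, counting a query as 0 when nothing relevant appears in the top k. The cutoff k=10 is an assumption (the page does not state one), and the function names and toy ids are hypothetical.

```python
def reciprocal_rank(ranked_doc_ids, relevant_ids, k=10):
    """1 / rank of the first relevant document within the top k, else 0."""
    for rank, doc_id in enumerate(ranked_doc_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs, qrels, k=10):
    """Average reciprocal rank over all queries in `runs`.

    runs: dict mapping query id -> ranked list of document ids.
    qrels: dict mapping query id -> set of relevant document ids.
    """
    if not runs:
        return 0.0
    total = sum(
        reciprocal_rank(ranking, qrels.get(query_id, set()), k)
        for query_id, ranking in runs.items()
    )
    return total / len(runs)


if __name__ == "__main__":
    # Hypothetical toy data: q1 finds its answer at rank 2, q2 at rank 1.
    runs = {"q1": ["a", "b"], "q2": ["c", "d"]}
    qrels = {"q1": {"b"}, "q2": {"c"}}
    print(mean_reciprocal_rank(runs, qrels))
```

Unlike ndcg, this metric looks only at the first relevant hit, which suits benchmarks where most queries have a single judged-relevant passage.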
§ 05 · Related tasks

Other tasks in Natural Language Processing.

Feature Extraction · Fill-Mask · Named Entity Recognition · Natural Language Inference · Polish Conversation Quality · Polish Cultural Competency · Polish Emotional Intelligence · Polish LLM General