Text Ranking.

Text ranking underpins search engines and RAG pipelines. The field was reshaped first by ColBERT (2020), which introduced late interaction, and then by instruction-tuned embedding models such as E5-Mistral and GTE-Qwen, which turned general-purpose LLMs into retrievers. MS MARCO and BEIR remain the standard benchmarks, but the real test is zero-shot transfer: can a model trained on web search generalize to legal documents, scientific papers, and code? The gap between supervised and zero-shot performance has shrunk from more than 15 points to under 3 in two years.
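To make "late interaction" concrete, here is a minimal sketch of a MaxSim-style score of the kind ColBERT popularized: the query and the document are each encoded as a list of per-token vectors, every query token is matched to its most similar document token, and those maxima are summed. The 2-d embeddings, the document names, and the helper names (`dot`, `maxsim_score`) are made up for illustration; a real system would take its vectors from a trained encoder.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Plain dot product; stands in for the token-level similarity."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_vecs: List[Vector], doc_vecs: List[Vector]) -> float:
    """Late-interaction score: for each query token, take the best-matching
    document token, then sum those maxima over all query tokens."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Toy, hand-written token embeddings (assumption: not produced by any model).
query = [[1.0, 0.0], [0.0, 1.0]]
docs: Dict[str, List[Vector]] = {
    "doc_a": [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
    "doc_b": [[1.0, 0.0], [0.5, 0.5]],
}

# Rank documents by descending late-interaction score.
ranked = sorted(docs, key=lambda name: maxsim_score(query, docs[name]), reverse=True)
print(ranked)
```

Because document token vectors do not depend on the query, they can be computed and indexed ahead of time; only the cheap max-and-sum step runs at query time.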

§ 02 · Canonical benchmark

The reference dataset.

BEIR

Heterogeneous information retrieval benchmark across 18 datasets

Primary metric: nDCG@10
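For readers unfamiliar with the metric, the following is a rough sketch of how nDCG at a cutoff of 10 can be computed for a single query. It assumes the linear-gain form of DCG (some definitions use 2^rel − 1 instead) and normalizes against the ideal ordering of all judged documents for the query. The function names and the toy relevance labels are illustrative only and are not taken from any benchmark's evaluation code.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """Discounted cumulative gain over the top-k positions (linear gain).
    Position 0 gets discount log2(2) = 1, position 1 gets log2(3), and so on."""
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_rels: Sequence[float],
              judged_rels: Sequence[float],
              k: int = 10) -> float:
    """nDCG@k for one query.

    ranked_rels: relevance labels of the returned documents, in ranked order.
    judged_rels: relevance labels of all judged documents for the query,
                 used to build the ideal ranking.
    """
    ideal = dcg_at_k(sorted(judged_rels, reverse=True), k)
    return dcg_at_k(ranked_rels, k) / ideal if ideal > 0 else 0.0


# Toy example: two relevant documents exist; the system returns them
# at positions 1 and 3 with a non-relevant document in between.
score = ndcg_at_k(ranked_rels=[1, 0, 1], judged_rels=[1, 1])
print(round(score, 4))
```

A benchmark score is then typically the mean of this per-query value over all queries, often reported on a 0 to 100 scale.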

§ 03 · Top 10

Leading models on BEIR.

Rank	Model	nDCG@10	Year	Source
1	NV-Embed-v2 ✓	62.6	2024	paper
2	GTE-Qwen2-7B-instruct ✓	60.3	2024	paper
3	E5-Mistral-7B-instruct ✓	56.9	2024	paper
4	ColBERTv2 ✓	49.4	2022	paper
5	ModernBERT (large)	44.0	2024	paper