Text ranking is the invisible backbone of every search engine and RAG pipeline. The field was transformed by ColBERT (2020) introducing late interaction, then by instruction-tuned embedding models like E5-Mistral and GTE-Qwen that turned general LLMs into retrieval engines. MS MARCO and BEIR remain the standard battlegrounds, but the real test is zero-shot transfer — can a model trained on web search generalize to legal documents, scientific papers, and code? The gap between supervised and zero-shot performance has shrunk from 15+ points to under 3 in two years.
Heterogeneous information retrieval benchmark across 18 datasets
Didn't find the model, metric, or dataset you needed? Tell us in one line. We read every message and reply within 48 hours.
2 datasets tracked for this task.
Still looking for something on Text Ranking? A missing model, a stale score, a benchmark we should cover — drop it here and we'll handle it.
Real humans read every message. We track what people are asking for and prioritize accordingly.