Codesota · Tasks · Semantic Textual Similarity
Natural Language Processing · sentence-similarity

Semantic Textual Similarity.

Semantic similarity measures how close two pieces of text are in meaning. It is the foundation of duplicate detection, paraphrase mining, and retrieval. STS Benchmark scores climbed from 70 (GloVe averages) to 86+ with Sentence-BERT, and now exceed 92 with models like GTE-Qwen2 and E5-Mistral that build on billion-parameter backbones. The larger shift was from symmetric similarity (are these two sentences paraphrases?) to asymmetric retrieval (does this passage answer this query?), driven by the rise of RAG, which made embedding quality a production-critical metric. Cross-lingual semantic similarity remains a hard frontier: models trained primarily on English still lose 5-10 points when comparing sentences across language families, despite multilingual pretraining.
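Embedding models of the kind named above are commonly compared by taking the cosine similarity between two sentence vectors. The sketch below illustrates that scoring step only. The 3-dimensional vectors are hypothetical stand-ins for real model output (real embeddings have hundreds or thousands of dimensions), and `cosine_similarity` is an illustrative helper, not any particular library's API.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy embeddings (assumed values, not produced by a real model).
sentence_a = [1.0, 0.0, 0.0]
paraphrase_of_a = [1.0, 0.1, 0.0]   # points in almost the same direction
unrelated = [0.0, 1.0, 0.0]         # orthogonal to sentence_a

sim_paraphrase = cosine_similarity(sentence_a, paraphrase_of_a)
sim_unrelated = cosine_similarity(sentence_a, unrelated)

print(f"paraphrase: {sim_paraphrase:.3f}")
print(f"unrelated:  {sim_unrelated:.3f}")
```

Cosine similarity ignores vector length and looks only at direction, which is one reason it is a common default for comparing embeddings.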

1 dataset · 3 results · Canonical metric: spearman
§ 02 · Canonical benchmark

The reference dataset.

STS Benchmark

Semantic textual similarity with human-annotated sentence pairs

Primary metric: spearman
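As a rough illustration of the metric, the sketch below assumes "spearman" means the Spearman rank correlation between a model's similarity scores and the human-annotated scores for the same sentence pairs. The four predicted and gold values are made up for the example, and the helper names are hypothetical. Multiplying by 100 to match the leaderboard's scale is also an assumption.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical data: model similarity per sentence pair vs. human score.
predicted = [0.995, 0.707, 0.000, 0.632]
gold = [4.8, 3.0, 0.2, 3.5]

rho = spearman(predicted, gold)
print(f"spearman x 100: {rho * 100:.1f}")
```

Because only the ordering matters, a model does not need to reproduce the human scale; it only needs to rank the pairs the way the annotators did. Here the model swaps the order of two pairs, so the correlation falls below 1.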
§ 03 · Top 10

Leading models.

Leading models on STS Benchmark.

#   Model                    spearman   Year   Source
1   GTE-Qwen2-7B-instruct    88.4       2024   paper
2   E5-Mistral-7B-instruct   84.7       2024   paper
3   all-MiniLM-L6-v2         82.8       2022   paper

§ 04 · All datasets

Tracked datasets.

1 dataset tracked for this task.

STS Benchmark (canonical) · 3 results · spearman · Top: GTE-Qwen2-7B-instruct 88.4
§ 05 · Related tasks

Other tasks in Natural Language Processing.

Feature Extraction · Fill-Mask · Named Entity Recognition · Natural Language Inference · Polish Conversation Quality · Polish Cultural Competency · Polish Emotional Intelligence · Polish LLM General