Codesota · Tasks · Natural Language Inference

Natural Language Inference.

Determining whether a premise sentence entails, contradicts, or is neutral toward a hypothesis sentence. Common benchmarks include SNLI and MNLI.

Canonical metric: accuracy

§ 02 · Canonical benchmark

The reference dataset.

SNLI

570k human-written English sentence pairs manually labeled for balanced classification with the labels entailment, contradiction, and neutral.

Primary metric: accuracy
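As a rough illustration of the metric, accuracy on a three-way NLI task is the share of sentence pairs whose predicted label matches the gold label, usually reported as a percentage. The sketch below assumes labels are plain strings; the function name `nli_accuracy` and the toy label lists are hypothetical and are not taken from SNLI or from any leaderboard entry.

```python
# Minimal sketch: accuracy for three-way NLI classification.
# The label names follow the SNLI description above; the function
# name and the toy data are illustrative assumptions only.

LABELS = ("entailment", "contradiction", "neutral")


def nli_accuracy(gold, predicted):
    """Return the fraction of pairs whose predicted label equals the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    for label in list(gold) + list(predicted):
        if label not in LABELS:
            raise ValueError(f"unknown label: {label!r}")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


# Hypothetical toy example: 3 of 4 predictions match the gold labels.
gold = ["entailment", "neutral", "contradiction", "neutral"]
predicted = ["entailment", "neutral", "neutral", "neutral"]

score = nli_accuracy(gold, predicted)
print(f"{score * 100:.1f}")  # prints 75.0
```

Read this way, a leaderboard accuracy of 92.6 means that 92.6% of the evaluated pairs were labeled correctly.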


§ 03 · Top 10

Leading models.

Leading models on SNLI.

#	Model	accuracy	Year	Source
★	GPT-4o✓	92.6	2026	paper ↗
2	DeBERTa-v3-large✓	92.2	2026	paper ↗
3	Gemini Ultra✓	91.9	2026	paper ↗
4	Claude 3.5 Sonnet✓	91.8	2026	paper ↗
5	Llama 3.1 405B✓	91.2	2026	paper ↗
6	Qwen2 72B✓	90.1	2026	paper ↗
7	Llama 3 70B✓	89.7	2026	paper ↗
8	Mistral 7B✓	85.6	2026	paper ↗


§ 04 · All datasets

Tracked datasets.

1 dataset tracked for this task.

§ 05 · Related tasks

Other tasks in Natural Language Processing.

Feature Extraction · Fill-Mask · Named Entity Recognition · Polish Conversation Quality · Polish Cultural Competency · Polish Emotional Intelligence · Polish LLM General · Polish Text Understanding
