Codesota · Tasks · Natural Language Inference

Natural Language Inference.

Determining whether one sentence (the hypothesis) is entailed by, contradicts, or is neutral toward another (the premise). Common benchmarks include SNLI and MNLI.

Datasets: 1
Results: 8
Canonical metric: accuracy
§ 02 · Canonical benchmark

The reference dataset.

SNLI

570k human-written English sentence pairs manually labeled for balanced classification with the labels entailment, contradiction, and neutral.

Primary metric: accuracy
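
As a minimal sketch of the metric, accuracy on a three-way NLI task is the fraction of sentence pairs whose predicted label matches the gold label. The function name `accuracy`, the `LABELS` tuple, and the toy label lists below are illustrative assumptions. This page does not describe how the listed scores were computed, and reading them as percentages is also an assumption.

```python
from typing import Sequence

# The three SNLI labels named in the dataset description.
LABELS = ("entailment", "contradiction", "neutral")


def accuracy(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Return the fraction of examples whose predicted label equals the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    unknown = (set(gold) | set(predicted)) - set(LABELS)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


# Hypothetical toy data: four sentence pairs, one of them misclassified.
gold = ["entailment", "neutral", "contradiction", "neutral"]
predicted = ["entailment", "neutral", "neutral", "neutral"]

score = accuracy(gold, predicted)
print(f"{100 * score:.1f}")  # 75.0
```

Because the dataset is described as balanced across the three labels, plain accuracy is a reasonable summary here. On a skewed label distribution, a per-class breakdown would be more informative.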
§ 03 · Top 10

Leading models.

Leading models on SNLI.

#  Model              Accuracy  Year  Source
1  GPT-4o             92.6      2026  paper
2  DeBERTa-v3-large   92.2      2026  paper
3  Gemini Ultra       91.9      2026  paper
4  Claude 3.5 Sonnet  91.8      2026  paper
5  Llama 3.1 405B     91.2      2026  paper
6  Qwen2 72B          90.1      2026  paper
7  Llama 3 70B        89.7      2026  paper
8  Mistral 7B         85.6      2026  paper

§ 04 · All datasets

Tracked datasets.

1 dataset tracked for this task.

SNLI (canonical)
8 results · accuracy · Top: GPT-4o 92.6
§ 05 · Related tasks

Other tasks in Natural Language Processing.

Feature Extraction
Fill-Mask
Named Entity Recognition
Polish Conversation Quality
Polish Cultural Competency
Polish Emotional Intelligence
Polish LLM General
Polish Text Understanding