Hallucination Detection
Score or flag generated text for factuality and grounding.
How Hallucination Detection Works
A technical deep-dive into LLM hallucinations: what they are, why they happen, and how to detect and mitigate them. Essential knowledge for production AI systems.
What is Hallucination?
A hallucination occurs when an LLM generates content that is factually incorrect, unsupported by the provided context, or entirely fabricated, yet presents it with the same confidence as accurate information.
LLMs do not "know" things the way humans do. They are statistical pattern matchers trained on text. When the patterns suggest a plausible-sounding completion, the model generates it - regardless of factual accuracy. The model has no internal fact-checker, no way to say "I am making this up."
Two Dimensions of Correctness
- Factuality: does the output align with real-world facts and knowledge?
- Faithfulness: does the output accurately reflect the provided context or source document?
For RAG systems and document-grounded tasks, you have the source text. This makes verification tractable: check if each claim in the output is supported by the source. Factual verification is harder because you need access to ground truth knowledge, which may be vast or unavailable.
Types of Hallucinations
Not all hallucinations are equal. Understanding the taxonomy helps choose the right detection method.
- Factual hallucination: the model states something factually incorrect that contradicts real-world knowledge.
- Faithfulness hallucination: the model's output contradicts, or is not supported by, the provided context or source.

Intrinsic vs Extrinsic Hallucinations
- Intrinsic hallucination: the output directly contradicts the source material provided in the prompt.
- Extrinsic hallucination: the output includes information that cannot be verified from the source (it may or may not be true).
Example: Spot the Hallucinations
The response below contains several hallucinations: some of its claims contradict the source document it was generated from, while others add details the source never states.
The company reported exceptional Q3 2024 results with revenue of $52 million, a 15% increase from last year. Net income was $8.1 million. The company, founded in 2015, now employs over 500 people at its San Francisco headquarters. They launched two new product lines and expanded to 5 European countries. CEO John Martinez expressed optimism about future growth.
Detection Methods
Five approaches to detecting hallucinations, each with different tradeoffs.
| Method | How it works | Best when |
|---|---|---|
| NLI entailment | Classify each claim in the output as ENTAILMENT, NEUTRAL, or CONTRADICTION against the source | You have source documents (RAG); need interpretable results; want to avoid LLM API costs |
| SelfCheck (consistency sampling) | Generate N responses and measure agreement; factual content is consistent, hallucinations vary | No external knowledge is available |
| Claim extraction + retrieval verification | Extract claims -> retrieve evidence -> verify each claim against the evidence | Checking factual knowledge; latency is not critical |
| RAGAS faithfulness (LLM-as-judge) | Decompose the answer into claims and verify each against the context using an LLM-as-judge | Evaluating RAG pipelines; need multiple metrics |
| FActScore (atomic facts) | Decompose the output, verify each atomic fact, aggregate scores | Production monitoring |

A code sketch of the consistency-sampling approach follows this table.
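The consistency-sampling idea can be sketched in a few lines. This is a simplified version of what SelfCheckGPT does, reusing the same NLI model as the entailment example later on this page; the sentence splitting and the `samples` list (produced by re-querying your LLM at temperature > 0) are assumptions for illustration, not part of any specific library.

```python
from transformers import pipeline

# Simplified SelfCheck-style consistency scoring (a sketch, not the reference
# SelfCheckGPT implementation). Intuition: factual sentences tend to be
# entailed by independently sampled responses; hallucinated ones do not.
nli = pipeline("text-classification", model="facebook/bart-large-mnli")

def consistency_score(sentence: str, sampled_responses: list[str]) -> float:
    """Fraction of sampled responses that entail the sentence (1.0 = fully consistent)."""
    supported = 0
    for sample in sampled_responses:
        pred = nli(f"{sample}</s></s>{sentence}")[0]  # premise -> hypothesis
        if pred["label"].lower() == "entailment":
            supported += 1
    return supported / len(sampled_responses)

# Hypothetical usage: `samples` would come from re-querying the LLM several
# times on the same prompt with temperature > 0.
samples = [
    "Q3 revenue was $45.2 million, up 12% from last year.",
    "The company earned $45.2 million in revenue in Q3.",
    "Revenue grew 12% year-over-year to $45.2 million.",
]
print(consistency_score("Revenue was $45.2 million.", samples))  # high
print(consistency_score("The CEO is John Martinez.", samples))   # low
```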
Benchmarks and Evaluation
Standard datasets and metrics for measuring hallucination rates.
| Benchmark | Type | Size | Metric | Focus |
|---|---|---|---|---|
| TruthfulQA | Factuality | 817 questions | % Truthful & Informative | Tests resistance to common misconceptions |
| HaluEval | RAG Hallucination | 35K samples | Detection Accuracy | Comprehensive hallucination benchmark |
| FActScore | Biography Generation | 500+ bios | % Supported Facts | Fine-grained factuality |
| FEVER | Fact Verification | 185K claims | Label Accuracy | Evidence-based verification |
| SummEval | Summarization | 2800 summaries | Faithfulness Score | Summarization hallucinations |
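Whatever the benchmark, evaluation usually reduces to comparing a detector's decisions against human labels. Below is a minimal sketch of that scoring loop; the sample schema and the `detector` callable are illustrative placeholders, not the actual formats shipped by HaluEval or FEVER.

```python
from typing import Callable

# Score a hallucination detector against labeled samples.
# Each sample is assumed to look like:
#   {"source": str, "response": str, "is_hallucinated": bool}
def detection_metrics(
    samples: list[dict],
    detector: Callable[[str, str], bool],  # returns True if it flags the response
) -> dict:
    tp = fp = tn = fn = 0
    for s in samples:
        flagged = detector(s["source"], s["response"])
        if s["is_hallucinated"]:
            tp += flagged
            fn += not flagged
        else:
            fp += flagged
            tn += not flagged
    return {
        "accuracy": (tp + tn) / len(samples),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }
```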
Mitigation Strategies
- Retrieval-augmented generation (RAG): ground LLM responses in retrieved documents to reduce hallucination.
- Chain-of-thought prompting: ask the model to reason step by step, making errors more visible.
- Self-consistency: sample multiple responses and use majority vote or consistency filtering.
- Calibrated uncertainty: train or prompt the model to express uncertainty when unsure.
- Citations: force the model to cite sources for each claim (a prompt sketch follows this list).
- Knowledge-cutoff awareness: make the model aware of its training date and knowledge limits.
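Two of these strategies, grounding in retrieved documents and forcing citations, are often combined at the prompt level. Here is a minimal sketch; the `build_grounded_prompt` helper and its wording are illustrative choices, not a standard template.

```python
# Sketch of a grounded prompt that restricts the model to retrieved passages
# and asks for a citation after every claim. Wording is illustrative only.
def build_grounded_prompt(question: str, passages: list[str]) -> str:
    numbered = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer the question using ONLY the passages below.\n"
        "After every claim, cite the passage number it comes from, e.g. [1].\n"
        "If the passages do not contain the answer, reply \"I don't know.\"\n\n"
        f"Passages:\n{numbered}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

prompt = build_grounded_prompt(
    "What was Q3 revenue?",
    ["The company reported Q3 revenue of $45.2 million, up 12% year-over-year."],
)
print(prompt)
```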
Code Examples
A reference implementation of NLI-based entailment checking. Start with NLI for RAG systems, or SelfCheck-style consistency sampling (sketched above) for general use.
```python
from transformers import pipeline

# Load an NLI model for entailment checking
nli = pipeline("text-classification", model="facebook/bart-large-mnli")

def check_faithfulness(source: str, claim: str) -> dict:
    """
    Check whether a claim is entailed by the source document.
    Returns: {"label": "ENTAILMENT" | "NEUTRAL" | "CONTRADICTION", "score": float}
    """
    # NLI format: premise (source) -> hypothesis (claim)
    result = nli(f"{source}</s></s>{claim}")[0]
    # bart-large-mnli emits lowercase labels; normalize for readability
    return {"label": result["label"].upper(), "score": result["score"]}

# Example usage
source = """The company reported Q3 revenue of $45.2 million,
representing a 12% increase year-over-year."""

claims = [
    "Revenue was $45.2 million.",           # Should be ENTAILMENT
    "Revenue increased by 12% YoY.",        # Should be ENTAILMENT
    "Revenue was $52 million.",             # Should be CONTRADICTION
    "The CEO is optimistic about growth.",  # Should be NEUTRAL (not stated)
]

for claim in claims:
    result = check_faithfulness(source, claim)
    print(f"Claim: {claim}")
    print(f"  -> {result['label']} ({result['score']:.3f})")

# Output:
# Claim: Revenue was $45.2 million.
#   -> ENTAILMENT (0.943)
# Claim: Revenue increased by 12% YoY.
#   -> ENTAILMENT (0.891)
# Claim: Revenue was $52 million.
#   -> CONTRADICTION (0.967)
# Claim: The CEO is optimistic about growth.
#   -> NEUTRAL (0.824)
```

Quick Reference
- Factual: contradicts world knowledge
- Faithfulness: contradicts source
- Intrinsic: direct contradiction
- Extrinsic: unverifiable addition
- NLI: entailment checking
- SelfCheck: consistency sampling
- RAGAS: RAG evaluation
- FActScore: atomic fact verification
- RAG: ground in documents
- Citations: force attribution
- Self-consistency: majority vote
- Uncertainty: express confidence
1. Hallucinations are LLM pattern completion without fact-checking.
2. Faithfulness (to source) is easier to verify than factuality (world knowledge).
3. NLI-based detection works well for RAG and document-grounded tasks.
4. No single method catches all hallucinations; combine approaches for production.
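As an illustration of the last point, a production check can gate on more than one signal, for example an NLI verdict plus a consistency score. The thresholds and parameter names below are placeholder choices to show the gating logic, not tuned values.

```python
# Combine an NLI faithfulness verdict (e.g. from check_faithfulness above)
# with a consistency score (e.g. from the SelfCheck sketch earlier).
# Thresholds are placeholders, not recommendations.
def flag_response(
    nli_label: str,             # "ENTAILMENT" | "NEUTRAL" | "CONTRADICTION"
    nli_score: float,           # confidence of the NLI label
    consistency: float,         # fraction of samples agreeing, 0.0 - 1.0
    min_consistency: float = 0.6,
) -> bool:
    """Return True if the response should be flagged for human review."""
    if nli_label == "CONTRADICTION" and nli_score > 0.5:
        return True   # directly contradicts the source
    if nli_label == "NEUTRAL" and consistency < min_consistency:
        return True   # unsupported by the source and unstable across samples
    return False

print(flag_response("CONTRADICTION", 0.97, 0.80))  # True
print(flag_response("ENTAILMENT", 0.94, 0.90))     # False
```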
Use Cases
- ✓ RAG answer validation
- ✓ Safety review
- ✓ Content QA
- ✓ Model evaluation
Architectural Patterns
Retrieval Entailment
Compare generation against retrieved evidence with NLI.
Self-Check Ensembles
Ask multiple models/queries and vote on consistency.
Quick Facts
- Input: Text
- Output: Structured Data
- Implementations: 3 open source, 0 API
- Patterns: 2 approaches