
Hallucination Detection

Score or flag generated text for factuality and grounding.

How Hallucination Detection Works

A technical deep-dive into LLM hallucinations: what they are, why they happen, and how to detect and mitigate them. Essential knowledge for production AI systems.

1. What is a Hallucination?

A hallucination occurs when an LLM generates content that is factually incorrect, unsupported by context, or entirely fabricated, yet presents it with the same confidence as accurate information.

The Core Problem

LLMs do not "know" things the way humans do. They are statistical pattern matchers trained on text. When the patterns suggest a plausible-sounding completion, the model generates it - regardless of factual accuracy. The model has no internal fact-checker, no way to say "I am making this up."

Why This Matters
Hallucinations are indistinguishable from accurate responses without verification. In high-stakes domains (medical, legal, financial), this is dangerous.
Root Cause
LLMs optimize for plausible next-token prediction, not truth. Training on internet text includes errors, contradictions, and outdated information.

Two Dimensions of Correctness

Factual Correctness

Does the output align with real-world facts and knowledge?

Question: "What is the capital of France?"
Correct: "Paris"
Hallucination: "Lyon"
Verification requires external knowledge (world facts, databases, documents).

Faithfulness to Source

Does the output accurately reflect the provided context/source document?

Context: "The meeting is at 3pm"
Question: "When is the meeting?"
Faithful: "3pm"
Hallucination: "2pm"
Verification only requires the source document (RAG, summarization tasks).

Key Insight: Faithfulness is Easier to Verify

For RAG systems and document-grounded tasks, you have the source text. This makes verification tractable: check if each claim in the output is supported by the source. Factual verification is harder because you need access to ground truth knowledge, which may be vast or unavailable.

2. Types of Hallucinations

Not all hallucinations are equal. Understanding the taxonomy helps choose the right detection method.

Factual Hallucination

The model states something factually incorrect that contradicts real-world knowledge

High Severity
Prompt: "When did the Eiffel Tower open?"
Output: "The Eiffel Tower opened in 1921 to celebrate France's victory in World War I."
Issue: The Eiffel Tower opened in 1889 for the World's Fair, not 1921.
Faithfulness Hallucination

The model's output contradicts or is not supported by the provided context/source

High Severity
Context: "The agreement specifies a total payment of $50,000."
Prompt: "Based on the document, what was the contract value?"
Output: "According to the document, the contract value was $75,000."
Issue: Output contradicts the source document (50K vs 75K).
Intrinsic Hallucination

The output directly contradicts the source material provided in the prompt

High Severity
Prompt: "Summarize: 'The study found no significant correlation between X and Y.'"
Output: "The study demonstrated a strong correlation between X and Y."
Issue: Direct contradiction of the source text.
Extrinsic Hallucination

The output includes information that cannot be verified from the source (may or may not be true)

Medium Severity
Prompt: "Summarize the article about climate change."
Output: "The article discusses rising temperatures. The author, Dr. Smith, has won multiple awards for this research."
Issue: The awards claim cannot be verified from the source.

Intrinsic vs Extrinsic Hallucinations

Intrinsic
Output directly contradicts the source. Always wrong.
Example: Source says "founded in 2018", output says "founded in 2015"
Extrinsic
Output adds information not in source. May or may not be true.
Example: Source does not mention CEO, output names "CEO John Smith"
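
This distinction maps neatly onto the NLI verdicts used by several detection methods below. A minimal sketch of the mapping, assuming an NLI model that labels a (source, claim) pair as entailment, neutral, or contradiction:

def classify_claim(nli_label: str) -> str:
    """Map an NLI verdict for a (source, claim) pair to a hallucination category.

    contradiction -> intrinsic hallucination (conflicts with the source)
    neutral       -> extrinsic hallucination (cannot be verified from the source)
    entailment    -> supported (no hallucination with respect to the source)
    """
    label = nli_label.strip().lower()
    if label == "contradiction":
        return "intrinsic_hallucination"
    if label == "neutral":
        return "extrinsic_hallucination"
    return "supported"

print(classify_claim("CONTRADICTION"))  # intrinsic_hallucination
print(classify_claim("neutral"))        # extrinsic_hallucination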

3. Example: Spot the Hallucinations

Compare the response below against the source document and see how many contradicted or unsupported claims you can spot.

Source Document
COMPANY REPORT Q3 2024
Revenue: $45.2 million (up 12% YoY)
Net Income: $8.1 million
Employees: 342
Headquarters: Austin, Texas
Founded: 2018
Key Highlights:
- Launched new product line in September
- Expanded to 3 new markets in Europe
- Customer satisfaction score: 4.2/5.0
Question: Summarize this company's Q3 2024 performance.
Hallucinated Response

The company reported exceptional Q3 2024 results with revenue of $52 million, a 15% increase from last year. Net income was $8.1 million. The company, founded in 2015, now employs over 500 people at its San Francisco headquarters. They launched two new product lines and expanded to 5 European countries. CEO John Martinez expressed optimism about future growth.

Hallucination Analysis
  • 7 contradictions: revenue ($52M vs $45.2M), growth rate (15% vs 12%), founding year (2015 vs 2018), headcount (500+ vs 342), headquarters (San Francisco vs Austin), product launches (two lines vs one), European expansion (5 countries vs 3 markets)
  • 1 extrinsic claim: the CEO's name and statement cannot be verified from the source
  • 1 supported claim: net income of $8.1 million

4. Detection Methods

Five approaches to detecting hallucinations, each with different tradeoffs.

1. NLI-based Detection
Use Natural Language Inference models to check if claims in the output are entailed by the source
Medium Complexity
Mechanism:
For each claim in the output, classify it as ENTAILMENT, NEUTRAL, or CONTRADICTION against the source (a claim-splitting sketch follows this card; the full NLI check is in the Code Examples section)
Pros
+ Interpretable results
+ Works with any LLM output
+ No LLM queries needed
Cons
- Requires claim decomposition
- NLI models have limits
- May miss nuanced contradictions
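
Claim decomposition is the main practical hurdle for NLI-based checking. A minimal baseline is plain sentence splitting, sketched below; the regex and word-count filter are illustrative heuristics, and production systems often prompt an LLM to extract atomic claims instead.

import re
from typing import List

def decompose_into_claims(answer: str) -> List[str]:
    """Naive claim decomposition: split the answer on sentence boundaries.

    Each resulting sentence can then be run through an NLI check against
    the source (see the Code Examples section). Short fragments are dropped.
    """
    sentences = re.split(r"(?<=[.!?])\s+", answer.strip())
    return [s for s in sentences if len(s.split()) > 2]

answer = "Revenue was $45.2 million. The CEO is optimistic. Growth!"
print(decompose_into_claims(answer))
# ['Revenue was $45.2 million.', 'The CEO is optimistic.']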

2. SelfCheckGPT
Sample multiple responses and check for consistency - hallucinations tend to be inconsistent across samples
Low Complexity
Mechanism:
Generate N responses, measure agreement. Factual content is consistent; hallucinations vary. (A sketch follows this card.)
Pros
+ No external knowledge needed
+ Works for any domain
+ Detects confident hallucinations
Cons
- Requires multiple API calls
- Slower and more expensive
- Consistent hallucinations slip through
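
A minimal sketch of the consistency check: `generate` stands in for your LLM client (called several times at temperature > 0), and `supports` for whichever agreement measure you pick; the SelfCheckGPT paper evaluates NLI-, BERTScore-, and prompt-based variants.

from typing import Callable, List

def selfcheck_scores(
    answer_sentences: List[str],
    samples: List[str],
    supports: Callable[[str, str], bool],
) -> List[float]:
    """SelfCheckGPT-style consistency scoring (simplified).

    For each sentence of the main answer, count how many independently
    sampled responses support it. Sentences with low support across
    samples are likely hallucinations.
    """
    if not samples:
        return [0.0] * len(answer_sentences)
    return [
        sum(1 for sample in samples if supports(sample, sentence)) / len(samples)
        for sentence in answer_sentences
    ]

# Usage sketch (generate() is a placeholder for your LLM client):
# samples = [generate(prompt, temperature=1.0) for _ in range(5)]
# scores = selfcheck_scores(sentences_of(main_answer), samples, nli_supports)
# Low scores flag sentences to review or suppress.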

3. Retrieval Grounding
Retrieve evidence from knowledge base or web and verify claims against retrieved documents
High Complexity
Mechanism:
Extract claims -> retrieve evidence -> verify each claim against evidence (a sketch follows this card)
Pros
+ External ground truth
+ Catches factual errors
+ Scalable to large KBs
Cons
- Requires knowledge base
- Retrieval quality matters
- May not cover all claims
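
A sketch of the extract-retrieve-verify loop; `retrieve` and `entails` are placeholders for your retriever (BM25, dense search, or a web search API) and your verifier (an NLI model or LLM judge), neither of which this page prescribes.

from typing import Callable, Dict, List

def verify_with_retrieval(
    claims: List[str],
    retrieve: Callable[[str], List[str]],  # claim -> evidence passages
    entails: Callable[[str, str], bool],   # (passage, claim) -> supported?
) -> Dict[str, bool]:
    """Mark each claim as supported if any retrieved passage entails it.

    Coverage depends entirely on retrieval quality: a claim with no
    relevant evidence retrieved will be flagged as unsupported.
    """
    results: Dict[str, bool] = {}
    for claim in claims:
        evidence = retrieve(claim)
        results[claim] = any(entails(passage, claim) for passage in evidence)
    return results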

4. RAGAS Evaluation
Comprehensive RAG evaluation framework measuring faithfulness, relevance, and groundedness
Medium Complexity
Mechanism:
Decompose the answer into claims, verify each against the context using an LLM-as-judge (a sketch follows this card)
Pros
+ Production-ready framework
+ Multiple metrics
+ Well-documented
Cons
- Requires LLM for evaluation
- Cost at scale
- LLM judge has biases
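
The RAGAS library exposes this through `ragas.evaluate` and its `faithfulness` metric; because its API has shifted across versions, the snippet below is a simplified, library-free sketch of the same LLM-as-judge recipe, with `llm` standing in for any completion client and the prompt wording purely illustrative.

from typing import Callable, List

def ragas_style_faithfulness(
    answer: str,
    contexts: List[str],
    llm: Callable[[str], str],  # placeholder for any completion/chat client
) -> float:
    """Fraction of answer statements an LLM judge finds supported by the contexts."""
    raw = llm(
        "Break this answer into short standalone statements, one per line:\n"
        + answer
    )
    statements = [s.strip() for s in raw.splitlines() if s.strip()]
    if not statements:
        return 0.0
    context_block = "\n".join(contexts)
    supported = 0
    for statement in statements:
        verdict = llm(
            f"Context:\n{context_block}\n\n"
            "Can the following statement be inferred from the context? "
            f"Answer yes or no.\nStatement: {statement}"
        )
        supported += verdict.strip().lower().startswith("yes")
    return supported / len(statements)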

5. Atomic Fact Verification
Break output into atomic facts and verify each independently against knowledge source
High Complexity
Mechanism:
Decompose -> verify each atomic fact -> aggregate scores (a sketch follows this card)
Pros
+ Fine-grained analysis
+ Quantifiable scores
+ Academic rigor
Cons
- Decomposition is hard
- Expensive at scale
- Atomic fact definition varies
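
The aggregation step is simple once decomposition and verification are in place. A minimal FActScore-style sketch, where `is_supported` is a placeholder for your verifier (the NLI check from the Code Examples section, or retrieval plus an LLM judge as in the original FActScore setup):

from typing import Callable, List

def atomic_fact_score(
    atomic_facts: List[str],
    is_supported: Callable[[str], bool],  # fact -> supported by knowledge source?
) -> float:
    """Score a generation as the fraction of its atomic facts that are supported."""
    if not atomic_facts:
        return 0.0
    supported = sum(1 for fact in atomic_facts if is_supported(fact))
    return supported / len(atomic_facts)

facts = [
    "Revenue was $45.2 million.",
    "Revenue was $52 million.",
    "The CEO is optimistic about growth.",
]
# Toy verifier for illustration only: treats exactly one fact as supported.
print(atomic_fact_score(facts, lambda f: f == "Revenue was $45.2 million."))  # 0.33...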

Use NLI when:
  • You have source documents (RAG)
  • Need interpretable results
  • Want to avoid LLM API costs
Use SelfCheck when:
  • No external knowledge available
  • Checking factual knowledge
  • Latency is not critical
Use RAGAS when:
  • Evaluating RAG pipelines
  • Need multiple metrics
  • Production monitoring

5. Benchmarks and Evaluation

Standard datasets and metrics for measuring hallucination rates.

Benchmark | Type | Size | Metric | Focus
TruthfulQA | Factuality | 817 questions | % Truthful & Informative | Tests resistance to common misconceptions
HaluEval | RAG Hallucination | 35K samples | Detection Accuracy | Comprehensive hallucination benchmark
FActScore | Biography Generation | 500+ bios | % Supported Facts | Fine-grained factuality
FEVER | Fact Verification | 185K claims | Label Accuracy | Evidence-based verification
SummEval | Summarization | 2,800 summaries | Faithfulness Score | Summarization hallucinations
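
To experiment with one of these benchmarks, TruthfulQA is available on the Hugging Face Hub; the dataset id, config name, and field names below are taken from the Hub listing and may change, so check the current dataset card.

from datasets import load_dataset  # pip install datasets

# TruthfulQA's generation config: 817 questions designed to elicit common
# misconceptions, each paired with reference correct and incorrect answers.
truthfulqa = load_dataset("truthful_qa", "generation")["validation"]

example = truthfulqa[0]
print(example["question"])
print("Best answer:", example["best_answer"])
print("Incorrect answers:", example["incorrect_answers"][:2])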

Mitigation Strategies

Retrieval Augmentation (RAG)

Ground LLM responses in retrieved documents to reduce hallucination (prompt sketch below)

Effectiveness: High for knowledge-grounded tasks
Tradeoff: Adds latency and retrieval complexity
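
In practice, much of the benefit comes from the prompt that ties the model to the retrieved passages. One common pattern is sketched below; the wording is illustrative, not canonical.

GROUNDED_PROMPT = """Answer the question using ONLY the context below.
If the answer is not in the context, reply "I don't know."

Context:
{context}

Question: {question}
Answer:"""

def build_grounded_prompt(context: str, question: str) -> str:
    """Fill the grounding template; retrieved passages go into `context`."""
    return GROUNDED_PROMPT.format(context=context, question=question)

print(build_grounded_prompt("The meeting is at 3pm.", "When is the meeting?"))
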
Chain-of-Thought Prompting

Ask model to reason step-by-step, making errors more visible

Effectiveness: Moderate - helps with reasoning errors
Tradeoff: Longer outputs, may still hallucinate confidently
Self-Consistency Decoding

Sample multiple responses and use majority vote or consistency filtering (sketch below)

Effectiveness: High for knowledge-based questions
Tradeoff: Higher latency and cost (multiple generations)
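
A minimal sketch of majority-vote self-consistency, with `generate` as a placeholder for your LLM client; exact-match voting suits short factual answers, while longer outputs need a softer, similarity-based vote.

from collections import Counter
from typing import Callable

def self_consistent_answer(
    prompt: str,
    generate: Callable[[str], str],  # placeholder: one sampled LLM response
    n_samples: int = 5,
) -> str:
    """Sample several answers at temperature > 0 and return the most common one.

    Low agreement across samples is itself a useful hallucination signal.
    """
    answers = [generate(prompt).strip().lower() for _ in range(n_samples)]
    best, count = Counter(answers).most_common(1)[0]
    print(f"agreement: {count}/{n_samples}")
    return best
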
Calibration / Uncertainty

Train or prompt model to express uncertainty when unsure

Effectiveness: Moderate - depends on model calibration
Tradeoff: May be overly cautious
Citation Requirements

Force the model to cite sources for each claim (citation-check sketch below)

Effectiveness: High for verifiable claims
Tradeoff: Requires post-hoc verification of citations
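
Post-hoc verification can start with a purely mechanical pass before any semantic check. The sketch below assumes answers cite sources as [1], [2], ...; adapt the regex to whatever citation format you prompt for.

import re
from typing import Dict, List

def check_citations(answer: str, num_sources: int) -> Dict[str, List[str]]:
    """Flag sentences without a citation marker and citations of nonexistent sources."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", answer.strip()) if s]
    uncited = [s for s in sentences if not re.search(r"\[\d+\]", s)]
    invalid = [
        m for m in re.findall(r"\[(\d+)\]", answer)
        if not (1 <= int(m) <= num_sources)
    ]
    return {"uncited_sentences": uncited, "invalid_citations": invalid}

answer = "Revenue rose 12% [1]. The CEO is optimistic [3]. Margins improved."
print(check_citations(answer, num_sources=2))
# {'uncited_sentences': ['Margins improved.'], 'invalid_citations': ['3']}
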
Knowledge Cutoff Awareness

Make model aware of its training date and knowledge limits

Effectiveness: Moderate for temporal facts
Tradeoff: Does not help with persistent misconceptions

6. Code Examples

Implementations of each detection method. Start with NLI for RAG systems or SelfCheck for general use.

NLI Detection (Interpretable)
pip install transformers
from transformers import pipeline

# Load NLI model for entailment checking
nli = pipeline("text-classification", model="facebook/bart-large-mnli")

def check_faithfulness(source: str, claim: str) -> dict:
    """
    Check if a claim is entailed by the source document.
    Returns: {"label": "ENTAILMENT"|"NEUTRAL"|"CONTRADICTION", "score": float}
    """
    # NLI format: premise (source) -> hypothesis (claim)
    result = nli(f"{source}</s></s>{claim}")
    return result[0]

# Example usage
source = """The company reported Q3 revenue of $45.2 million,
representing a 12% increase year-over-year."""

claims = [
    "Revenue was $45.2 million.",           # Should be ENTAILMENT
    "Revenue increased by 12% YoY.",         # Should be ENTAILMENT
    "Revenue was $52 million.",              # Should be CONTRADICTION
    "The CEO is optimistic about growth.",   # Should be NEUTRAL (not stated)
]

for claim in claims:
    result = check_faithfulness(source, claim)
    print(f"Claim: {claim}")
    print(f"  -> {result['label']} ({result['score']:.3f})")

# Output:
# Claim: Revenue was $45.2 million.
#   -> ENTAILMENT (0.943)
# Claim: Revenue increased by 12% YoY.
#   -> ENTAILMENT (0.891)
# Claim: Revenue was $52 million.
#   -> CONTRADICTION (0.967)
# Claim: The CEO is optimistic about growth.
#   -> NEUTRAL (0.824)

Quick Reference

Hallucination Types
  • Factual: contradicts world knowledge
  • Faithfulness: contradicts source
  • Intrinsic: direct contradiction
  • Extrinsic: unverifiable addition
Detection Methods
  • NLI: entailment checking
  • SelfCheck: consistency sampling
  • RAGAS: RAG evaluation
  • FActScore: atomic fact verification
Mitigation
  • RAG: ground in documents
  • Citations: force attribution
  • Self-consistency: majority vote
  • Uncertainty: express confidence
Key Takeaways
  1. Hallucinations are LLM pattern completion without fact-checking
  2. Faithfulness (to source) is easier to verify than factuality (world knowledge)
  3. NLI-based detection works well for RAG and document-grounded tasks
  4. No single method catches all hallucinations - combine approaches for production

Use Cases

  • RAG answer validation
  • Safety review
  • Content QA
  • Model evaluation

Architectural Patterns

Retrieval Entailment

Compare generation against retrieved evidence with NLI.

Self-Check Ensembles

Ask multiple models/queries and vote on consistency.

Implementations

Open Source

RAGAS (Apache 2.0): Metrics for RAG faithfulness.

SelfCheckGPT (MIT): Sampling-based hallucination scoring.

G-Eval (MIT): LLM-based evaluation prompts.


Quick Facts

Input: Text
Output: Structured Data
Implementations: 3 open source, 0 API
Patterns: 2 approaches

Submit Results