
Hallucination Detection

Score or flag generated text for factuality and grounding.

How Hallucination Detection Works

A technical deep-dive into LLM hallucinations: what they are, why they happen, and how to detect and mitigate them. Essential knowledge for production AI systems.

1. What is a Hallucination?

A hallucination occurs when an LLM generates content that is factually incorrect, unsupported by context, or entirely fabricated, yet presents it with the same confidence as accurate information.

The Core Problem

LLMs do not "know" things the way humans do. They are statistical pattern matchers trained on text. When the patterns suggest a plausible-sounding completion, the model generates it - regardless of factual accuracy. The model has no internal fact-checker, no way to say "I am making this up."

Why This Matters
Hallucinations are indistinguishable from accurate responses without verification. In high-stakes domains (medical, legal, financial), this is dangerous.
Root Cause
LLMs optimize for plausible next-token prediction, not truth. Training on internet text includes errors, contradictions, and outdated information.

Two Dimensions of Correctness

Factual Correctness

Does the output align with real-world facts and knowledge?

Question: "What is the capital of France?"
Correct: "Paris"
Hallucination: "Lyon"
Verification requires external knowledge (world facts, databases, documents).

Faithfulness to Source

Does the output accurately reflect the provided context/source document?

Context: "The meeting is at 3pm"
Question: "When is the meeting?"
Faithful: "3pm"
Hallucination: "2pm"
Verification only requires the source document (RAG, summarization tasks).

Key Insight: Faithfulness is Easier to Verify

For RAG systems and document-grounded tasks, you have the source text. This makes verification tractable: check if each claim in the output is supported by the source. Factual verification is harder because you need access to ground truth knowledge, which may be vast or unavailable.

2. Types of Hallucinations

Not all hallucinations are equal. Understanding the taxonomy helps choose the right detection method.

Factual Hallucination

The model states something factually incorrect that contradicts real-world knowledge

High Severity
Prompt: "When did the Eiffel Tower open?"
Output: "The Eiffel Tower opened in 1921 to celebrate France's victory in World War I."
Issue: The Eiffel Tower opened in 1889 for the World's Fair, not 1921.
Faithfulness Hallucination

The model's output contradicts or is not supported by the provided context/source

High Severity
Context: "The agreement specifies a total payment of $50,000."
Prompt: "Based on the document, what was the contract value?"
Output: "According to the document, the contract value was $75,000."
Issue: Output contradicts the source document (50K vs 75K).
Intrinsic Hallucination

The output directly contradicts the source material provided in the prompt

High Severity
Prompt: "Summarize: 'The study found no significant correlation between X and Y.'"
Output: "The study demonstrated a strong correlation between X and Y."
Issue: Direct contradiction of the source text.
Extrinsic Hallucination

The output includes information that cannot be verified from the source (may or may not be true)

Medium Severity
Prompt: "Summarize the article about climate change."
Output: "The article discusses rising temperatures. The author, Dr. Smith, has won multiple awards for this research."
Issue: The awards claim cannot be verified from the source.

Intrinsic vs Extrinsic Hallucinations

Intrinsic
Output directly contradicts the source. Always wrong.
Example: Source says "founded in 2018", output says "founded in 2015"
Extrinsic
Output adds information not in source. May or may not be true.
Example: Source does not mention CEO, output names "CEO John Smith"
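
This distinction maps neatly onto the NLI verdicts used by several detection methods below. A minimal sketch of the mapping, assuming an NLI model that labels a (source, claim) pair as entailment, neutral, or contradiction:

def classify_claim(nli_label: str) -> str:
    """Map an NLI verdict for a (source, claim) pair to a hallucination category.

    contradiction -> intrinsic hallucination (conflicts with the source)
    neutral       -> extrinsic hallucination (cannot be verified from the source)
    entailment    -> supported (no hallucination with respect to the source)
    """
    label = nli_label.strip().lower()
    if label == "contradiction":
        return "intrinsic_hallucination"
    if label == "neutral":
        return "extrinsic_hallucination"
    return "supported"

print(classify_claim("CONTRADICTION"))  # intrinsic_hallucination
print(classify_claim("neutral"))        # extrinsic_hallucination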

3. Example: Spot the Hallucinations

Compare the response below against the source document and see how many contradicted or unsupported claims you can spot.

Source Document
COMPANY REPORT Q3 2024
Revenue: $45.2 million (up 12% YoY)
Net Income: $8.1 million
Employees: 342
Headquarters: Austin, Texas
Founded: 2018
Key Highlights:
- Launched new product line in September
- Expanded to 3 new markets in Europe
- Customer satisfaction score: 4.2/5.0
Question: Summarize this company's Q3 2024 performance.
Hallucinated Response

The company reported exceptional Q3 2024 results with revenue of $52 million, a 15% increase from last year. Net income was $8.1 million. The company, founded in 2015, now employs over 500 people at its San Francisco headquarters. They launched two new product lines and expanded to 5 European countries. CEO John Martinez expressed optimism about future growth.

Hallucination Analysis
  • 7 contradictions: revenue ($52M vs $45.2M), growth rate (15% vs 12%), founding year (2015 vs 2018), headcount (500+ vs 342), headquarters (San Francisco vs Austin), product launches (two lines vs one), European expansion (5 countries vs 3 markets)
  • 1 extrinsic claim: the CEO's name and statement cannot be verified from the source
  • 1 supported claim: net income of $8.1 million

4. Detection Methods

Five approaches to detecting hallucinations, each with different tradeoffs.

1. NLI-based Detection
Use Natural Language Inference models to check if claims in the output are entailed by the source
Medium Complexity
Mechanism:
For each claim in the output, classify it as ENTAILMENT, NEUTRAL, or CONTRADICTION against the source (a claim-splitting sketch follows this card; the full NLI check is in the Code Examples section)
Pros
+ Interpretable results
+ Works with any LLM output
+ No LLM queries needed
Cons
- Requires claim decomposition
- NLI models have limits
- May miss nuanced contradictions
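
Claim decomposition is the main practical hurdle for NLI-based checking. A minimal baseline is plain sentence splitting, sketched below; the regex and word-count filter are illustrative heuristics, and production systems often prompt an LLM to extract atomic claims instead.

import re
from typing import List

def decompose_into_claims(answer: str) -> List[str]:
    """Naive claim decomposition: split the answer on sentence boundaries.

    Each resulting sentence can then be run through an NLI check against
    the source (see the Code Examples section). Short fragments are dropped.
    """
    sentences = re.split(r"(?<=[.!?])\s+", answer.strip())
    return [s for s in sentences if len(s.split()) > 2]

answer = "Revenue was $45.2 million. The CEO is optimistic. Growth!"
print(decompose_into_claims(answer))
# ['Revenue was $45.2 million.', 'The CEO is optimistic.']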

2. SelfCheckGPT
Sample multiple responses and check for consistency - hallucinations tend to be inconsistent across samples
Low Complexity
Mechanism:
Generate N responses, measure agreement. Factual content is consistent; hallucinations vary. (A sketch follows this card.)
Pros
+ No external knowledge needed
+ Works for any domain
+ Detects confident hallucinations
Cons
- Requires multiple API calls
- Slower and more expensive
- Consistent hallucinations slip through
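
A minimal sketch of the consistency check: `generate` stands in for your LLM client (called several times at temperature > 0), and `supports` for whichever agreement measure you pick; the SelfCheckGPT paper evaluates NLI-, BERTScore-, and prompt-based variants.

from typing import Callable, List

def selfcheck_scores(
    answer_sentences: List[str],
    samples: List[str],
    supports: Callable[[str, str], bool],
) -> List[float]:
    """SelfCheckGPT-style consistency scoring (simplified).

    For each sentence of the main answer, count how many independently
    sampled responses support it. Sentences with low support across
    samples are likely hallucinations.
    """
    if not samples:
        return [0.0] * len(answer_sentences)
    return [
        sum(1 for sample in samples if supports(sample, sentence)) / len(samples)
        for sentence in answer_sentences
    ]

# Usage sketch (generate() is a placeholder for your LLM client):
# samples = [generate(prompt, temperature=1.0) for _ in range(5)]
# scores = selfcheck_scores(sentences_of(main_answer), samples, nli_supports)
# Low scores flag sentences to review or suppress.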

3. Retrieval Grounding
Retrieve evidence from knowledge base or web and verify claims against retrieved documents
High Complexity
Mechanism:
Extract claims -> retrieve evidence -> verify each claim against evidence (a sketch follows this card)
Pros
+ External ground truth
+ Catches factual errors
+ Scalable to large KBs
Cons
- Requires knowledge base
- Retrieval quality matters
- May not cover all claims
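
A sketch of the extract-retrieve-verify loop; `retrieve` and `entails` are placeholders for your retriever (BM25, dense search, or a web search API) and your verifier (an NLI model or LLM judge), neither of which this page prescribes.

from typing import Callable, Dict, List

def verify_with_retrieval(
    claims: List[str],
    retrieve: Callable[[str], List[str]],  # claim -> evidence passages
    entails: Callable[[str, str], bool],   # (passage, claim) -> supported?
) -> Dict[str, bool]:
    """Mark each claim as supported if any retrieved passage entails it.

    Coverage depends entirely on retrieval quality: a claim with no
    relevant evidence retrieved will be flagged as unsupported.
    """
    results: Dict[str, bool] = {}
    for claim in claims:
        evidence = retrieve(claim)
        results[claim] = any(entails(passage, claim) for passage in evidence)
    return results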

4. RAGAS Evaluation
Comprehensive RAG evaluation framework measuring faithfulness, relevance, and groundedness
Medium Complexity
Mechanism:
Decompose the answer into claims, verify each against the context using an LLM-as-judge (a sketch follows this card)
Pros
+ Production-ready framework
+ Multiple metrics
+ Well-documented
Cons
- Requires LLM for evaluation
- Cost at scale
- LLM judge has biases
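
The RAGAS library exposes this through `ragas.evaluate` and its `faithfulness` metric; because its API has shifted across versions, the snippet below is a simplified, library-free sketch of the same LLM-as-judge recipe, with `llm` standing in for any completion client and the prompt wording purely illustrative.

from typing import Callable, List

def ragas_style_faithfulness(
    answer: str,
    contexts: List[str],
    llm: Callable[[str], str],  # placeholder for any completion/chat client
) -> float:
    """Fraction of answer statements an LLM judge finds supported by the contexts."""
    raw = llm(
        "Break this answer into short standalone statements, one per line:\n"
        + answer
    )
    statements = [s.strip() for s in raw.splitlines() if s.strip()]
    if not statements:
        return 0.0
    context_block = "\n".join(contexts)
    supported = 0
    for statement in statements:
        verdict = llm(
            f"Context:\n{context_block}\n\n"
            "Can the following statement be inferred from the context? "
            f"Answer yes or no.\nStatement: {statement}"
        )
        supported += verdict.strip().lower().startswith("yes")
    return supported / len(statements)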

5. Atomic Fact Verification
Break output into atomic facts and verify each independently against knowledge source
High Complexity
Mechanism:
Decompose -> verify each atomic fact -> aggregate scores (a sketch follows this card)
Pros
+ Fine-grained analysis
+ Quantifiable scores
+ Academic rigor
Cons
- Decomposition is hard
- Expensive at scale
- Atomic fact definition varies
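
The aggregation step is simple once decomposition and verification are in place. A minimal FActScore-style sketch, where `is_supported` is a placeholder for your verifier (the NLI check from the Code Examples section, or retrieval plus an LLM judge as in the original FActScore setup):

from typing import Callable, List

def atomic_fact_score(
    atomic_facts: List[str],
    is_supported: Callable[[str], bool],  # fact -> supported by knowledge source?
) -> float:
    """Score a generation as the fraction of its atomic facts that are supported."""
    if not atomic_facts:
        return 0.0
    supported = sum(1 for fact in atomic_facts if is_supported(fact))
    return supported / len(atomic_facts)

facts = [
    "Revenue was $45.2 million.",
    "Revenue was $52 million.",
    "The CEO is optimistic about growth.",
]
# Toy verifier for illustration only: treats exactly one fact as supported.
print(atomic_fact_score(facts, lambda f: f == "Revenue was $45.2 million."))  # 0.33...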

Use NLI when:
  • You have source documents (RAG)
  • Need interpretable results
  • Want to avoid LLM API costs
Use SelfCheck when:
  • No external knowledge available
  • Checking factual knowledge
  • Latency is not critical
Use RAGAS when:
  • Evaluating RAG pipelines
  • Need multiple metrics
  • Production monitoring

5. Benchmarks and Evaluation

Standard datasets and metrics for measuring hallucination rates.

Benchmark | Type | Size | Metric | Focus
TruthfulQA | Factuality | 817 questions | % Truthful & Informative | Tests resistance to common misconceptions
HaluEval | RAG Hallucination | 35K samples | Detection Accuracy | Comprehensive hallucination benchmark
FActScore | Biography Generation | 500+ bios | % Supported Facts | Fine-grained factuality
FEVER | Fact Verification | 185K claims | Label Accuracy | Evidence-based verification
SummEval | Summarization | 2,800 summaries | Faithfulness Score | Summarization hallucinations
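
To experiment with one of these benchmarks, TruthfulQA is available on the Hugging Face Hub; the dataset id, config name, and field names below are taken from the Hub listing and may change, so check the current dataset card.

from datasets import load_dataset  # pip install datasets

# TruthfulQA's generation config: 817 questions designed to elicit common
# misconceptions, each paired with reference correct and incorrect answers.
truthfulqa = load_dataset("truthful_qa", "generation")["validation"]

example = truthfulqa[0]
print(example["question"])
print("Best answer:", example["best_answer"])
print("Incorrect answers:", example["incorrect_answers"][:2])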

Mitigation Strategies

Retrieval Augmentation (RAG)

Ground LLM responses in retrieved documents to reduce hallucination (prompt sketch below)

Effectiveness: High for knowledge-grounded tasks
Tradeoff: Adds latency and retrieval complexity
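
In practice, much of the benefit comes from the prompt that ties the model to the retrieved passages. One common pattern is sketched below; the wording is illustrative, not canonical.

GROUNDED_PROMPT = """Answer the question using ONLY the context below.
If the answer is not in the context, reply "I don't know."

Context:
{context}

Question: {question}
Answer:"""

def build_grounded_prompt(context: str, question: str) -> str:
    """Fill the grounding template; retrieved passages go into `context`."""
    return GROUNDED_PROMPT.format(context=context, question=question)

print(build_grounded_prompt("The meeting is at 3pm.", "When is the meeting?"))
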
Chain-of-Thought Prompting

Ask model to reason step-by-step, making errors more visible

Effectiveness: Moderate - helps with reasoning errors
Tradeoff: Longer outputs, may still hallucinate confidently
Self-Consistency Decoding

Sample multiple responses and use majority vote or consistency filtering (sketch below)

Effectiveness: High for knowledge-based questions
Tradeoff: Higher latency and cost (multiple generations)
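
A minimal sketch of majority-vote self-consistency, with `generate` as a placeholder for your LLM client; exact-match voting suits short factual answers, while longer outputs need a softer, similarity-based vote.

from collections import Counter
from typing import Callable

def self_consistent_answer(
    prompt: str,
    generate: Callable[[str], str],  # placeholder: one sampled LLM response
    n_samples: int = 5,
) -> str:
    """Sample several answers at temperature > 0 and return the most common one.

    Low agreement across samples is itself a useful hallucination signal.
    """
    answers = [generate(prompt).strip().lower() for _ in range(n_samples)]
    best, count = Counter(answers).most_common(1)[0]
    print(f"agreement: {count}/{n_samples}")
    return best
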
Calibration / Uncertainty

Train or prompt model to express uncertainty when unsure

Effectiveness: Moderate - depends on model calibration
Tradeoff: May be overly cautious
Citation Requirements

Force the model to cite sources for each claim (citation-check sketch below)

Effectiveness: High for verifiable claims
Tradeoff: Requires post-hoc verification of citations
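
Post-hoc verification can start with a purely mechanical pass before any semantic check. The sketch below assumes answers cite sources as [1], [2], ...; adapt the regex to whatever citation format you prompt for.

import re
from typing import Dict, List

def check_citations(answer: str, num_sources: int) -> Dict[str, List[str]]:
    """Flag sentences without a citation marker and citations of nonexistent sources."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", answer.strip()) if s]
    uncited = [s for s in sentences if not re.search(r"\[\d+\]", s)]
    invalid = [
        m for m in re.findall(r"\[(\d+)\]", answer)
        if not (1 <= int(m) <= num_sources)
    ]
    return {"uncited_sentences": uncited, "invalid_citations": invalid}

answer = "Revenue rose 12% [1]. The CEO is optimistic [3]. Margins improved."
print(check_citations(answer, num_sources=2))
# {'uncited_sentences': ['Margins improved.'], 'invalid_citations': ['3']}
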
Knowledge Cutoff Awareness

Make model aware of its training date and knowledge limits

Effectiveness: Moderate for temporal facts
Tradeoff: Does not help with persistent misconceptions

6. Code Examples

Implementations of each detection method. Start with NLI for RAG systems or SelfCheck for general use.

NLI Detection (Interpretable)
pip install transformers
from transformers import pipeline

# Load NLI model for entailment checking
nli = pipeline("text-classification", model="facebook/bart-large-mnli")

def check_faithfulness(source: str, claim: str) -> dict:
    """
    Check if a claim is entailed by the source document.
    Returns: {"label": "ENTAILMENT"|"NEUTRAL"|"CONTRADICTION", "score": float}
    """
    # NLI format: premise (source) -> hypothesis (claim)
    result = nli(f"{source}</s></s>{claim}")
    return result[0]

# Example usage
source = """The company reported Q3 revenue of $45.2 million,
representing a 12% increase year-over-year."""

claims = [
    "Revenue was $45.2 million.",           # Should be ENTAILMENT
    "Revenue increased by 12% YoY.",         # Should be ENTAILMENT
    "Revenue was $52 million.",              # Should be CONTRADICTION
    "The CEO is optimistic about growth.",   # Should be NEUTRAL (not stated)
]

for claim in claims:
    result = check_faithfulness(source, claim)
    print(f"Claim: {claim}")
    print(f"  -> {result['label']} ({result['score']:.3f})")

# Output:
# Claim: Revenue was $45.2 million.
#   -> ENTAILMENT (0.943)
# Claim: Revenue increased by 12% YoY.
#   -> ENTAILMENT (0.891)
# Claim: Revenue was $52 million.
#   -> CONTRADICTION (0.967)
# Claim: The CEO is optimistic about growth.
#   -> NEUTRAL (0.824)

Quick Reference

Hallucination Types
  • Factual: contradicts world knowledge
  • Faithfulness: contradicts source
  • Intrinsic: direct contradiction
  • Extrinsic: unverifiable addition
Detection Methods
  • NLI: entailment checking
  • SelfCheck: consistency sampling
  • RAGAS: RAG evaluation
  • FActScore: atomic fact verification
Mitigation
  • RAG: ground in documents
  • Citations: force attribution
  • Self-consistency: majority vote
  • Uncertainty: express confidence
Key Takeaways
  1. Hallucinations are LLM pattern completion without fact-checking
  2. Faithfulness (to source) is easier to verify than factuality (world knowledge)
  3. NLI-based detection works well for RAG and document-grounded tasks
  4. No single method catches all hallucinations - combine approaches for production

Use Cases

  • RAG answer validation
  • Safety review
  • Content QA
  • Model evaluation

Architectural Patterns

Retrieval Entailment

Compare generation against retrieved evidence with NLI.

Self-Check Ensembles

Ask multiple models/queries and vote on consistency.

Implementations

Open Source

RAGAS (Apache 2.0): Metrics for RAG faithfulness.

SelfCheckGPT (MIT): Sampling-based hallucination scoring.

G-Eval (MIT): LLM-based evaluation prompts.


Quick Facts

Input: Text
Output: Structured Data
Implementations: 3 open source, 0 API
Patterns: 2 approaches

Submit Results