Chart and Table Understanding
Parse charts, diagrams, and tables into structured data for analysis and QA.
How Chart and Table Understanding Works
A technical deep-dive into extracting structured data from charts, graphs, and tables. From bar charts to complex financial tables, understanding the visual grammar of data visualization.
The Problem
Why is understanding charts hard for machines when humans find it trivial?
Charts and tables encode data visually. Humans read them effortlessly, extracting trends, comparisons, and specific values. But to a machine, a bar chart is just colored rectangles. How do we bridge this gap?
The key is recognizing that charts have grammar: axes define scales, marks represent data points, legends map colors to categories. By understanding this visual grammar, we can reverse-engineer the underlying data.
Chart understanding is not just OCR. It requires spatial reasoning (this bar is taller than that one), semantic understanding (the x-axis represents time), and numerical precision (that bar reaches exactly 72).
Three Skills a Chart Reader Needs
Detecting chart elements: bars, lines, points, axes, legends, titles. Each chart type has its own visual vocabulary.
Understanding that position encodes value. A bar reaching 75% height means 75% of the scale. Relative positions matter.
Connecting visual elements to meaning. The blue line represents revenue, the x-axis is time, the legend explains the colors.
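Skill two is mechanical once the axis is calibrated: two labeled ticks pin down a linear pixel-to-value mapping. A minimal sketch, assuming a linear y-axis (all pixel coordinates here are invented):

```python
def calibrate_axis(px_a, val_a, px_b, val_b):
    """Map a pixel coordinate to a data value on a linear axis,
    given two reference points (e.g. two labeled ticks)."""
    return lambda px: val_a + (px - px_a) * (val_b - val_a) / (px_b - px_a)

# Hypothetical y-axis: pixel row 400 is value 0, pixel row 100 is value 100
# (image y grows downward, hence the negative slope).
to_value = calibrate_axis(400, 0, 100, 100)
print(to_value(184))  # a bar whose top edge sits at pixel row 184 -> 72.0
```

The same mapping works for the x-axis; log scales just need the interpolation done in log space.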
Chart Types and Their Challenges
Each chart type encodes data differently. Understanding these visual grammars is step one.
Bar Chart
Categorical comparisons using rectangular bars
- Stacked vs grouped
- Horizontal vs vertical
- Reading exact values
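Stacked bars add one wrinkle: each segment boundary encodes a cumulative total, so per-segment values are differences of successive totals. A toy sketch, with an invented linear axis calibration:

```python
def px_to_value(px, px0=400, px1=100, v0=0.0, v1=100.0):
    # Linear y-axis calibration: pixel row px0 -> v0, pixel row px1 -> v1.
    return v0 + (px - px0) * (v1 - v0) / (px1 - px0)

def stacked_values(boundary_pixels):
    """Per-segment values for one stacked bar, given the pixel row of
    each segment's top edge, ordered bottom segment first."""
    totals = [px_to_value(px) for px in boundary_pixels]
    # Each segment is the cumulative total minus the total below it.
    return [t - below for t, below in zip(totals, [0.0] + totals[:-1])]

# Toy stacked bar: segment tops at rows 340, 250, 190
print(stacked_values([340, 250, 190]))  # [20.0, 30.0, 20.0]
```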
Extraction Pipeline
Example: the target output for a simple bar chart is structured JSON like this:
```json
{
  "chart_type": "bar",
  "title": "Quarterly Revenue",
  "data": [
    { "label": "Q1", "value": 45 },
    { "label": "Q2", "value": 72 },
    { "label": "Q3", "value": 63 },
    { "label": "Q4", "value": 89 }
  ]
}
```

The Processing Pipeline
From raw pixels to structured data. Each stage transforms the representation closer to machine-readable format.
Type Detection: The First Decision
Before extracting data, we must know what kind of chart we are looking at. A bar chart extracts differently than a line chart. This classification determines the entire downstream pipeline.
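One hedged sketch of what this decision can look like once an object detector has counted primitive marks; the mark names and thresholds below are illustrative, not from any particular model:

```python
def classify_chart(marks):
    """Toy chart-type heuristic over counts of detected primitives.
    `marks` maps a primitive name to how many were detected."""
    if marks.get("wedge", 0) >= 2:
        return "pie"
    if marks.get("line_segment", 0) >= 2:
        return "line"
    if marks.get("rect", 0) >= 2:
        return "bar"
    if marks.get("point", 0) >= 5:
        return "scatter"
    return "unknown"

print(classify_chart({"rect": 4}))    # bar
print(classify_chart({"point": 12}))  # scatter
```

Production systems typically use a trained image classifier instead, but the role in the pipeline is the same: the predicted type selects the downstream extractor.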
Structure Analysis: Finding the Grammar
Every chart has structural elements: axes define the coordinate system, legends map visual properties to meaning, titles provide context. Detecting these is like parsing the syntax of a visual language.
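Legend parsing, for example, reduces to matching each detected mark to the legend entry with the nearest color. A toy sketch using squared RGB distance (the colors are invented):

```python
def map_series_colors(legend, mark_color):
    """Assign a detected mark to the legend entry whose swatch color
    is nearest in RGB space (squared Euclidean distance)."""
    def dist(c1, c2):
        return sum((a - b) ** 2 for a, b in zip(c1, c2))
    return min(legend, key=lambda name: dist(legend[name], mark_color))

legend = {"revenue": (31, 119, 180), "costs": (255, 127, 14)}
print(map_series_colors(legend, (40, 110, 175)))  # revenue
```

Anti-aliasing and gradients shift sampled colors slightly, which is exactly why nearest-match beats exact-match here.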
Architectural Approaches
Three fundamentally different ways to approach chart understanding. Each has its place.
Classical Pipeline
Detect elements, then extract each:
1. Chart type classification
2. Element detection (bars, lines, points)
3. OCR for text
4. Geometric reasoning for values
End-to-End Models
Image in, structured data out:
1. Vision encoder (ViT, Swin)
2. Cross-attention to query
3. Text decoder generates JSON/markdown
Multimodal LLMs
Leverage general vision-language capabilities:
1. Image encoded as tokens
2. Natural language query
3. Structured output via prompting
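For the multimodal-LLM route, most of the engineering lives in the prompt and the output contract rather than the model. A hedged sketch of a prompt builder; the schema and wording are illustrative, and the model-specific API call is omitted:

```python
import json

def build_extraction_prompt(schema):
    """Compose the text prompt sent alongside the chart image, asking a
    vision-language model for JSON that matches `schema`."""
    return (
        "Extract every data point from the attached chart. "
        "Respond with JSON only, matching this schema:\n"
        + json.dumps(schema, indent=2)
        + "\nIf a value cannot be read precisely, use null rather than guessing."
    )

schema = {"chart_type": "string", "title": "string",
          "data": [{"label": "string", "value": "number"}]}
prompt = build_extraction_prompt(schema)
print(prompt)
```

The "use null rather than guessing" instruction is one common mitigation for the number-hallucination problem discussed below.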
When to Use What
Classical pipeline: when you need high precision, interpretable results, and have clean, standardized charts. Best for production systems with known chart formats.
End-to-end models: when you have varied chart styles and can tolerate some errors. Good for quick prototypes and when training data is available.
Multimodal LLMs: when you need to reason about charts, answer questions, or handle unexpected formats. Best for analysis tasks, not bulk extraction.
Key Models
The models you should know for chart and table understanding in 2024-2025.
DePlot (chart-to-table)
Pros:
- High accuracy on clean charts
- Interpretable outputs
- Chart-specific logic
Cons:
- Requires chart type classification
- Struggles with unusual layouts
Best for: production chart extraction
Donut
Pros:
- No OCR preprocessing
- Handles diverse layouts
- One model for many tasks
Cons:
- May miss fine numerical details
- Needs task-specific prompting
Best for: quick prototyping, varied documents
Pix2Struct
Pros:
- Trained on web screenshots
- Good for infographics
- Chart-specific fine-tuning available
Cons:
- Limited to training distribution
- Can hallucinate values
Best for: infographics, web charts, UI screenshots
GPT-4V
Pros:
- Best reasoning about charts
- Handles questions naturally
- Zero-shot capability
Cons:
- Expensive at scale
- May hallucinate numbers
- Not deterministic
Best for: chart QA, analysis, insights
Table Transformer (TATR)
Pros:
- State-of-the-art table detection
- Cell-level extraction
- Handles complex layouts
Cons:
- Tables only, not charts
- Requires OCR for text
Best for: document tables, forms
Benchmarks
Standard datasets for evaluating chart and table understanding systems.
| Benchmark | Focus | Size | Metric | SOTA |
|---|---|---|---|---|
| ChartQA | Chart QA | 32K QA pairs | Accuracy | GPT-4V: 78.5% |
| PlotQA | Scientific Charts | 224K QA pairs | Accuracy | DePlot: 54.8% |
| ChartInfo | Chart Summarization | 7K charts | BLEU/METEOR | MatCha: 0.42 |
| PubTabNet | Table Structure | 568K tables | TEDS | TableFormer: 96.8% |
| FinTabNet | Financial Tables | 113K tables | TEDS | VAST: 97.1% |
| SciGraphQA | Scientific Figures | 295K QA pairs | Accuracy | LLaVA: 45.2% |
TEDS (Tree Edit Distance-based Similarity): measures structural similarity between predicted and ground-truth tables, accounting for both content and structure (rows, columns, spans). A score of 1.0 means a perfect match; good systems commonly reach 0.9+.
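True TEDS operates on the table's HTML tree; as a rough proxy for intuition only, one can compute a normalized edit distance over row-major cell tokens:

```python
def edit_distance(a, b):
    """Classic Levenshtein distance over token sequences."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
    return d[m][n]

def table_similarity(pred_rows, gold_rows):
    """Crude TEDS-style score: 1 - normalized edit distance over
    row-major cell tokens. Real TEDS compares the HTML tree, so it
    penalizes span/structure errors more faithfully than this sketch."""
    flat = lambda rows: [c for r in rows for c in r + ["<row_end>"]]
    a, b = flat(pred_rows), flat(gold_rows)
    return 1 - edit_distance(a, b) / max(len(a), len(b), 1)
```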
QA accuracy: the percentage of questions about a chart answered correctly, ranging from simple value lookup to complex reasoning. Human performance is around 85%; the best models reach roughly 78%.
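ChartQA-style scoring typically uses *relaxed* accuracy: a numeric answer counts as correct within a 5% relative tolerance, while string answers must match exactly. A minimal sketch:

```python
def relaxed_match(pred, gold, tolerance=0.05):
    """ChartQA-style relaxed correctness: numeric answers are correct
    within a relative tolerance; non-numeric answers need an exact
    (case-insensitive) match."""
    try:
        p, g = float(pred), float(gold)
    except ValueError:
        return str(pred).strip().lower() == str(gold).strip().lower()
    if g == 0:
        return p == 0
    return abs(p - g) / abs(g) <= tolerance

print(relaxed_match("71.8", "72"))  # True: within 5% of 72
print(relaxed_match("Q4", "q4"))    # True: case-insensitive string match
print(relaxed_match("60", "72"))    # False: 16.7% off
```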
Code Examples
Get started with chart and table understanding in Python.
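If you use DePlot for chart-to-table extraction, note that it emits a linearized string rather than JSON. A small parser, assuming the `<0x0A>` row separator and `|` column separator described in the google/deplot model card:

```python
def parse_deplot_table(text):
    """Parse DePlot's linearized table output into a list of rows."""
    return [[cell.strip() for cell in line.split("|")]
            for line in text.split("<0x0A>") if line.strip()]

# Example of the linearized format DePlot produces
linearized = ("TITLE | Quarterly Revenue <0x0A> "
              "Quarter | Revenue <0x0A> Q1 | 45 <0x0A> Q2 | 72")
print(parse_deplot_table(linearized))
```

The parsed rows can then be handed to a pandas DataFrame or an LLM for downstream reasoning.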
```python
from transformers import DonutProcessor, VisionEncoderDecoderModel
from PIL import Image
import torch

# Load Donut fine-tuned for document VQA (no public chart-specific
# checkpoint is assumed here; see the notes at the end)
processor = DonutProcessor.from_pretrained(
    "naver-clova-ix/donut-base-finetuned-docvqa"
)
model = VisionEncoderDecoderModel.from_pretrained(
    "naver-clova-ix/donut-base-finetuned-docvqa"
)
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Load chart image
image = Image.open("chart.png").convert("RGB")

# Donut expects its task-specific prompt format
task_prompt = "<s_docvqa><s_question>Extract all data from this chart</s_question><s_answer>"

# Process inputs
decoder_input_ids = processor.tokenizer(
    task_prompt, add_special_tokens=False, return_tensors="pt"
).input_ids.to(device)
pixel_values = processor(image, return_tensors="pt").pixel_values.to(device)

# Generate
outputs = model.generate(
    pixel_values,
    decoder_input_ids=decoder_input_ids,
    max_length=512,
    early_stopping=True,
    pad_token_id=processor.tokenizer.pad_token_id,
    eos_token_id=processor.tokenizer.eos_token_id,
    use_cache=True,
    num_beams=4,
)

# Decode
result = processor.batch_decode(outputs, skip_special_tokens=True)[0]
print(result)

# For other Donut checkpoints, try:
# "naver-clova-ix/donut-base-finetuned-cord-v2" for receipts
# Custom fine-tuning on the ChartQA dataset for charts
```

Quick Reference
Models:
- Pix2Struct (charts/infographics)
- DePlot (chart to table)
- GPT-4V (analysis/QA)
- TableTransformer (detection)
- Donut (end-to-end)
- AWS Textract (production)

Benchmarks:
- ChartQA (chart understanding)
- PubTabNet (table structure)
- PlotQA (scientific charts)

Common pitfalls:
- Hallucinated numbers from LLMs
- Wrong chart type detection
- Complex table structures
Key Takeaways
1. Chart understanding requires visual, spatial, and semantic reasoning combined
2. Tables and charts need different approaches; use specialized tools
3. Multimodal LLMs excel at reasoning but may hallucinate numbers
4. For production extraction, consider DePlot + LLM or TableTransformer
Use Cases
- ✓ Financial chart QA
- ✓ Research figure extraction
- ✓ Table-to-CSV
- ✓ Dashboard auditing
Architectural Patterns
Layout-Aware Parsing
Detect cells/regions then recognize text and structure (table grid detection + OCR).
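The grid-reconstruction step of this pattern can be sketched as clustering detected cell boxes into rows and columns by coordinate; the tolerance and boxes below are invented:

```python
def assign_grid(cells, tol=10):
    """Snap detected cell boxes (x, y, w, h) onto a row/column grid by
    clustering their top-left corners within `tol` pixels. Returns a
    (row, col) index per input cell."""
    def cluster(coords):
        centers = []
        for c in sorted(set(coords)):
            if not centers or c - centers[-1] > tol:
                centers.append(c)
        return centers
    rows = cluster([y for _, y, _, _ in cells])
    cols = cluster([x for x, _, _, _ in cells])
    def nearest(c, centers):
        return min(range(len(centers)), key=lambda i: abs(centers[i] - c))
    return [(nearest(y, rows), nearest(x, cols)) for x, y, _, _ in cells]

# Three cells: two on the first row, one below the first column
print(assign_grid([(0, 0, 50, 20), (60, 2, 50, 20), (1, 30, 50, 20)]))
# [(0, 0), (0, 1), (1, 0)]
```

Merged cells (spans) need extra logic: a box covering multiple column centers spans those columns.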
Vision-Language Chart QA
Use chart-specific VLMs to answer questions or extract series.
Implementations
Open Source
Table Transformer (TATR)
License: MIT. Detects table structure; pair with OCR for content.