Mistral OCR
Processing thousands of pages with equations or multilingual text? Need fast, cheap OCR via API without managing infrastructure? Mistral OCR outputs Markdown at $0.001/page.
API Service, Paid, Production Ready
Verified - We Tested It
Tested on December 17, 2025: 9-page PDF processed in 9.04 seconds, 34,656 chars output.
- 9.04s: Our Test (9 pages)
- 34.6K: Chars Output
- $0.001: Per Page
- 50MB: Max File Size
Benchmark Claims (Mistral's Data)
| Category | Mistral OCR | GPT-4o | Google Doc AI | Azure OCR |
|---|---|---|---|---|
| Overall | 94.9% | ~85% | 83.4% | 89.5% |
| Scanned Docs | 98.96% | ~95% | 96.15% | ~94% |
| Math/Equations | 94.29% | ~88% | ~75% | ~70% |
| Multilingual | 89.55% | 86.0% | ~82% | 87.52% |
Source: Mistral's internal benchmarks. Independent verification pending.
Independent Testing Shows Mixed Results
- Koncile.ai: 98.75% transcription accuracy but 27.5% missing data in structured extraction
- Docsumo: "Even with moderately clean documents, it often missed key data blocks or misinterpreted layout structures"
- Parsio.io: Fast and cheap, but less robust than enterprise solutions for complex layouts
We recommend testing on your specific document types before production deployment.
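One way to run such a test is to compare the OCR output against a hand-checked reference for a few of your own documents. The sketch below is a minimal, standard-library-only illustration: `char_similarity` and `missing_fields` are hypothetical helper names, the sample strings are made up, and `difflib`'s ratio is only a rough similarity score, not the accuracy metric used in any of the tests quoted above.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Collapse runs of whitespace so layout differences are not counted as errors."""
    return " ".join(text.split())


def char_similarity(reference: str, ocr_text: str) -> float:
    """Return a 0..1 similarity score between the reference and the OCR output."""
    a, b = normalize(reference), normalize(ocr_text)
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def missing_fields(expected_values: list[str], ocr_text: str) -> list[str]:
    """Return the expected values that do not appear verbatim in the OCR output."""
    haystack = normalize(ocr_text)
    return [v for v in expected_values if normalize(v) not in haystack]


if __name__ == "__main__":
    # Hypothetical ground truth and OCR output for a single invoice page.
    reference = "Invoice INV-001\nTotal: $42.00"
    ocr_text = "Invoice INV-001\nTotal: $4Z.00"
    print(f"similarity: {char_similarity(reference, ocr_text):.3f}")
    print("missing:", missing_fields(["INV-001", "$42.00"], ocr_text))
```

Tracking both numbers is useful because, as the Koncile.ai result suggests, high transcription similarity can coexist with missing key fields.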
Quick Start
1. Install SDK
pip install mistralai
3. Process Document
from mistralai import Mistral
import os
client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])
# Process a PDF from URL
ocr_response = client.ocr.process(
    model="mistral-ocr-latest",
    document={
        "type": "document_url",
        "document_url": "https://arxiv.org/pdf/2201.04234"
    }
)
# Get markdown output
for page in ocr_response.pages:
    print(page.markdown)
Process Local Files
import base64
import os
from mistralai import Mistral
def encode_file(file_path):
    # Read the file as bytes and return it as a base64-encoded string
    with open(file_path, "rb") as f:
        return base64.b64encode(f.read()).decode('utf-8')
client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])
# Process local PDF by embedding it in a data URL
base64_pdf = encode_file("invoice.pdf")
ocr_response = client.ocr.process(
    model="mistral-ocr-latest",
    document={
        "type": "document_url",
        "document_url": f"data:application/pdf;base64,{base64_pdf}"
    }
)
Pricing Comparison
| Service | Cost per 1000 Pages | Type |
|---|---|---|
| Mistral OCR | $1.00 | API |
| Mistral OCR (batch) | $0.50 | API |
| GPT-4o Vision | ~$5-15 | API |
| Google Document AI | $1.50 | API |
| Docling | $0 (self-hosted) | Open Source |
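To turn the per-1000-page rates into a budget figure, the arithmetic can be sketched as below. The rates are copied from the table above; `estimate_cost` is a hypothetical helper, the 50,000-page volume is an assumed example, and GPT-4o Vision is left out because the table only gives a range for it.

```python
# Rates in USD per 1000 pages, copied from the pricing table above.
RATES_PER_1000_PAGES = {
    "Mistral OCR": 1.00,
    "Mistral OCR (batch)": 0.50,
    "Google Document AI": 1.50,
}


def estimate_cost(pages: int, rate_per_1000: float) -> float:
    """Return the estimated cost in USD for processing `pages` pages."""
    return pages * rate_per_1000 / 1000


if __name__ == "__main__":
    pages = 50_000  # assumed example volume, not a figure from our test
    for service, rate in RATES_PER_1000_PAGES.items():
        print(f"{service}: ${estimate_cost(pages, rate):,.2f}")
```

Self-hosted options such as Docling have no per-page fee, so their cost depends on your own compute and is not modeled here.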
Mistral OCR vs Docling (Our Test Results)
Same document: Docling paper (arxiv:2408.09869), tested December 17, 2025
| Metric | Mistral OCR | Docling |
|---|---|---|
| Processing Time | 9.04 seconds | 34.95 seconds |
| Output Size | 34,656 chars | 33,201 chars |
| Pages Processed | 9 pages | 10 pages |
| Cost (this test) | ~$0.009 | $0.00 |
| Data Privacy | Sent to Mistral | Fully local |
| Table Export | Markdown only | DataFrame/CSV |
| License | Proprietary API | MIT (open source) |
In this test Mistral was ~4x faster (9.04 s vs 34.95 s) but is billed per page. Docling is free but requires local compute.
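The speed comparison can be reproduced from the table with a few lines of arithmetic. This is only a sketch using the numbers reported above. Because the two tools reported different page counts (9 vs 10), it also computes a per-page figure, which is our own derived number rather than something either tool outputs.

```python
# Timings and page counts taken from the comparison table above.
mistral = {"seconds": 9.04, "pages": 9}
docling = {"seconds": 34.95, "pages": 10}


def seconds_per_page(result: dict) -> float:
    """Average processing time per reported page."""
    return result["seconds"] / result["pages"]


# Ratio of total wall-clock times for the same document.
wall_clock_speedup = docling["seconds"] / mistral["seconds"]
# Ratio of per-page times, which accounts for the differing page counts.
per_page_speedup = seconds_per_page(docling) / seconds_per_page(mistral)

print(f"wall-clock speedup: {wall_clock_speedup:.2f}x")  # 3.87x
print(f"per-page speedup:   {per_page_speedup:.2f}x")    # 3.48x
```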
When to Use Mistral OCR
Good For
- High-volume processing (2000+ pages)
- Scientific papers with equations
- Multilingual documents
- Quick prototyping (no setup)
- When data privacy isn't critical
Consider Alternatives For
- Sensitive documents (use Docling)
- Complex structured extraction
- Low-budget high-volume processing
- Air-gapped environments
- Custom model fine-tuning needs