
Mistral OCR

Processing thousands of pages with equations or multilingual text? Need fast, cheap OCR via API without managing infrastructure? Mistral OCR outputs Markdown at $0.001/page.

API Service | Paid | Production Ready

Verified - We Tested It

Tested on December 17, 2025: 9-page PDF processed in 9.04 seconds, 34,656 chars output.

Our Test (9 pages): 9.04s
Chars Output: 34.6K
Per Page: $0.001
Max File Size: 50MB

Benchmark Claims (Mistral's Data)

Category       | Mistral OCR | GPT-4o | Google Doc AI | Azure OCR
Overall        | 94.9%       | ~85%   | 83.4%         | 89.5%
Scanned Docs   | 98.96%      | ~95%   | 96.15%        | ~94%
Math/Equations | 94.29%      | ~88%   | ~75%          | ~70%
Multilingual   | 89.55%      | 86.0%  | ~82%          | 87.52%

Source: Mistral's internal benchmarks. Independent verification pending.

Independent Testing Shows Mixed Results

  • Koncile.ai: 98.75% transcription accuracy but 27.5% missing data in structured extraction
  • Docsumo: "Even with moderately clean documents, it often missed key data blocks or misinterpreted layout structures"
  • Parsio.io: Fast and cheap, but less robust than enterprise solutions for complex layouts

We recommend testing on your specific document types before production deployment.
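
One way to run such a test is to compare OCR output against a hand-checked transcription of a few of your own pages. The sketch below is only an illustration using Python's standard library: `char_similarity` and `missing_fraction` are hypothetical helper names, the invoice strings are made-up sample data, and the similarity ratio is a rough proxy, not the metric used by any of the evaluations quoted above.

```python
from difflib import SequenceMatcher


def char_similarity(reference: str, ocr_text: str) -> float:
    """Return a 0..1 similarity ratio between ground truth and OCR output."""
    # autojunk=False disables difflib's "popular character" heuristic,
    # which can distort the ratio on long texts.
    return SequenceMatcher(None, reference, ocr_text, autojunk=False).ratio()


def missing_fraction(expected_values, ocr_text: str) -> float:
    """Fraction of expected field values that do not appear verbatim in the OCR text."""
    if not expected_values:
        return 0.0
    missing = [value for value in expected_values if value not in ocr_text]
    return len(missing) / len(expected_values)


# Made-up sample data: one digit of the total was misread.
reference = "Invoice INV-001 total 42.00"
ocr_text = "Invoice INV-001 total 41.00"

print(f"similarity: {char_similarity(reference, ocr_text):.3f}")
print(f"missing fields: {missing_fraction(['INV-001', '42.00'], ocr_text):.0%}")
```

The two numbers answer different questions: a page can score high on character similarity and still lose a field you care about, which is the pattern the Koncile.ai result describes.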

Quick Start

1. Install SDK

pip install mistralai

2. Set API Key

export MISTRAL_API_KEY="your-api-key"

Get your key at console.mistral.ai

3. Process Document

from mistralai import Mistral
import os

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

# Process a PDF from URL
ocr_response = client.ocr.process(
    model="mistral-ocr-latest",
    document={
        "type": "document_url",
        "document_url": "https://arxiv.org/pdf/2201.04234"
    }
)

# Get markdown output
for page in ocr_response.pages:
    print(page.markdown)
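
If the goal is a single Markdown file and not console output, the per-page strings can be joined and written to disk. This is a minimal sketch and not part of the Mistral SDK: `join_pages`, `save_markdown`, and the `---` page separator are hypothetical choices made here, and the only thing assumed about the response is the `page.markdown` attribute used in the loop above. The demo uses stand-in objects so it runs without an API key.

```python
from pathlib import Path
from types import SimpleNamespace


def join_pages(pages, separator="\n\n---\n\n"):
    """Concatenate the .markdown attribute of each page, in order."""
    return separator.join(page.markdown for page in pages)


def save_markdown(pages, out_path):
    """Write the combined Markdown to out_path and return its character count."""
    text = join_pages(pages)
    Path(out_path).write_text(text, encoding="utf-8")
    return len(text)


# Stand-in pages for illustration; in practice pass ocr_response.pages.
demo_pages = [
    SimpleNamespace(markdown="# Page 1"),
    SimpleNamespace(markdown="Body text"),
]
combined = join_pages(demo_pages)
print(combined)
```

With a real response, `save_markdown(ocr_response.pages, "output.md")` would write the whole document in one call.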

Process Local Files

import base64
import os
from mistralai import Mistral

def encode_file(file_path):
    with open(file_path, "rb") as f:
        return base64.b64encode(f.read()).decode('utf-8')

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

# Process local PDF
base64_pdf = encode_file("invoice.pdf")
ocr_response = client.ocr.process(
    model="mistral-ocr-latest",
    document={
        "type": "document_url",
        "document_url": f"data:application/pdf;base64,{base64_pdf}"
    }
)
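
Because local files are sent inline as base64, it can help to check their size before encoding. The sketch below is an assumption-laden illustration: it reads the 50MB limit quoted earlier as 50 MiB, and this page does not say whether the limit applies to the raw file or to the encoded payload. `base64_length`, `fits_limit`, and `check_file` are hypothetical helper names.

```python
import math
import os

# Assumption: the "50MB" limit is interpreted as 50 MiB of raw file data.
MAX_FILE_BYTES = 50 * 1024 * 1024


def base64_length(raw_bytes: int) -> int:
    """Length in characters of the padded base64 encoding of raw_bytes bytes."""
    # Base64 maps every 3 input bytes to 4 output characters, padding the last group.
    return 4 * math.ceil(raw_bytes / 3)


def fits_limit(raw_bytes: int, limit: int = MAX_FILE_BYTES) -> bool:
    """True if a file of raw_bytes bytes is within the assumed size limit."""
    return raw_bytes <= limit


def check_file(path: str) -> bool:
    """Check a file on disk against the assumed limit without reading it."""
    return fits_limit(os.path.getsize(path))


# A 30 MiB PDF grows by roughly one third once base64-encoded.
raw = 30 * 1024 * 1024
print(f"raw: {raw} bytes, encoded: {base64_length(raw)} characters")
```

Calling `check_file("invoice.pdf")` before `encode_file` avoids reading and encoding a file that would be rejected anyway.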

Pricing Comparison

Service             | Cost per 1000 Pages | Type
Mistral OCR         | $1.00               | API
Mistral OCR (batch) | $0.50               | API
GPT-4o Vision       | ~$5-15              | API
Google Document AI  | $1.50               | API
Docling             | $0 (self-hosted)    | Open Source
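
To turn the per-1000-page rates above into a budget for a given volume, a small calculator is enough. This sketch copies the rates from the table; the dictionary keys and the `estimate_cost` function are hypothetical names, GPT-4o Vision is left out because the table gives only a range, and Docling's $0 does not include the cost of the local compute it runs on.

```python
# USD per 1,000 pages, copied from the pricing table above.
RATES_PER_1000 = {
    "mistral-ocr": 1.00,
    "mistral-ocr-batch": 0.50,
    "google-document-ai": 1.50,
    "docling": 0.00,  # self-hosted: excludes your own hardware cost
}


def estimate_cost(pages: int, service: str) -> float:
    """Estimated cost in USD for processing `pages` pages with `service`."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return pages * RATES_PER_1000[service] / 1000


# Compare services for a 100,000-page workload.
for service in RATES_PER_1000:
    print(f"{service:20s} ${estimate_cost(100_000, service):,.2f}")
```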

Mistral OCR vs Docling (Our Test Results)

Same document: Docling paper (arxiv:2408.09869), tested December 17, 2025

Metric           | Mistral OCR     | Docling
Processing Time  | 9.04 seconds    | 34.95 seconds
Output Size      | 34,656 chars    | 33,201 chars
Pages Processed  | 9 pages         | 10 pages
Cost (this test) | ~$0.009         | $0.00
Data Privacy     | Sent to Mistral | Fully local
Table Export     | Markdown only   | DataFrame/CSV
License          | Proprietary API | MIT (open source)

Mistral is ~4x faster but costs money. Docling is free but requires local compute.
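
The speed figure can be rechecked from the table. The arithmetic below uses only the timings and page counts reported above; the variable names are illustrative. Since the two tools report different page counts for the same document (9 versus 10), the sketch also computes a per-page ratio as a rough normalization.

```python
# Timings and page counts from the comparison table above.
mistral_seconds, mistral_pages = 9.04, 9
docling_seconds, docling_pages = 34.95, 10

# Whole-document comparison: how many times longer Docling took.
wall_clock_speedup = docling_seconds / mistral_seconds

# Per-page comparison, using each tool's own reported page count.
mistral_per_page = mistral_seconds / mistral_pages
docling_per_page = docling_seconds / docling_pages
per_page_speedup = docling_per_page / mistral_per_page

print(f"wall-clock speedup: {wall_clock_speedup:.2f}x")
print(f"seconds per page: Mistral {mistral_per_page:.2f}, Docling {docling_per_page:.2f}")
print(f"per-page speedup: {per_page_speedup:.2f}x")
```

Both ratios come from a single run on one document, so they indicate the order of magnitude and are not a stable benchmark.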

When to Use Mistral OCR

Good For

  • High-volume processing (2000+ pages)
  • Scientific papers with equations
  • Multilingual documents
  • Quick prototyping (no setup)
  • When data privacy isn't critical

Consider Alternatives For

  • Sensitive documents (use Docling)
  • Complex structured extraction
  • Low-budget high-volume processing
  • Air-gapped environments
  • Custom model fine-tuning needs

Resources