#1 on OmniDocBench · 92.56% accuracy

Document parsing that runs on your Mac.

Hardparse uses PaddleOCR-VL — the highest-scoring open-source OCR model — to extract tables, formulas, and structured text from any document. No cloud. No subscriptions. Your documents never leave your machine.

macOS 14+ · Apple Silicon optimized · One-time purchase · No data collection

Why engineers are switching from cloud OCR

The economics of document parsing changed in October 2025. Here's what that means for you.

167x

Cheaper than Textract

AWS Textract costs $65,000/mo for 1M pages. PaddleOCR-VL on a single GPU handles the same volume for $390, roughly 167x less. Hardparse puts this on your Mac for a one-time $25.
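As a quick check on the 167x headline, here is a minimal sketch of the arithmetic. It uses only the two monthly figures quoted above; the constant and function names are illustrative and not part of any Hardparse tooling.

```python
# Monthly cost figures quoted in the text, in USD, for 1M pages per month.
TEXTRACT_MONTHLY_USD = 65_000
SINGLE_GPU_MONTHLY_USD = 390
PAGES_PER_MONTH = 1_000_000


def cost_ratio(expensive: float, cheap: float) -> float:
    """How many times more the expensive option costs."""
    return expensive / cheap


def cost_per_page(monthly_usd: float, pages: int) -> float:
    """Average cost of one page at the given monthly volume."""
    return monthly_usd / pages


ratio = cost_ratio(TEXTRACT_MONTHLY_USD, SINGLE_GPU_MONTHLY_USD)
print(f"{ratio:.0f}x")  # 167x
print(cost_per_page(TEXTRACT_MONTHLY_USD, PAGES_PER_MONTH))    # Textract, USD per page
print(cost_per_page(SINGLE_GPU_MONTHLY_USD, PAGES_PER_MONTH))  # single GPU, USD per page
```

The ratio is 166.67 before rounding, which is where the 167x figure comes from.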

92.56%

Beats GPT-4o on documents

PaddleOCR-VL scores 92.56% on OmniDocBench — higher than GPT-4o, Gemini 2.5 Pro, and every commercial API we've tested.

0 bytes

Sent to any server

The Mac app is fully local. No API keys, no internet, no data collection. Process contracts, medical records, financial statements — everything stays on your machine.

How Hardparse compares

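| Feature | Hardparse (Mac) | Hardparse API | AWS Textract | Google Doc AI | GPT-4o |
|---|---|---|---|---|---|
| OmniDocBench Score | 92.56% | 92.56% | ~85% | ~87% | ~88% |
| Table extraction | Native | Native | Add-on ($$$) | Add-on | Prompt-based |
| Math / LaTeX | Yes | Yes | No | No | Partial |
| Handwriting | Yes | Yes | Yes | Yes | Yes |
| Languages | 109 | 11 | ~20 | ~60 | 90+ |
| Privacy | 100% local | Cloud | Cloud | Cloud | Cloud |
| Pricing | $25 once | Free / $49/mo | $65K/1M pg | $30K/1M pg | Token-based |
| Internet required | No | Yes | Yes | Yes | Yes |
| Output formats | MD, JSON, TXT | MD, JSON | JSON | JSON | Text |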

Accuracy scores from OmniDocBench v1.5. Pricing based on 1M pages/month at standard tier. See full cost analysis.

Choose your workflow

Same engine, two ways to use it.

For individuals

Mac App

Drag and drop. Get structured text. Everything runs on your Mac's GPU via Metal. No internet, no accounts, no data leaves your machine.

PDFs, images, screenshots, HEIC
Tables, formulas, handwriting, 109 languages
Markdown, JSON, or plain text output
Multi-page document support
Apple Silicon optimized (M1–M4)
Family Sharing (up to 6 people)
$25 one-time
Download on the Mac App Store

macOS 14+ · Apple Silicon · 2.1 GB

For developers & teams

API

One POST request. Structured output. Drop it into your pipeline and forget about OCR infrastructure.

Single endpoint: POST /v1/parse
PDF, PNG, JPG, TIFF, HEIC (up to 20MB)
Layout detection with bounding boxes + confidence
Table, formula, and handwriting extraction
Markdown and JSON output
Webhooks and priority queue (Pro)
Free · 500 calls/mo | $49/mo · 10K calls
Get API Key — No Credit Card

No credit card · 500 free calls · Upgrade anytime
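Since the API list above names the accepted formats and a 20MB cap, a pipeline might want to reject bad files before spending a call. The sketch below is a hypothetical client-side check, not part of any Hardparse SDK. It assumes "20MB" means 20 * 1024 * 1024 bytes and that `.jpeg` and `.tif` are accepted as aliases, neither of which is confirmed here.

```python
from pathlib import PurePath

# Assumption: extensions inferred from the format list (PDF, PNG, JPG, TIFF, HEIC).
# ".jpeg" and ".tif" are guessed aliases, not confirmed by the API description.
ALLOWED_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg", ".tif", ".tiff", ".heic"}

# Assumption: "20MB" is read as 20 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 20 * 1024 * 1024


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    ext = PurePath(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported file type: {ext or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append(
            f"file is {size_bytes} bytes, over the {MAX_UPLOAD_BYTES}-byte limit"
        )
    return problems


print(check_upload("invoice.pdf", 350_000))  # []
print(check_upload("notes.docx", 350_000))   # ['unsupported file type: .docx']
```

Returning a list rather than raising on the first failure lets a batch job report every problem with a file in one pass.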

What people parse

📄

Invoices & Receipts

Extract line items, totals, tax info into structured data

🎓

Academic Papers

Tables, citations, equations rendered as LaTeX

🏦

Bank Statements

Transaction tables parsed into rows and columns

⚖️

Contracts & Legal

Clauses, signatures, handwritten notes — fully local

🏥

Medical Records

HIPAA-friendly: zero data leaves your machine

📐

Engineering Drawings

Annotations, dimensions, part numbers extracted

🌍

Multilingual Docs

109 languages including CJK, Arabic, Devanagari

📸

Screenshots

Paste from clipboard, get structured text instantly

Three lines to parse a document

terminal
curl -X POST https://api.hardparse.com/v1/parse \
  -H "Authorization: Bearer hp_your_key" \
  -F "file=@invoice.pdf"

# Response:
{
  "regions": [
    { "type": "table", "confidence": 0.97, "markdown": "| Item | Qty | Price |\n|---|---|---|\n| Widget | 100 | $5.00 |" },
    { "type": "text", "confidence": 0.99, "markdown": "## Invoice #2847\nDate: March 15, 2026" },
    { "type": "handwriting", "confidence": 0.94, "markdown": "Approved - JS" }
  ],
  "processing_time_ms": 1240
}
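Once the response arrives, most pipelines will want rows rather than Markdown. The following is a minimal sketch that works on the sample response shown above, using only the Python standard library. The helper `table_rows` and the 0.95 review threshold are illustrative assumptions, and the parser only handles simple pipe tables like the one in the sample.

```python
import json

# The sample response from above, kept as raw text so the "\n" escapes
# are decoded by the JSON parser rather than by Python.
SAMPLE_RESPONSE = r'''
{
  "regions": [
    { "type": "table", "confidence": 0.97, "markdown": "| Item | Qty | Price |\n|---|---|---|\n| Widget | 100 | $5.00 |" },
    { "type": "text", "confidence": 0.99, "markdown": "## Invoice #2847\nDate: March 15, 2026" },
    { "type": "handwriting", "confidence": 0.94, "markdown": "Approved - JS" }
  ],
  "processing_time_ms": 1240
}
'''


def table_rows(markdown: str) -> list[dict[str, str]]:
    """Turn a simple pipe table (header, separator, body) into a list of dicts."""
    lines = [ln.strip() for ln in markdown.splitlines() if ln.strip()]
    cells = [[c.strip() for c in ln.strip("|").split("|")] for ln in lines]
    header, body = cells[0], cells[2:]  # cells[1] is the |---| separator row
    return [dict(zip(header, row)) for row in body]


response = json.loads(SAMPLE_RESPONSE)
regions = response["regions"]

# Convert every table region into structured rows.
tables = [table_rows(r["markdown"]) for r in regions if r["type"] == "table"]

# Hypothetical review threshold: flag anything below 0.95 for a human to check.
needs_review = [r for r in regions if r["confidence"] < 0.95]

print(tables[0])  # [{'Item': 'Widget', 'Qty': '100', 'Price': '$5.00'}]
print([r["type"] for r in needs_review])  # ['handwriting']
```

Values stay as strings here; converting quantities and prices to numbers is left to the caller, since currency formats vary by document.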

The benchmark story behind Hardparse

In October 2025, PaddleOCR-VL launched with 0.9 billion parameters and scored 92.56% on OmniDocBench — beating GPT-4o, Gemini 2.5 Pro, and every commercial API. A model 220x smaller than GPT-4 that's better at reading documents. Hardparse is the easiest way to use that model.

Frequently asked questions

Do I need an internet connection?

Not for the Mac app. The AI models are bundled with the app (2.1 GB download), and everything runs on your Mac's GPU. The API does require an internet connection.

How accurate is it on tables?

PaddleOCR-VL is the top-scoring model on OmniDocBench table extraction. In our testing, it handles nested tables, merged cells, and borderless tables that break AWS Textract.

Can I use it for sensitive documents?

The Mac app processes everything locally. No data is sent anywhere. No telemetry, no analytics, no data collection at all. This makes it suitable for legal, medical, and financial documents.

What about Intel Macs?

The app requires Apple Silicon (M1 or later) for GPU acceleration via Metal. Intel Macs are not supported.

How does the API compare to running it myself?

The API handles infrastructure, scaling, and model updates. If you need to process documents at scale without managing GPUs, the API is the easier path. If privacy is paramount, the Mac app keeps everything local.

Is there a free trial of the Mac app?

The Mac App Store doesn't support free trials, but Apple offers refunds within 14 days. The API has a permanent free tier with 500 calls/month — you can test accuracy there before buying the app.

Stop paying per page.

One purchase. Unlimited documents. The highest-accuracy OCR model, running on your Mac.

Mac App: one-time purchase, no subscription · API: 500 free calls/month, no credit card
