We ran the full OmniDocBench benchmark (1,348 images) on TeamQuest's clearOCR API. Here's what we found.
CodeSOTA ran the full OmniDocBench evaluation suite on December 19, 2025. We processed 1,348 images through TeamQuest's clearOCR API and computed metrics using the official evaluation tools.
clearOCR is a traditional OCR solution that extracts plain text from images. It does not recognize tables as structured data or mathematical formulas as LaTeX. This significantly lowers its OmniDocBench composite score, which weights tables and formulas alongside text.
OmniDocBench is a comprehensive document parsing benchmark with 1,355 pages across 9 document types. The composite score is computed as: ((1 - TextEditDist) * 100 + TableTEDS + FormulaCDM) / 3
| Metric | clearOCR | Mistral OCR 3 | PaddleOCR-VL |
|---|---|---|---|
| Composite Score | 31.7 | 79.75 | 92.86 |
| Text Edit Distance | 0.154 | 0.099 | 0.03 |
| Table TEDS | 0.8% | 70.9% | 93.5% |
| Formula (Edit Distance) | 0.902 | 0.218 | - |
| Reading Order | 86.0% | 91.6% | - |
Lower is better for Edit Distance metrics. The clearOCR values were verified by CodeSOTA on December 19, 2025.
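To make the composite score concrete, the sketch below plugs the clearOCR column into the composite formula. One assumption is needed: the table reports a formula edit distance and not a CDM value, so the formula term is taken here as (1 - 0.902) * 100. The function and variable names are illustrative only.

```python
def composite_score(text_edit_dist: float, table_teds_pct: float, formula_pct: float) -> float:
    """OmniDocBench composite: mean of text, table, and formula scores (0-100 scale)."""
    return ((1 - text_edit_dist) * 100 + table_teds_pct + formula_pct) / 3


# clearOCR values from the results table
text_edit_dist = 0.154
table_teds_pct = 0.8
formula_edit_dist = 0.902

# Assumption: with no CDM value reported, use (1 - edit distance) * 100 as the formula term
formula_pct = (1 - formula_edit_dist) * 100

score = composite_score(text_edit_dist, table_teds_pct, formula_pct)
print(round(score, 1))  # 31.7
```

Under that assumption the three terms are roughly 84.6, 0.8, and 9.8, so the text score alone cannot lift the average much above 30.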
clearOCR performs best on research reports and academic papers, and struggles significantly with newspapers:
| Document Type | Text Accuracy |
|---|---|
| Academic Literature | 95.0% |
| Research Reports | 95.4% |
| Magazines | 94.2% |
| Books | 88.7% |
| Exam Papers | 87.5% |
| PPT Slides | 86.9% |
| Notes | 87.2% |
| Newspapers | 48.1% |
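As a quick sanity check on the per-type numbers, the sketch below computes their unweighted mean and picks out the weakest category. This is only an illustration: the table lists eight categories, and the benchmark's overall text score is presumably weighted by page counts, so the unweighted mean is not expected to match it exactly.

```python
# Text accuracy (%) per document type, copied from the table above
text_accuracy = {
    "Academic Literature": 95.0,
    "Research Reports": 95.4,
    "Magazines": 94.2,
    "Books": 88.7,
    "Exam Papers": 87.5,
    "PPT Slides": 86.9,
    "Notes": 87.2,
    "Newspapers": 48.1,
}

# Unweighted (macro) mean across the listed document types
macro_mean = sum(text_accuracy.values()) / len(text_accuracy)

# Document type with the lowest text accuracy
worst_type = min(text_accuracy, key=text_accuracy.get)

print(round(macro_mean, 1), worst_type)  # 85.4 Newspapers
```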
clearOCR is a Polish OCR service developed by TeamQuest Sp. z o.o. It's designed for extracting text from documents, with a focus on Polish and English language support.
import requests

API_URL = "https://clearocr.teamquest.pl:60213/extract-document-parser"

# Credentials and API version are passed as request headers
headers = {
    "CLEAR-OCR-API-USER": "your-username",
    "CLEAR-OCR-API-KEY": "your-api-key",
    "CLEAR-OCR-API-VERSION": "0.1",
}

# Upload the image as multipart form data; the file must stay open during the request
with open("document.png", "rb") as f:
    files = {"file": f}
    response = requests.post(
        API_URL,
        headers=headers,
        files=files,
        verify=False,  # Self-signed cert
    )

result = response.json()
print(result.get("result", {}).get("text", ""))

clearOCR outputs plain text without markdown structure:
Who Am I?
- Min-Te Sun (Peter) Sun
- An associate professor of Computer Science & Information Engineering
- National Central University
- Studied in US for a long time (from 1993 ~ 2002)
- Worked as a CS professor at Auburn University, Alabama
Is Good English Important?
- For Individual
- Reading - lifelong learning to improve your competitiveness
- Writing - emails with foreign friends/co-workers/customers
- Listening - Q/A with foreign customers/boss/etc

Composite Score: 31.7. Low on OmniDocBench due to missing table/formula support.
clearOCR is a traditional OCR solution that extracts text only. It performs reasonably well for pure text extraction (84.6% overall, up to 95% on clean documents), but lacks modern document understanding capabilities.
The low composite score (31.7 vs Mistral's 79.75) is primarily because OmniDocBench weights tables and formulas equally with text. For pure text extraction tasks, clearOCR is a viable option, especially for Polish documents.
Best use case: Simple text extraction from clean documents where you don't need table structure or LaTeX formulas.
Provider: TeamQuest Sp. z o.o.
API: clearocr.teamquest.pl:60213
Verified: December 19, 2025 by CodeSOTA
Images processed: 1,348 / 1,355 (7 timeouts)
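The 7 timeouts suggest that batch runs against the API benefit from an explicit timeout and a retry policy. The following is a minimal sketch of one way to do that, not a description of how CodeSOTA ran the benchmark. `call_with_retries` and its parameters are hypothetical names introduced for illustration, and the helper is deliberately independent of any HTTP library.

```python
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    attempts: int = 3,
    backoff_s: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError,),
) -> Optional[T]:
    """Call fn() up to `attempts` times; return None if every attempt fails.

    Only exceptions listed in `retry_on` trigger a retry; anything else propagates.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            # Exponential backoff between attempts, skipped after the last one
            if attempt < attempts - 1:
                time.sleep(backoff_s * 2 ** attempt)
    return None
```

With the `requests` library, `fn` could wrap a `requests.post(..., timeout=60)` call and `retry_on` could be `(requests.exceptions.Timeout,)`. A page that still returns `None` after all attempts would then be counted as a timeout and excluded from the metrics.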