clearOCR (TeamQuest): Independent Benchmark Results
We ran the full OmniDocBench benchmark through TeamQuest's clearOCR API (1,348 of 1,355 images processed). Here's what we found.
Independently Verified Benchmark
CodeSOTA ran the full OmniDocBench evaluation suite on December 19, 2025. We processed 1,348 images through TeamQuest's clearOCR API and computed metrics using the official evaluation tools.
Important Note
clearOCR is a traditional OCR solution that extracts plain text from images. It does not recognize tables as structured data or mathematical formulas as LaTeX. This significantly lowers its OmniDocBench composite score, which weights tables and formulas equally with text.
OmniDocBench Results (Verified)
OmniDocBench is a comprehensive document parsing benchmark with 1,355 pages across 9 document types.
The composite score formula: ((1-TextEditDist)*100 + TableTEDS + FormulaCDM) / 3
| Metric | clearOCR (verified) | Mistral OCR 3 | PaddleOCR-VL |
|---|---|---|---|
| Composite Score | 31.7 | 79.75 | 92.86 |
| Text Edit Distance | 0.154 | 0.099 | 0.03 |
| Table TEDS | 0.8% | 70.9% | 93.5% |
| Formula Edit Distance | 0.902 | 0.218 | - |
| Reading Order | 86.0% | 91.6% | - |
Lower is better for Edit Distance metrics. CodeSOTA verification date: December 19, 2025.
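As a rough check on the composite formula, the sketch below plugs in the numbers from the results table. The table reports formula edit distance rather than FormulaCDM, so instead of assuming a formula value it solves the formula for the formula term. The function names are illustrative and not part of the official OmniDocBench tooling.

```python
def composite(text_edit_dist: float, table_teds: float, formula_score: float) -> float:
    """Composite as stated above: ((1 - TextEditDist) * 100 + TableTEDS + FormulaCDM) / 3.

    text_edit_dist is on a 0-1 scale (lower is better); table_teds and
    formula_score are on a 0-100 scale (higher is better).
    """
    return ((1 - text_edit_dist) * 100 + table_teds + formula_score) / 3


def implied_formula_score(score: float, text_edit_dist: float, table_teds: float) -> float:
    """Solve the composite formula for the formula term, given the other two terms."""
    return 3 * score - (1 - text_edit_dist) * 100 - table_teds


# (composite, text edit distance, table TEDS) copied from the results table.
published = {
    "clearOCR": (31.7, 0.154, 0.8),
    "Mistral OCR 3": (79.75, 0.099, 70.9),
}

for name, (score, ted, teds) in published.items():
    formula = implied_formula_score(score, ted, teds)
    print(f"{name}: text {(1 - ted) * 100:.1f} + table {teds:.1f} "
          f"+ formula {formula:.2f} -> composite {composite(ted, teds, formula):.2f}")

# Upper bound for clearOCR if its text were perfect (edit distance 0) while
# the table and implied formula terms stay unchanged.
ceiling = composite(0.0, 0.8, implied_formula_score(31.7, 0.154, 0.8))
print(f"clearOCR ceiling with perfect text: {ceiling:.1f}")
```

The implied formula term comes out at about 9.7 for clearOCR and about 78.25 for Mistral OCR 3, each within roughly 0.1 of (1 − formula edit distance) × 100. That suggests, but does not prove, that the composite used this as its formula term. The last line shows why text quality alone cannot rescue clearOCR here: even with perfect text extraction, its composite would top out near 36.8.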
Performance by Document Type
clearOCR performs best on research reports and academic literature and struggles significantly with newspapers:
| Document Type | Text Accuracy |
|---|---|
| Research Reports | 95.4% |
| Academic Literature | 95.0% |
| Magazines | 94.2% |
| Books | 88.7% |
| Exam Papers | 87.5% |
| Notes | 87.2% |
| PPT Slides | 86.9% |
| Newspapers | 48.1% |
Performance by Language
[Chart: text accuracy by language]
About clearOCR
clearOCR is a Polish OCR service developed by TeamQuest Sp. z o.o. It's designed for extracting text from documents, with a focus on Polish and English language support.
Key Characteristics
- Traditional OCR approach - text extraction only
- No structured table recognition (outputs as plain text)
- No mathematical formula recognition (no LaTeX output)
- Max 4 concurrent API requests
- Self-signed SSL certificate requires verification bypass
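Because the API accepts at most 4 concurrent requests, and 7 images timed out during the benchmark run, a batch client will likely want both a concurrency cap and retries. The sketch below assumes `ocr_one` is any function that sends one image and returns its result (for example, a wrapper around the `requests.post` call in the code example that follows). The helper names, the retry count, and the linear backoff are hypothetical choices, not TeamQuest recommendations.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, Tuple, Type, TypeVar

T = TypeVar("T")

MAX_CONCURRENT = 4  # clearOCR's stated limit on concurrent API requests


def with_retries(fn: Callable[[str], T], path: str,
                 retry_on: Tuple[Type[BaseException], ...],
                 attempts: int, backoff_s: float) -> T:
    """Call fn(path); on a retryable error, wait backoff_s * attempt and try again."""
    for attempt in range(1, attempts + 1):
        try:
            return fn(path)
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            time.sleep(backoff_s * attempt)
    raise ValueError("attempts must be at least 1")


def ocr_batch(paths: Iterable[str], ocr_one: Callable[[str], T],
              retry_on: Tuple[Type[BaseException], ...] = (TimeoutError,),
              attempts: int = 3, backoff_s: float = 2.0
              ) -> Tuple[Dict[str, T], Dict[str, BaseException]]:
    """OCR every path with at most MAX_CONCURRENT requests in flight.

    Returns (results, failures); a path that still fails after all retries
    is recorded in failures instead of aborting the whole batch.
    """
    results: Dict[str, T] = {}
    failures: Dict[str, BaseException] = {}
    # max_workers caps how many calls to ocr_one can run at the same time.
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT) as pool:
        futures = {
            pool.submit(with_retries, ocr_one, p, retry_on, attempts, backoff_s): p
            for p in paths
        }
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except retry_on as exc:
                failures[path] = exc
    return results, failures
```

With `requests`, timeouts raise `requests.exceptions.Timeout` rather than the built-in `TimeoutError`, so a real client would pass `retry_on=(requests.exceptions.Timeout,)`. Collecting failures separately mirrors how the benchmark run reports 1,348 of 1,355 images processed instead of failing outright.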
Code Example
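```python
import requests
import urllib3

API_URL = "https://clearocr.teamquest.pl:60213/extract-document-parser"
headers = {
    "CLEAR-OCR-API-USER": "your-username",
    "CLEAR-OCR-API-KEY": "your-api-key",
    "CLEAR-OCR-API-VERSION": "0.1",
}

# The endpoint uses a self-signed certificate, so verification is disabled below;
# silence the InsecureRequestWarning that urllib3 emits for each such request.
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

with open("document.png", "rb") as f:
    response = requests.post(
        API_URL,
        headers=headers,
        files={"file": f},
        verify=False,  # self-signed cert
        timeout=120,   # seconds; an example value, not an API requirement
    )
response.raise_for_status()  # fail on HTTP errors instead of parsing an error body

result = response.json()
print(result.get("result", {}).get("text", ""))
```

Sample Output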
clearOCR outputs plain text without markdown structure:
```text
Who Am I?
- Min-Te Sun (Peter) Sun
- An associate professor of Computer Science & Information Engineering
- National Central University
- Studied in US for a long time (from 1993 ~ 2002)
- Worked as a CS professor at Auburn University, Alabama
Is Good English Important?
- For Individual
- Reading - lifelong learning to improve your competitiveness
- Writing - emails with foreign friends/co-workers/customers
- Listening - Q/A with foreign customers/boss/etc
```

When to Use clearOCR
Good fit:
- Simple text extraction
- Research reports (95.4% accuracy)
- Academic papers (95.0% accuracy)
- Polish-language documents
- Cases where you just need raw text

Poor fit:
- Table extraction (0.8% TEDS)
- Mathematical formulas
- Newspapers (48.1% accuracy)
- Documents that require structure preservation
- Chinese text (78.9% accuracy)
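The good-fit and poor-fit lists above can be turned into a simple routing rule. The sketch below is hypothetical: the document-type keys, the `Page` fields, the 90% accuracy cut-off, and the fallback label `"structured"` (standing in for any table- and formula-aware parser) are assumptions; only the accuracy figures come from the per-document-type table and the Chinese-text figure above.

```python
from dataclasses import dataclass

# clearOCR text accuracy (%) by document type, from the per-type table.
CLEAROCR_TEXT_ACCURACY = {
    "research_report": 95.4,
    "academic_literature": 95.0,
    "magazine": 94.2,
    "book": 88.7,
    "exam_paper": 87.5,
    "notes": 87.2,
    "ppt_slide": 86.9,
    "newspaper": 48.1,
}


@dataclass
class Page:
    doc_type: str            # one of the keys above (hypothetical labels)
    language: str = "en"     # ISO 639-1 code, e.g. "pl", "en", "zh"
    has_tables: bool = False
    has_formulas: bool = False


def pick_engine(page: Page, min_accuracy: float = 90.0) -> str:
    """Return "clearocr" for text-only pages clearOCR handles well, else "structured"."""
    if page.has_tables or page.has_formulas:
        return "structured"  # clearOCR emits neither table structure nor LaTeX
    if page.language == "zh":
        return "structured"  # 78.9% text accuracy on Chinese
    accuracy = CLEAROCR_TEXT_ACCURACY.get(page.doc_type)
    if accuracy is None or accuracy < min_accuracy:
        return "structured"  # unknown type, or below the chosen cut-off
    return "clearocr"


print(pick_engine(Page("research_report", language="pl")))          # clearocr
print(pick_engine(Page("newspaper")))                               # structured
print(pick_engine(Page("academic_literature", has_formulas=True)))  # structured
```

The cut-off is the main tuning knob: at the assumed 90% only research reports, academic literature, and magazines stay on clearOCR, while lowering it to 85% would also admit books, exam papers, notes, and slides.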
Verdict
Composite Score: 31.7 - Low on OmniDocBench due to missing table/formula support.
clearOCR is a traditional OCR solution that extracts text only. It performs reasonably well for pure text extraction (84.6% text accuracy overall, up to 95.4% on research reports), but it lacks the table and formula recognition that modern document parsers provide.
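The low composite score (31.7 vs. Mistral's 79.75) stems mainly from OmniDocBench weighting tables and formulas equally with text: with a table TEDS of 0.8% and a formula edit distance of 0.902, two of the three terms contribute almost nothing. For pure text extraction tasks, clearOCR is a viable option, especially for Polish documents.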
Best use case: Simple text extraction from clean documents where you don't need table structure or LaTeX formulas.
Provider: TeamQuest Sp. z o.o.
API: clearocr.teamquest.pl:60213
Verified: December 19, 2025 by CodeSOTA
Images processed: 1,348 / 1,355 (7 timeouts)