Verified by CodeSOTA · Dec 2025

clearOCR (TeamQuest): Independent Benchmark Results.

We ran the OmniDocBench benchmark on TeamQuest's clearOCR API and successfully processed 1,348 of its 1,355 images. Here's what we found.

§ 01 · Verification

Independently verified.

CodeSOTA ran the full OmniDocBench evaluation suite on December 19, 2025. We processed 1,348 images through TeamQuest's clearOCR API and computed metrics using the official evaluation tools.

1,348/1,355 images · Official eval tools · Polish OCR service

§ 02 · Important note

Traditional OCR, not document understanding.

clearOCR is a traditional OCR solution that extracts plain text from images. It does not recognize tables as structured data or mathematical formulas as LaTeX. This significantly lowers its OmniDocBench composite score, which weights tables and formulas alongside text.

§ 03 · Headline numbers

The four metrics that matter.

Metric	Score	Status
Text Accuracy	84.6%	Verified
Table TEDS	0.8%	No tables
Formula	~10%	No LaTeX
Reading Order	86.0%	Verified

§ 04 · OmniDocBench results

Verified scores against the field.

OmniDocBench is a comprehensive document parsing benchmark with 1,355 pages across 9 document types. The composite score is computed as ((1 - TextEditDist) * 100 + TableTEDS + FormulaCDM) / 3.

Metric	clearOCR	Mistral OCR 3	PaddleOCR-VL
Composite Score	31.7 verified	79.75	92.86
Text Edit Distance	0.154 verified	0.099	0.03
Table TEDS	0.8% verified	70.9%	93.5%
Formula (Edit Distance)	0.902 verified	0.218	-
Reading Order	86.0% verified	91.6%	-

Lower is better for Edit Distance metrics. CodeSOTA verification date: December 19, 2025.
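As a rough check, the composite formula can be applied to the clearOCR column directly. This is a minimal sketch, not the official evaluation code: the function name `composite_score` is hypothetical, and because the table reports the formula metric as an edit distance rather than CDM, the sketch assumes the formula term is (1 - 0.902) * 100, which matches the ~10% shown in the headline numbers.

```python
def composite_score(text_edit_dist, table_teds, formula_score):
    """Average of the text, table, and formula components, each on a 0-100 scale."""
    text_score = (1 - text_edit_dist) * 100
    return (text_score + table_teds + formula_score) / 3


# Assumption: derive the formula component from the reported edit distance,
# since no CDM value is listed for clearOCR.
formula_score = (1 - 0.902) * 100

clearocr = composite_score(text_edit_dist=0.154, table_teds=0.8, formula_score=formula_score)
print(round(clearocr, 1))  # 31.7
```

The text component alone contributes 84.6 points, so nearly all of the gap to the other models comes from the table and formula terms.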

§ 05 · Performance by document type

Strong on academic, weak on newspapers.

clearOCR performs best on research reports and academic papers but struggles significantly with newspapers:

Document Type	Text Accuracy
Academic Literature	95.0%
Research Reports	95.4%
Magazines	94.2%
Books	88.7%
Exam Papers	87.5%
PPT Slides	86.9%
Notes	87.2%
Newspapers	48.1%

§ 06 · Performance by language

English and mixed lead, Chinese trails.

Language	Text Accuracy
English	89.4%
Chinese	78.9%
Mixed	89.7%

§ 07 · About clearOCR

Polish OCR from TeamQuest.

clearOCR is a Polish OCR service developed by TeamQuest Sp. z o.o. It's designed for extracting text from documents, with a focus on Polish and English language support.

Key Characteristics

Traditional OCR approach - text extraction only
No structured table recognition (outputs as plain text)
No mathematical formula recognition (no LaTeX output)
Max 4 concurrent API requests
Self-signed SSL certificate requires verification bypass
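The four-request concurrency limit matters when processing a large batch. One way to stay under it is a bounded thread pool, sketched below. The helper `process_batch` and its `ocr_one` argument are hypothetical names for illustration: `ocr_one` stands for whatever function sends a single image to the API and returns its text.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_REQUESTS = 4  # the limit listed above


def process_batch(paths, ocr_one, max_workers=MAX_CONCURRENT_REQUESTS):
    """Run ocr_one(path) for every path with at most max_workers calls in flight.

    Returns a dict mapping each path to its result, or to the exception
    raised for that path, so one failure does not abort the whole batch.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {path: pool.submit(ocr_one, path) for path in paths}
        for path, future in futures.items():
            try:
                results[path] = future.result()
            except Exception as exc:  # keep going; record the failure
                results[path] = exc
    return results
```

The pool size caps how many requests run at once, so no extra locking is needed in `ocr_one` itself.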

§ 08 · Code example

Calling the API.

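import requests

# Endpoint and authentication headers
API_URL = "https://clearocr.teamquest.pl:60213/extract-document-parser"
headers = {
    "CLEAR-OCR-API-USER": "your-username",
    "CLEAR-OCR-API-KEY": "your-api-key",
    "CLEAR-OCR-API-VERSION": "0.1",
}

# Upload the image as multipart form data
with open("document.png", "rb") as f:
    response = requests.post(
        API_URL,
        headers=headers,
        files={"file": f},
        verify=False,  # Self-signed cert: this disables TLS verification
        timeout=60,  # Assumed value; prevents hanging on a stalled request
    )

response.raise_for_status()  # Fail loudly on HTTP errors
result = response.json()
print(result.get("result", {}).get("text", ""))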
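Seven images in our run ended in timeouts, so a caller may want to retry a request before giving up. The sketch below is one generic way to do that with exponential backoff. The helper `call_with_retries` and its defaults are assumptions for illustration, not part of the clearOCR API, and the results above do not show whether retries would have recovered those seven images.

```python
import time


def call_with_retries(fn, attempts=3, base_delay=1.0, retry_on=(TimeoutError,)):
    """Call fn(); on a retryable error, wait and try again, up to `attempts` tries.

    The delay doubles after each failed try (base_delay, 2x, 4x, ...).
    The last failure is re-raised so the caller can log or skip the image.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))
```

When the request is made with the requests library, passing `retry_on=(requests.exceptions.Timeout,)` makes the wrapper react to that library's timeout errors.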

§ 09 · Sample output

Plain text, no markdown structure.

clearOCR outputs plain text without markdown structure:

Who Am I?
- Min-Te Sun (Peter) Sun
  - An associate professor of Computer Science & Information Engineering
  - National Central University
  - Studied in US for a long time (from 1993 ~ 2002)
  - Worked as a CS professor at Auburn University, Alabama
Is Good English Important?
- For Individual
  - Reading - lifelong learning to improve your competitiveness
  - Writing - emails with foreign friends/co-workers/customers
  - Listening - Q/A with foreign customers/boss/etc

§ 10 · When to use

Fit for purpose.

Good For

Simple text extraction
Research reports (95.4% accuracy)
Academic papers (95.0% accuracy)
Polish language documents
When you just need raw text

Not Suitable For

Table extraction (0.8% TEDS)
Mathematical formulas
Newspapers (48.1% accuracy)
Documents requiring structure
Chinese text (78.9% accuracy)

§ 11 · Verdict

Composite Score: 31.7.

The score is low on OmniDocBench because clearOCR has no table or formula support.

clearOCR is a traditional OCR solution that extracts text only. It performs reasonably well for pure text extraction (84.6% overall, up to 95% on clean documents), but lacks modern document understanding capabilities.

The low composite score (31.7 vs Mistral OCR 3's 79.75) is primarily because OmniDocBench weights tables and formulas equally with text. For pure text extraction tasks, clearOCR is a viable option, especially for Polish documents.

Best use case: Simple text extraction from clean documents where you don't need table structure or LaTeX formulas.

Provider: TeamQuest Sp. z o.o.
API: clearocr.teamquest.pl:60213
Verified: December 19, 2025 by CodeSOTA
Images processed: 1,348 / 1,355 (7 timeouts)
