VERIFIED BY CODESOTA Dec 2025

clearOCR (TeamQuest): Independent Benchmark Results

We ran the OmniDocBench benchmark on TeamQuest's clearOCR API, successfully processing 1,348 of its 1,355 images. Here's what we found.

Independently Verified Benchmark

CodeSOTA ran the full OmniDocBench evaluation suite on December 19, 2025. We processed 1,348 images through TeamQuest's clearOCR API and computed metrics using the official evaluation tools.

1,348/1,355 images | Official eval tools | Polish OCR service

Important Note

clearOCR is a traditional OCR solution that extracts plain text from images. It does not recognize tables as structured data or mathematical formulas as LaTeX. This sharply lowers its OmniDocBench composite score, which weights tables and formulas equally with text.

Metric          Score    Status
Text Accuracy   84.6%    Verified
Table TEDS      0.8%     No tables
Formula         ~10%     No LaTeX
Reading Order   86.0%    Verified

OmniDocBench Results (Verified)

OmniDocBench is a comprehensive document parsing benchmark with 1,355 pages across 9 document types. The composite score is computed as: ((1 - TextEditDist) * 100 + TableTEDS + FormulaCDM) / 3
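As a worked check of the composite formula, the sketch below plugs in clearOCR's reported values (text edit distance 0.154, Table TEDS 0.8, formula edit distance 0.902). One assumption is labeled in the code: this page reports a formula edit distance rather than a CDM score, so `(1 - 0.902) * 100` is used as a stand-in for the formula term. With that substitution the result matches the published 31.7. The function and variable names are illustrative only.

```python
def composite_score(text_edit_dist: float, table_teds: float, formula_score: float) -> float:
    """Composite = ((1 - TextEditDist) * 100 + TableTEDS + FormulaScore) / 3."""
    return ((1 - text_edit_dist) * 100 + table_teds + formula_score) / 3


# clearOCR values reported on this page
text_edit_dist = 0.154      # lower is better
table_teds = 0.8            # percent
formula_edit_dist = 0.902   # lower is better

# ASSUMPTION: no CDM score is reported for clearOCR, so the formula term is
# approximated as (1 - formula edit distance) * 100, roughly 9.8.
formula_score = (1 - formula_edit_dist) * 100

score = composite_score(text_edit_dist, table_teds, formula_score)
print(f"{score:.1f}")  # prints 31.7
```

The three terms contribute about 84.6, 0.8, and 9.8 points before averaging, which shows that the text term carries nearly the whole score.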

Metric                    clearOCR           Mistral OCR 3   PaddleOCR-VL
Composite Score           31.7 (verified)    79.75           92.86
Text Edit Distance        0.154 (verified)   0.099           0.03
Table TEDS                0.8% (verified)    70.9%           93.5%
Formula (Edit Distance)   0.902 (verified)   0.218           -
Reading Order             86.0% (verified)   91.6%           -

Lower is better for Edit Distance metrics; higher is better for all others. CodeSOTA verification date: December 19, 2025.
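To make the Edit Distance rows concrete, here is a minimal sketch of a normalized edit distance between OCR output and ground truth. It assumes a character-level Levenshtein distance divided by the length of the longer string. The official evaluation tools may preprocess or normalize text differently, so this illustrates the idea and does not reproduce the benchmark. Under this definition, a distance of 0.154 corresponds to the 84.6% text accuracy reported for clearOCR.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_edit_distance(predicted: str, truth: str) -> float:
    """Edit distance scaled to [0, 1]; 0.0 means the strings are identical."""
    longest = max(len(predicted), len(truth))
    if longest == 0:
        return 0.0
    return levenshtein(predicted, truth) / longest


def text_accuracy(predicted: str, truth: str) -> float:
    """Accuracy in percent, defined here as (1 - normalized edit distance) * 100."""
    return (1 - normalized_edit_distance(predicted, truth)) * 100
```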

Performance by Document Type

clearOCR performs best on research reports and academic literature and struggles significantly with newspapers:

Document Type Text Accuracy
Academic Literature 95.0%
Research Reports 95.4%
Magazines 94.2%
Books 88.7%
Exam Papers 87.5%
PPT Slides 86.9%
Notes 87.2%
Newspapers 48.1%

Performance by Language

Language   Text Accuracy
English    89.4%
Chinese    78.9%
Mixed      89.7%

About clearOCR

clearOCR is a Polish OCR service developed by TeamQuest Sp. z o.o. It's designed for extracting text from documents, with a focus on Polish and English language support.

Key Characteristics

  • Traditional OCR approach - text extraction only
  • No structured table recognition (outputs as plain text)
  • No mathematical formula recognition (no LaTeX output)
  • Max 4 concurrent API requests
  • Self-signed SSL certificate: default TLS verification fails unless it is bypassed or the certificate is trusted explicitly
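For the self-signed certificate, one alternative to disabling verification entirely is to save the server's certificate once and trust it explicitly. The sketch below uses the standard library's `ssl.get_server_certificate` and the host and port listed on this page. The helper name `save_server_cert` and the output filename are hypothetical. Note that fetching the certificate this way is trust-on-first-use: it only helps if that first fetch is not intercepted, and hostname checks still apply afterwards.

```python
import ssl
from pathlib import Path

HOST = "clearocr.teamquest.pl"
PORT = 60213


def save_server_cert(host: str, port: int, out_path: str) -> str:
    """Download the server's certificate in PEM form and write it to out_path."""
    pem = ssl.get_server_certificate((host, port))
    Path(out_path).write_text(pem)
    return out_path


# Example (performs a network call, so it is left commented out):
# cert_path = save_server_cert(HOST, PORT, "clearocr.pem")
```

The saved path can then be passed as `verify="clearocr.pem"` to `requests.post` in place of `verify=False`. Whether this succeeds depends on how the certificate was issued, for example whether its hostname fields match the API host, so treat it as something to try, not a guaranteed fix.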

Code Example

import requests

API_URL = "https://clearocr.teamquest.pl:60213/extract-document-parser"
headers = {
    "CLEAR-OCR-API-USER": "your-username",
    "CLEAR-OCR-API-KEY": "your-api-key",
    "CLEAR-OCR-API-VERSION": "0.1"
}

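with open("document.png", "rb") as f:
    response = requests.post(
        API_URL,
        headers=headers,
        files={"file": f},
        verify=False,  # Self-signed cert: skips TLS verification
        timeout=120,   # Assumed value in seconds; avoids hanging on slow requests
    )

response.raise_for_status()  # Surface HTTP errors before parsing the body
result = response.json()
# Extracted text is read from result["result"]["text"]; default to "" if absent
print(result.get("result", {}).get("text", ""))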
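When processing many images, the 4-request concurrency cap listed under Key Characteristics and the timeouts seen in the benchmark run both matter. A minimal sketch of a batch runner is shown below. It assumes a caller-supplied function `ocr_one(path)` (hypothetical, for example a wrapper around the request above) that returns the extracted text or raises on failure. `run_batch` and `MAX_CONCURRENT` are illustrative names and not part of the clearOCR API.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, Tuple

MAX_CONCURRENT = 4  # concurrency cap stated for the clearOCR API


def run_batch(
    paths: Iterable[str],
    ocr_one: Callable[[str], str],
    max_workers: int = MAX_CONCURRENT,
) -> Tuple[Dict[str, str], Dict[str, Exception]]:
    """Run ocr_one over paths with at most max_workers requests in flight.

    Returns (results, failures): text per successful path, exception per failed path.
    """
    results: Dict[str, str] = {}
    failures: Dict[str, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(ocr_one, path): path for path in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:  # e.g. a timeout or an HTTP error
                failures[path] = exc
    return results, failures
```

Collecting failures separately instead of aborting makes it easy to report a count such as "1,348 of 1,355 processed" and to retry only the failed paths.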

Sample Output

clearOCR outputs plain text without markdown structure:

Who Am I?
- Min-Te Sun (Peter) Sun
  - An associate professor of Computer Science & Information Engineering
  - National Central University
  - Studied in US for a long time (from 1993 ~ 2002)
  - Worked as a CS professor at Auburn University, Alabama

Is Good English Important?
- For Individual
  - Reading - lifelong learning to improve your competitiveness
  - Writing - emails with foreign friends/co-workers/customers
  - Listening - Q/A with foreign customers/boss/etc

When to Use clearOCR

Good For

  • Simple text extraction
  • Research reports (95.4% accuracy)
  • Academic papers (95.0% accuracy)
  • Polish language documents
  • When you just need raw text

Not Suitable For

  • Table extraction (0.8% TEDS)
  • Mathematical formulas
  • Newspapers (48.1% accuracy)
  • Documents requiring structure
  • Chinese text (78.9% accuracy)

Verdict

Composite Score: 31.7 - Low on OmniDocBench due to missing table/formula support.

clearOCR is a traditional OCR solution that extracts text only. It performs reasonably well for pure text extraction (84.6% overall, up to 95% on clean documents), but lacks modern document understanding capabilities.

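The low composite score (31.7 vs Mistral OCR 3's 79.75) is primarily because OmniDocBench weights tables and formulas equally with text. For pure text extraction tasks, clearOCR is a viable option. Its Polish-language focus may make it a good fit for Polish documents, although this benchmark reports only English, Chinese, and mixed-language results.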

Best use case: Simple text extraction from clean documents where you don't need table structure or LaTeX formulas.

Provider: TeamQuest Sp. z o.o.
API: clearocr.teamquest.pl:60213
Verified: December 19, 2025 by CodeSOTA
Images processed: 1,348 / 1,355 (7 timeouts)