Computer Vision

OCR.

OCR (Optical Character Recognition) is the task of converting an image that contains text into machine-readable, editable, and searchable text. Typical inputs are scanned documents, photos, and image-only PDFs; extracting their text from the static visual format lets it be edited, searched, or used for data entry and other applications. Examples include digitizing receipts in a banking app, translating signs with Google Translate, and building searchable archives from old documents.
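To make the image-to-text idea concrete, here is a deliberately tiny sketch in Python. It is not how the models tracked on this page work: it assumes a made-up 3x5 bitmap font, a clean binary image, and simple template matching. All names (`GLYPHS`, `render`, `segment`, `recognize`) are hypothetical and exist only for this illustration.

```python
# Toy OCR sketch: a hypothetical 3x5 bitmap font ('#' = ink, '.' = background).
GLYPHS = {
    "0": ("###", "#.#", "#.#", "#.#", "###"),
    "1": (".#.", "##.", ".#.", ".#.", "###"),
    "2": ("###", "..#", "###", "#..", "###"),
    "7": ("###", "..#", "..#", "..#", "..#"),
}


def render(text):
    """Draw text as a 5-row 'image' with one blank column between glyphs."""
    return [".".join(GLYPHS[ch][row] for ch in text) for row in range(5)]


def segment(image):
    """Split the image into per-character crops at fully blank columns."""
    width = len(image[0])
    crops, start = [], None
    for x in range(width + 1):
        blank = x == width or all(row[x] == "." for row in image)
        if not blank and start is None:
            start = x  # first inked column of a new character
        elif blank and start is not None:
            crops.append(tuple(row[start:x] for row in image))
            start = None
    return crops


def hamming(crop, glyph):
    """Count the pixels that differ between a crop and a font template."""
    return sum(a != b for r1, r2 in zip(crop, glyph) for a, b in zip(r1, r2))


def recognize(image):
    """Map each crop to the closest template and join the characters."""
    chars = []
    for crop in segment(image):
        best = min(GLYPHS, key=lambda ch: hamming(crop, GLYPHS[ch]))
        chars.append(best)
    return "".join(chars)


image = render("2017")
text = recognize(image)
print("\n".join(image))
print(text)
```

The pipeline has the same shape as the task definition above: pixels go in, a string comes out. Real OCR systems replace the fixed templates with learned models and must also handle layout, noise, fonts, and skew, none of which this sketch attempts.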

Datasets: 5

Results: 1

Canonical metric: —

§ 02 · Canonical benchmark

The reference dataset.

No canonical benchmark has been chosen for this task yet.


§ 03 · Top 10

Leading models.

Leading models across all datasets in this task.

#	Model	Score	Year	Source
★	HunyuanOCR (1B)	860	—	paper ↗

§ 04 · All datasets

Tracked datasets.

5 datasets tracked for this task.

Top: HunyuanOCR (1B) — 860

Fox (English subset, 600-1300 text tokens)

OmniDocBench v1.0

OmniDocBench v1.5

§ 05 · Related tasks

Other tasks in Computer Vision.

3D Understanding

Depth estimation

Document Image Classification

Document Layout Analysis

Document Parsing

Document Understanding

General OCR Capabilities

Handwriting Recognition
