Codesota · Tasks · OCR

OCR.

OCR (Optical Character Recognition) is the task of converting an image that contains text into machine-readable, editable, and searchable text. Typical inputs are scanned documents, photos, and image-only PDFs; once the text is extracted, it can be edited, searched, or fed into data entry and other applications. Examples include digitizing receipts in a banking app, translating signs with Google Translate, and building searchable archives from old documents.
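
To make "machine-readable text" concrete, here is a minimal sketch of how an OCR output string might be compared against a ground-truth transcription. It assumes character error rate (CER), a commonly used OCR measure based on edit distance. This page does not state a canonical metric, so CER is an assumption here, and the function names and sample strings are hypothetical.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn `ref` into `hyp`."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a character from ref
                curr[j - 1] + 1,     # insert a character into ref
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """Edit distance normalized by reference length (assumed metric, see above)."""
    if not ref:
        raise ValueError("reference text must be non-empty")
    return levenshtein(ref, hyp) / len(ref)


# Hypothetical receipt line: the OCR system read the digit 0 as the letter O.
ground_truth = "Total: 12.50"
ocr_output = "Total: 12.5O"
cer = character_error_rate(ground_truth, ocr_output)
```

A lower value is better, and 0.0 means the output matches the reference exactly. Because the distance is divided by the reference length, the value can exceed 1.0 when the output is much longer than the reference.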

5 datasets · 0 results
Canonical metric
§ 02 · Canonical benchmark

The reference dataset.

No canonical benchmark has been selected for this task yet.

§ 03 · Top 10

Leading models.

Leading models across all datasets in this task.

No results yet. Be the first to contribute.

§ 04 · All datasets

Tracked datasets.

5 datasets tracked for this task.

Fox (English subset, 600-1300 text tokens) · 0 results
OCRBench · 0 results
OmniDocBench v1.0 · 0 results
OmniDocBench v1.5 · 0 results
olmOCR-Bench · 0 results
§ 05 · Related tasks

Other tasks in Computer Vision.

3D Understanding · 3D generation · Depth estimation · Few-Shot Image Classification · Image Classification · Image editing · Image generation · Image segmentation