Computer Vision

Building systems that understand images and video? Find benchmarks for recognition, detection, segmentation, and document analysis tasks.

10 tasks 28 datasets 127 results

Scene Text Detection

Detecting text regions in natural scene images.

4 datasets 0 results
ICDAR 2015 ICDAR 2015 Incidental Scene Text 2015

1000 training + 500 test images captured with wearable cameras. Industry standard for scene text detection.

ICDAR 2019 ArT ICDAR 2019 Arbitrary-Shaped Text 2019

Text in arbitrary shapes including curved and rotated text. 10,166 images total.

Total-Text Total-Text 2017

Curved text benchmark. 1555 images with polygon annotations.

CTW1500 Curved Text in the Wild 1500 2019

1500 images with curved text annotations. Focus on arbitrary-shaped text.

Document OCR

Converting scanned documents and images into machine-readable text.

6 datasets 13 results
SROIE Scanned Receipts OCR and Information Extraction 2019

626 receipt images. Key task: extracting company, date, address, and total from each receipt.

KITAB-Bench KITAB Arabic OCR Benchmark 2024
SOTA: 0.13 (cer)
gemini-2.0-flash

8,809 Arabic text samples across 9 domains. Tests Arabic script recognition.
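The CER (character error rate) reported above is edit distance divided by reference length, so lower is better. A minimal sketch in pure Python (the `edit_distance` and `cer` helpers are illustrative, not the benchmark's official scorer):

```python
def edit_distance(ref: str, hyp: str) -> int:
    # Classic Levenshtein dynamic program over characters.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution
        prev = curr
    return prev[-1]

def cer(ref: str, hyp: str) -> float:
    # Character error rate: edits needed per reference character.
    return edit_distance(ref, hyp) / max(len(ref), 1)

print(cer("kitab", "kitob"))  # 0.2 — one substitution over five characters
```

A corpus-level CER would sum edit distances and reference lengths over all samples before dividing, rather than averaging per-sample rates.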

ThaiOCRBench Thai OCR Benchmark 2024
SOTA: 0.84 (ted-score)
claude-sonnet-4

2,808 Thai text samples across 13 tasks. Tests Thai script structural understanding.

PolEval 2021 OCR PolEval 2021 OCR Post-Correction Task 2021

979 Polish books (69,000 pages) from 1791-1998. Focus on OCR post-correction using NLP methods. Major benchmark for Polish historical document processing.

IMPACT-PSNC IMPACT Polish Digital Libraries Ground Truth 2012

478 pages of ground truth from four Polish digital libraries at 99.95% accuracy. Includes annotations at region, line, word, and glyph levels. Gothic and antiqua fonts.

CodeSOTA Polish CodeSOTA Polish OCR Benchmark 2025

1,000 synthetic and real Polish text images with 5 degradation levels (clean to severe). Tests character-level OCR on diacritics with contamination-resistant synthetic categories. Categories: synth_random (pure character recognition), synth_words (Markov-generated words), real_corpus (Pan Tadeusz, official documents), wikipedia (potential contamination baseline).

Handwriting Recognition

Recognizing handwritten text from images.

3 datasets 8 results
IAM IAM Handwriting Database 1999

13,353 handwritten text lines from 657 writers. Standard handwriting benchmark.

CHURRO-DS Cultural Heritage Understanding Research Repository OCR Dataset 2024
SOTA: 82.3 (printed-levenshtein)
churro-3b

Historical documents from 46 languages, 99K pages. Tests handwritten and printed text recognition across diverse scripts.

Polish EMNIST Extension EMNIST Extended with Polish Diacritics 2020

Extension of EMNIST dataset with Polish handwritten characters including diacritics (ą, ć, ę, ł, ń, ó, ś, ź, ż). Tests recognition of Polish-specific characters.

Document Understanding

Extracting semantic information and structure from documents (visual document understanding, VDU).

1 dataset 0 results
FUNSD Form Understanding in Noisy Scanned Documents 2019

199 fully annotated forms. Tests semantic entity labeling and linking.

Document Parsing

Converting documents (like PDFs) into structured formats (Markdown/HTML).

2 datasets 50 results
OmniDocBench OmniDocBench v1.5 2024
SOTA: 97.5 (layout-map)
mineru-2.5

981 annotated PDF pages across 9 document categories. Tests end-to-end document parsing including text, tables, and formulas.

olmOCR-Bench olmOCR-Bench 2024
SOTA: 99.9 (base)
chandra-ocr-0.1.0

7,010 unit tests across 1,402 PDF documents. Tests parsing of tables, math, multi-column layouts, old scans, and more.

General OCR Capabilities

Comprehensive benchmarks covering multiple aspects of OCR performance.

4 datasets 24 results
OCRBench v2 OCRBench v2 2024
SOTA: 62.2 (overall-en-private)
seed-1.6-vision

Tests 8 core OCR capabilities across 23 tasks. Evaluates large multimodal models on text recognition, referring, and extraction.

CC-OCR Comprehensive Challenge OCR 2024
SOTA: 83.25 (multi-scene-f1)
gemini-1.5-pro

Benchmark covering multi-scene text reading, key information extraction, multilingual text, and document parsing.

MME-VideoOCR MME Video OCR Benchmark 2024
SOTA: 73.7 (total-accuracy)
gemini-2.5-pro

1,464 videos with 2,000 QA pairs across 25 tasks. Tests OCR capabilities in video content.

reVISION reVISION Polish Vision-Language Benchmark 2025

Polish benchmark for vision-language models including OCR evaluation on educational exam materials. Covers middle school, high school, and professional exams.

Polish OCR

OCR for Polish language including historical documents, gothic fonts, and diacritic recognition.

0 datasets 0 results
No datasets indexed yet.

Image Classification

Categorizing images into predefined classes (ImageNet, CIFAR).

4 datasets 25 results
ImageNet-1K ImageNet Large Scale Visual Recognition Challenge 2012 2012
SOTA: 91 (top-1-accuracy)
coca-finetuned

1.28M training images, 50K validation images across 1,000 object classes. The standard benchmark for image classification since 2012.
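Top-1 accuracy, the metric reported above, is simply the fraction of images whose highest-scoring class matches the label; top-5 accepts a match anywhere in the five highest scores. A minimal sketch in pure Python (illustrative only; real evaluators operate on tensors of logits):

```python
def top_k_accuracy(logits, labels, k=1):
    # logits: list of per-class score lists; labels: list of int class ids.
    correct = 0
    for scores, label in zip(logits, labels):
        # Indices of the k highest-scoring classes.
        topk = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]
        correct += label in topk
    return correct / len(labels)

logits = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]]
labels = [1, 2]
print(top_k_accuracy(logits, labels, k=1))  # 0.5 — first sample correct, second not
```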

ImageNet-V2 ImageNet-V2 Matched Frequency 2019
SOTA: 84 (top-1-accuracy)
swin-v2-large

10K new test images following ImageNet collection process. Tests model generalization beyond the original test set.

CIFAR-10 Canadian Institute for Advanced Research 10 2009
SOTA: 99.1 (accuracy)
deit-b-distilled

60K 32x32 color images in 10 classes. Classic small-scale image classification benchmark with 50K training and 10K test images.

CIFAR-100 Canadian Institute for Advanced Research 100 2009
SOTA: 94.55 (accuracy)
vit-h-14

60K 32x32 color images in 100 fine-grained classes grouped into 20 superclasses. More challenging than CIFAR-10.

Object Detection

Locating and classifying objects in images (COCO, Pascal VOC).

2 datasets 5 results
COCO Microsoft COCO: Common Objects in Context 2014
SOTA: 66 (mAP)
co-detr-swin-l

330K images, 1.5 million object instances, 80 object categories. Standard benchmark for object detection and segmentation.
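The mAP metric reported above hinges on box IoU: a prediction only counts as a true positive if its intersection-over-union with a same-class ground-truth box clears a threshold (COCO averages AP over thresholds 0.50 to 0.95). A minimal IoU sketch (illustrative, not COCO's official evaluator):

```python
def box_iou(a, b):
    # Boxes as (x1, y1, x2, y2). IoU = intersection area / union area.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# Two 10x10 boxes overlapping by half: intersection 50, union 150.
print(box_iou((0, 0, 10, 10), (5, 0, 15, 10)))  # 0.3333...
```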

Pascal VOC 2012 Pascal Visual Object Classes Challenge 2012 2012

11,530 images with 27,450 ROI annotated objects and 6,929 segmentations. Classic object detection benchmark.

Semantic Segmentation

Pixel-level classification of images (Cityscapes, ADE20K).

2 datasets 2 results
Cityscapes Cityscapes Dataset 2016

5,000 images with fine annotations and 20,000 with coarse annotations of urban street scenes.

ADE20K ADE20K Scene Parsing Benchmark 2016
SOTA: 62.9 (mIoU)
internimage-h

20K training, 2K validation images annotated with 150 object categories. Complex scene parsing benchmark.
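The mIoU metric reported above averages per-class intersection-over-union over the classes present. A minimal sketch over flat per-pixel label lists (illustrative; real evaluators accumulate a confusion matrix over the whole validation split):

```python
def mean_iou(pred, gt, num_classes):
    # pred, gt: flat lists of per-pixel class ids of equal length.
    ious = []
    for c in range(num_classes):
        inter = sum(p == c and g == c for p, g in zip(pred, gt))
        union = sum(p == c or g == c for p, g in zip(pred, gt))
        if union:  # skip classes absent from both prediction and ground truth
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0

pred = [0, 0, 1, 1]
gt   = [0, 1, 1, 1]
# Class 0 IoU = 1/2, class 1 IoU = 2/3, mean ≈ 0.583.
print(mean_iou(pred, gt, num_classes=2))
```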