Union14M

A next-generation scene text recognition benchmark assembled from 14 datasets (4M labeled and 10M unlabeled images). Model accuracy drops 33-48% relative to standard benchmarks, exposing real-world limitations across 7 challenge categories: Artistic, Multi-Oriented, Salient, Multi-Words, General, Contextless, and Incomplete.

Benchmark Stats

Models: 5
Papers: 5
Metrics: 1

accuracy

Higher is better
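The leaderboard's single metric is word accuracy: the fraction of test images whose predicted word exactly matches the label. A minimal sketch, assuming the common scene-text convention of case-insensitive comparison over alphanumeric characters (the exact normalization used by this leaderboard is an assumption here):

```python
def word_accuracy(predictions, ground_truths):
    """Fraction of images whose predicted word matches the label after normalization."""
    def normalize(s):
        # Case-insensitive, alphanumeric-only comparison (assumed convention).
        return "".join(ch for ch in s.lower() if ch.isalnum())

    assert len(predictions) == len(ground_truths)
    correct = sum(
        normalize(p) == normalize(g)
        for p, g in zip(predictions, ground_truths)
    )
    return correct / len(ground_truths)


# Two of three predictions match after normalization: accuracy = 2/3.
print(word_accuracy(["Hello", "W0rld", "cafe"], ["hello", "world", "CAFE"]))
```

Under this metric, a score of 70.8 on the table below means 70.8% of the benchmark's test words were recognized exactly.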

Rank  Model       Source     Score  Year  Paper
1     CLIP4STR-B  Community  70.8   2026  Source
2     PARSeq      Community  67.8   2026  Source
3     LPV-S       Community  65.1   2026  Source
4     MAERec-S    Community  62.4   2026  Source
5     CDistNet    Community  56.2   2026  Source

Notes:
1. CLIP4STR-B: 70.8% word accuracy on Union14M-Benchmark. Reported in the Union14M paper (arXiv 2307.08723, ICCV 2023) and the CLIP4STR paper. Best model on Union14M at the time of benchmark publication.
2. PARSeq: 67.8% word accuracy. Table 4 in the Union14M paper (arXiv 2307.08723, ICCV 2023). A strong ECCV 2022 baseline whose limitations are exposed by the benchmark's real-world difficulty.
3. LPV-S (Language-Guided Progressive Vision, Small): 65.1% word accuracy. Table 4 in the Union14M paper (arXiv 2307.08723, ICCV 2023).
4. MAERec-S: 62.4% word accuracy. Table 4 in the Union14M paper (arXiv 2307.08723, ICCV 2023). Uses MAE pre-training for text recognition.
5. CDistNet: 56.2% word accuracy. Table 4 in the Union14M paper (arXiv 2307.08723, ICCV 2023). An AAAI 2022 baseline.
