Union14M

A next-generation scene text recognition benchmark assembled from 14 datasets (4M labeled and 10M unlabeled images). Model accuracy drops 33-48% relative to standard benchmarks, exposing real-world limitations across 7 challenge categories: Artistic, Multi-Oriented, Salient, Multi-Words, General, Contextless, and Incomplete.

Benchmark Stats

Models: 5
Papers: 5
Metrics: 1

accuracy

Higher is better
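The leaderboard's single metric is word accuracy: the fraction of test images whose predicted word exactly matches the label. A minimal sketch, assuming the common scene-text convention of case-insensitive comparison over alphanumeric characters (the exact normalization used by this leaderboard is an assumption here):

```python
def word_accuracy(predictions, ground_truths):
    """Fraction of images whose predicted word matches the label after normalization."""
    def normalize(s):
        # Case-insensitive, alphanumeric-only comparison (assumed convention).
        return "".join(ch for ch in s.lower() if ch.isalnum())

    assert len(predictions) == len(ground_truths)
    correct = sum(
        normalize(p) == normalize(g)
        for p, g in zip(predictions, ground_truths)
    )
    return correct / len(ground_truths)


# Two of three predictions match after normalization: accuracy = 2/3.
print(word_accuracy(["Hello", "W0rld", "cafe"], ["hello", "world", "CAFE"]))
```

Under this metric, a score of 70.8 on the table below means 70.8% of the benchmark's test words were recognized exactly.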

Rank  Model       Source     Score  Year  Paper
1     CLIP4STR-B  Community  70.8   2026  Source
2     PARSeq      Community  67.8   2026  Source
3     LPV-S       Community  65.1   2026  Source
4     MAERec-S    Community  62.4   2026  Source
5     CDistNet    Community  56.2   2026  Source

Notes:
1. CLIP4STR-B: 70.8% word accuracy on Union14M-Benchmark. Reported in the Union14M paper (arXiv 2307.08723, ICCV 2023) and the CLIP4STR paper. Best model on Union14M at the time of benchmark publication.
2. PARSeq: 67.8% word accuracy. Table 4 in the Union14M paper (arXiv 2307.08723, ICCV 2023). A strong ECCV 2022 baseline whose limitations are exposed by the benchmark's real-world difficulty.
3. LPV-S (Language-Guided Progressive Vision, Small): 65.1% word accuracy. Table 4 in the Union14M paper (arXiv 2307.08723, ICCV 2023).
4. MAERec-S: 62.4% word accuracy. Table 4 in the Union14M paper (arXiv 2307.08723, ICCV 2023). Uses MAE pre-training for text recognition.
5. CDistNet: 56.2% word accuracy. Table 4 in the Union14M paper (arXiv 2307.08723, ICCV 2023). An AAAI 2022 baseline.
