
WildASR.

Multilingual (English, Chinese, Japanese, Korean) diagnostic benchmark that evaluates ASR robustness along three out-of-distribution dimensions: environmental degradation (reverberation, noise, clipping), demographic shift (accents, children, older speakers), and linguistic diversity (code-switching, short utterances, incomplete speech). It scores English with word error rate (WER) and Chinese, Japanese, and Korean (CJK) with character error rate (CER).
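As a rough illustration of how the two metrics differ, here is a minimal Python sketch, assuming plain Levenshtein alignment with unit costs and whitespace-only normalization. Both metrics count the substitutions, deletions, and insertions needed to turn the reference into the hypothesis, then divide by the reference length. WER aligns whitespace-separated words; CER aligns individual characters, so it does not depend on word segmentation, which matters for Chinese and Japanese text written without spaces between words. The function names are illustrative, and real scorers (including whatever pipeline the WildASR paper used) may normalize casing, punctuation, and numerals differently, so numbers from this sketch will not necessarily match the leaderboard.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences.

    Substitutions, deletions, and insertions each cost 1.
    """
    prev = list(range(len(hyp) + 1))  # distances for an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion (reference token missing)
                curr[j - 1] + 1,         # insertion (extra hypothesis token)
                prev[j - 1] + (r != h),  # substitution, or match at cost 0
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits / number of reference words."""
    ref = reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hypothesis.split()) / len(ref)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edits / reference characters.

    Assumption: whitespace is dropped before alignment; punctuation is kept.
    """
    ref = [c for c in reference if not c.isspace()]
    hyp = [c for c in hypothesis if not c.isspace()]
    if not ref:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the): 2 / 6 words.
    print(round(100 * wer("the cat sat on the mat", "the cat sit on mat"), 1))
    # One deleted character (很): 1 / 6 characters.
    print(round(100 * cer("今天天气很好", "今天天气好"), 1))
```

Because insertions count as errors, both rates can exceed 100% when a model hallucinates extra words or characters, which is one reason the tables below are read as "lower is better" rather than as an accuracy percentage.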

§ 01 · Leaderboard

Results by metric.


CER

Character error rate (CER) is the metric WildASR uses for Chinese, Japanese, and Korean. Codesota tracks published model scores on this metric so readers can compare results across sources and model families.

Lower is better

Trust tiers for CER: verified, paper, vendor, community, unverified.
All scores are CER (%) on the clean FLEURS Chinese (ZH) split, as reported in the WildASR paper, Appendix G.

| Rank | Model | Trust | CER (%) | Year | Source |
|---|---|---|---|---|---|
| 01 | Gemini 3 Pro | verified | 6.1 | 2025 | WildASR paper, Appendix G |
| 02 | GPT-4o Transcribe | verified | 6.4 | 2025 | WildASR paper, Appendix G |
| 03 | Gemini 2.5 Pro | verified | 6.7 | 2025 | WildASR paper, Appendix G |
| 04 | Whisper Large V3 | verified | 7.5 | 2025 | WildASR paper, Appendix G |
| 05 | Scribe V1 | verified | 8.7 | 2025 | WildASR paper, Appendix G |
| 06 | Qwen2-Audio | verified | 9.1 | 2025 | WildASR paper, Appendix G |
| 07 | Nova 2 | verified | 10.1 | 2025 | WildASR paper, Appendix G |

WER

Word error rate (WER) is the metric WildASR uses for English. Codesota tracks published model scores on this metric so readers can compare results across sources and model families.

Lower is better

Trust tiers for WER: verified, paper, vendor, community, unverified.
All scores are WER (%) on the clean FLEURS English (EN) split, as reported in the WildASR paper, Appendix G.

| Rank | Model | Trust | WER (%) | Year | Source |
|---|---|---|---|---|---|
| 01 | Gemini 3 Pro | verified | 2.8 | 2025 | WildASR paper, Appendix G |
| 02 | GPT-4o Transcribe | verified | 2.8 | 2025 | WildASR paper, Appendix G |
| 03 | Gemini 2.5 Pro | verified | 3.6 | 2025 | WildASR paper, Appendix G |
| 04 | Scribe V1 | verified | 3.6 | 2025 | WildASR paper, Appendix G |
| 05 | Whisper Large V3 | verified | 4.2 | 2025 | WildASR paper, Appendix G |
| 06 | Qwen2-Audio | verified | 5.8 | 2025 | WildASR paper, Appendix G |
| 07 | Nova 2 | verified | 6.0 | 2025 | WildASR paper, Appendix G |
Lineage

WildASR in context.

Predecessors (1)
FLEURS (active, 2022-05)
FLEURS evaluates multilingual generalisation; WildASR evaluates naturalness: real ambient noise, spontaneous speech, code-switching, and domain diversity. WildASR is the current focus for evaluating foundation-model ASR.
This benchmark (1)
WildASR (active, 2024-01)
No successor benchmarks yet; WildASR is the current frontier.
