The Complete Speech AI Benchmark
Compare the best models for both Speech-to-Text (STT) and Text-to-Speech (TTS). From Whisper to ElevenLabs, see who leads the charts.
Speech-to-Text (STT)
Word Error Rate (WER)
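WER is the number of word-level errors divided by the number of words in the reference transcript, usually reported as a percentage. Because insertions count as errors, it can exceed 100%. It counts three types of errors: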
- Substitutions: a wrong word, e.g. "the cat" becomes "the car"
- Deletions: a missing word, e.g. "the big cat" becomes "the cat"
- Insertions: an extra word, e.g. "the cat" becomes "the big cat"
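To make the three error types concrete, here is a minimal sketch that counts them with a word-level edit distance and then computes WER = (S + D + I) / N, where N is the number of reference words. The function name `word_error_counts` is hypothetical, and the sketch assumes plain whitespace tokenization with no text normalization (casing, punctuation), which real evaluations usually apply first.

```python
def word_error_counts(reference: str, hypothesis: str) -> dict:
    """Count substitutions, deletions and insertions, then compute WER."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # cost[i][j] = (total, subs, dels, ins) for aligning ref[:i] with hyp[:j]
    cost = [[None] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    cost[0][0] = (0, 0, 0, 0)
    for i in range(1, len(ref) + 1):
        cost[i][0] = (i, 0, i, 0)  # every reference word deleted
    for j in range(1, len(hyp) + 1):
        cost[0][j] = (j, 0, 0, j)  # every hypothesis word inserted

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            if ref[i - 1] == hyp[j - 1]:
                cost[i][j] = cost[i - 1][j - 1]  # match, no new error
                continue
            t, s, d, n = cost[i - 1][j - 1]
            substitution = (t + 1, s + 1, d, n)
            t, s, d, n = cost[i - 1][j]
            deletion = (t + 1, s, d + 1, n)
            t, s, d, n = cost[i][j - 1]
            insertion = (t + 1, s, d, n + 1)
            # Lowest total wins; on ties the substitution is preferred.
            cost[i][j] = min((substitution, deletion, insertion),
                             key=lambda c: c[0])

    total, subs, dels, ins = cost[len(ref)][len(hyp)]
    return {
        "substitutions": subs,
        "deletions": dels,
        "insertions": ins,
        "wer": total / len(ref),
    }


# The three examples from the list above.
print(word_error_counts("the cat", "the car"))
print(word_error_counts("the big cat", "the cat"))
print(word_error_counts("the cat", "the big cat"))
```

When several alignments share the same minimum error count, the split between substitutions, deletions and insertions is not unique, although the WER itself is.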
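```python
from jiwer import wer

reference = "the quick brown fox"
hypothesis = "the quik brown cat"

# Two of the four reference words are substituted ("quick" and "fox"),
# so the expected error rate is 2 / 4.
error_rate = wer(reference, hypothesis)
print(f"WER: {error_rate * 100:.1f}%")
```
STT Leaderboard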
WER on LibriSpeech test-clean. Lower is better.
| Rank | Model | Organization | WER (%) | Type | Year |
|---|---|---|---|---|---|
| #1 | Conformer XL | Google | 2.0 | Research | 2021 |
| #2 | Whisper Large v3 | OpenAI | 2.7 | Open Source | 2024 |
| #3 | Google USM | Google | 2.8 | Cloud API | 2023 |
| #4 | Azure Speech | Microsoft | 3.0 | Cloud API | 2024 |
| #5 | Whisper Medium | OpenAI | 3.4 | Open Source | 2023 |
| #6 | wav2vec 2.0 | Meta | 3.8 | Open Source | 2020 |
Text-to-Speech (TTS)
Mean Opinion Score (MOS)
TTS is harder to evaluate objectively than STT. The gold standard is MOS: human raters listen to generated audio and rate it from 1 (Bad) to 5 (Excellent).
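As a rough illustration of how a MOS figure is produced, the sketch below averages a set of listener ratings and attaches a 95% confidence interval. The ratings are made-up illustrative values, the function name `mean_opinion_score` is hypothetical, and the interval uses a normal approximation, which is only a loose guide when the number of raters is small.

```python
import math
import statistics


def mean_opinion_score(ratings, z=1.96):
    """Return (mos, half_width) for ratings on the 1 (Bad) to 5 (Excellent) scale.

    half_width is a normal-approximation 95% confidence interval (assumption).
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings to estimate spread")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must lie between 1 and 5")
    mos = statistics.mean(ratings)
    half_width = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Hypothetical scores from eight listeners for one audio sample.
ratings = [5, 4, 4, 5, 3, 4, 5, 4]
mos, half_width = mean_opinion_score(ratings)
print(f"MOS: {mos:.2f} +/- {half_width:.2f}")
```

Reporting the interval alongside the mean helps when comparing systems whose scores differ by only a tenth of a point or two.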
Other TTS Metrics
- MCD (Mel Cepstral Distortion): Objective spectral distance, in dB, between the mel-cepstral coefficients of generated and reference audio. Lower is better.
- Latency (Time-to-First-Byte): Critical for voice bots. The best models achieve under 200 ms.
- Word Accuracy: Whether the model skips or hallucinates words, checked by running STT on the output and comparing the transcript with the input text.
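Two of the metrics above are easy to sketch in code. The example below computes MCD with the commonly used formula (10 / ln 10) · sqrt(2 · Σ (c_d − c'_d)²), averaged over frames, and times how long a streaming source takes to deliver its first audio chunk. It assumes the mel-cepstral frames are already extracted, time-aligned, and stripped of the energy coefficient c0. `fake_tts_stream` is a hypothetical stand-in for a real streaming TTS client, and both function names are illustrative.

```python
import math
import time


def mel_cepstral_distortion(ref_frames, gen_frames):
    """Mean MCD in dB over time-aligned frames of mel-cepstral coefficients."""
    if not ref_frames or len(ref_frames) != len(gen_frames):
        raise ValueError("inputs must be non-empty and frame-aligned")
    scale = 10.0 / math.log(10.0) * math.sqrt(2.0)
    total = 0.0
    for ref, gen in zip(ref_frames, gen_frames):
        # Euclidean distance between the two coefficient vectors of this frame.
        total += math.sqrt(sum((r - g) ** 2 for r, g in zip(ref, gen)))
    return scale * total / len(ref_frames)


def time_to_first_chunk(start_stream):
    """Return (seconds, first_chunk) for a callable that starts an audio stream."""
    start = time.perf_counter()
    stream = iter(start_stream())
    first_chunk = next(stream)
    return time.perf_counter() - start, first_chunk


def fake_tts_stream():
    """Hypothetical streaming TTS source: 50 ms of delay, then audio chunks."""
    time.sleep(0.05)
    yield b"\x00" * 320
    yield b"\x00" * 320


print(mel_cepstral_distortion([[1.0, 2.0]], [[1.0, 3.0]]))
latency, chunk = time_to_first_chunk(fake_tts_stream)
print(f"time to first chunk: {latency * 1000:.0f} ms, {len(chunk)} bytes")
```

In practice the generated and reference utterances rarely have the same length, so frames are usually aligned (for example with dynamic time warping) before MCD is computed. That step is omitted here.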
TTS Leaderboard
Approximate MOS ratings based on community benchmarks and paper results. Higher is better.
| Rank | Model | Organization | MOS (1-5) | Type | Year |
|---|---|---|---|---|---|
| #1 | ElevenLabs Turbo v2.5 | ElevenLabs | 4.8 | Cloud API | 2024 |
| #2 | OpenAI TTS HD | OpenAI | 4.7 | Cloud API | 2023 |
| #3 | XTTS v2 | Coqui | 4.5 | Open Source | 2024 |
| #4 | MMS-TTS | Meta | 4.0 | Open Source | 2023 |
| #5 | Bark | Suno | 3.9 | Open Source | 2023 |
| #6 | Piper | Rhasspy | 3.6 | Open Source | 2023 |