Codesota · Speech · Open-source TTS
Updated May 13, 2026
§ Open models

Best open-source TTS models.

Start with Sesame CSM when naturalness matters, Kokoro v1.0 when you need a small local model, XTTS v2 for voice cloning, and Piper when CPU latency matters more than polish.

§ 01 · Ranking

Open-source TTS shortlist

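Ranked by the shared CodeSOTA TTS catalog. MOS (mean opinion score) is a starting signal, not a verdict: the scores listed here come from subjective evaluations, so the deployment choice also depends on license, footprint, languages, inference speed, and whether voice cloning is central.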

| Rank | Model | Year · Evaluation | Vendor | MOS | Best fit | Architecture | Params |
|------|-------|-------------------|--------|-----|----------|--------------|--------|
| 1 | Sesame CSM | 2025 · Subjective | Sesame | 4.7 | dialogue and agents | Conversational Speech Model | 1B+ |
| 2 | Fish Audio S2 Pro | 2026 · Subjective (arXiv:2603.08823) | Fish Audio | 4.6 | multilingual apps | Dual-autoregressive transformer + RVQ audio codec | 5B |
| 3 | Orpheus TTS | 2025 · Subjective | Canopy Labs | 4.6 | style control | LLM-based (Llama backbone) | 3B |
| 4 | Kokoro v1.0 | 2025 · Subjective | Hexgrad | 4.5 | edge and CPU | Lightweight autoregressive | 82M |
| 5 | XTTS v2 | 2024 · VCTK / Subjective | Coqui | 4.5 | voice cloning | GPT-like + VITS decoder | 467M |
| 6 | Fish Speech 1.5 | 2025 · Subjective | Fish Audio | 4.4 | multilingual apps | VQGAN + Transformer | 500M |
| 7 | F5-TTS | 2024 · Subjective | Shanghai AI Lab | 4.4 | voice cloning | Flow-matching (non-autoregressive) | 335M |
| 8 | Dia 1.6B | 2025 · Subjective | Nari Labs | 4.3 | dialogue and agents | Transformer + non-verbal tokens | 1.6B |
| 9 | Spark-TTS | 2025 · Subjective | SparkAudio | 4.3 | multilingual apps | Controllable Transformer | 500M |
| 10 | Supertonic 3 | 2026 · Subjective | Supertone | 4.2 | local TTS | ONNX Runtime local inference | 99M |
| 11 | Parler-TTS | 2025 · Subjective | Hugging Face | 4.1 | local TTS | Prompt-controlled Transformer | 880M |
| 12 | Piper | 2023 · Subjective / Raspberry Pi | Rhasspy | 3.6 | edge and CPU | VITS (lightweight) | ~20M |
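One way to act on the shortlist is to encode it as data and filter by your own constraints before looking at MOS. The sketch below is illustrative only: the `TTSModel` class, the `shortlist` function, and the field names are hypothetical, and parameter counts are approximations of the table values ("1B+" is stored as 1000, "~20M" as 20). License and language coverage are not in the table, so they are not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TTSModel:
    name: str
    mos: float
    best_fit: str
    params_millions: float  # approximate: "1B+" stored as 1000, "~20M" as 20


# Rows copied from the shortlist above (name, MOS, best fit, params).
CATALOG = [
    TTSModel("Sesame CSM", 4.7, "dialogue and agents", 1000),
    TTSModel("Fish Audio S2 Pro", 4.6, "multilingual apps", 5000),
    TTSModel("Orpheus TTS", 4.6, "style control", 3000),
    TTSModel("Kokoro v1.0", 4.5, "edge and CPU", 82),
    TTSModel("XTTS v2", 4.5, "voice cloning", 467),
    TTSModel("Fish Speech 1.5", 4.4, "multilingual apps", 500),
    TTSModel("F5-TTS", 4.4, "voice cloning", 335),
    TTSModel("Dia 1.6B", 4.3, "dialogue and agents", 1600),
    TTSModel("Spark-TTS", 4.3, "multilingual apps", 500),
    TTSModel("Supertonic 3", 4.2, "local TTS", 99),
    TTSModel("Parler-TTS", 4.1, "local TTS", 880),
    TTSModel("Piper", 3.6, "edge and CPU", 20),
]


def shortlist(models, max_params_millions=None, best_fit=None, min_mos=0.0):
    """Return the models that meet every given constraint, highest MOS first."""
    picked = [
        m for m in models
        if (max_params_millions is None or m.params_millions <= max_params_millions)
        and (best_fit is None or m.best_fit == best_fit)
        and m.mos >= min_mos
    ]
    # sorted() is stable, so models with equal MOS keep their catalog order.
    return sorted(picked, key=lambda m: m.mos, reverse=True)


if __name__ == "__main__":
    # Example: candidates small enough for a constrained device.
    for m in shortlist(CATALOG, max_params_millions=100):
        print(m.name, m.mos)
```

Filtering on footprint or use case first and ranking by MOS second reflects the point above: MOS narrows a list that other constraints have already shaped.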

Choose open-source when

You need local inference, private audio handling, predictable marginal cost, model-level control, or custom fine-tuning.

Choose API when

You need managed voices, streaming infrastructure, uptime, commercial voice-cloning flows, or the fastest path to a production voice agent.
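The cost side of this choice can be roughed out with simple arithmetic. The sketch below assumes a flat monthly cost for self-hosting and a per-character API price; both functions and all figures in the example are hypothetical placeholders, not quotes from any provider, and real self-hosting costs also vary with load and engineering time.

```python
def monthly_api_cost(chars_per_month, price_per_million_chars):
    """API spend for a given monthly character volume."""
    return chars_per_month / 1_000_000 * price_per_million_chars


def break_even_chars(self_hosted_monthly_cost, price_per_million_chars):
    """Monthly character volume above which a flat self-hosted cost is cheaper."""
    if price_per_million_chars <= 0:
        raise ValueError("price_per_million_chars must be positive")
    return self_hosted_monthly_cost / price_per_million_chars * 1_000_000


if __name__ == "__main__":
    # Hypothetical inputs: 300 per month for a GPU box, 15 per million characters via API.
    volume = break_even_chars(self_hosted_monthly_cost=300, price_per_million_chars=15)
    print(f"Break-even at about {volume:,.0f} characters per month")
```

Below the break-even volume the API is cheaper on these assumptions; above it, the self-hosted cost stays flat while API spend keeps growing.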

Validate before shipping

Synthesize your exact prompts, including names, numbers, dates, URLs, acronyms, and domain terms, and check that each one is spoken correctly. Naturalness does not guarantee information fidelity.
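A round-trip check is one way to automate this validation: synthesize each prompt, transcribe the audio with a speech recognizer, and compare the transcript to the prompt using word error rate (WER). The sketch below is a minimal version under stated assumptions: `synthesize` and `transcribe` are hypothetical callables you supply for your own TTS and ASR stack, and the normalization is deliberately simple.

```python
import re


def normalize(text):
    """Lowercase and keep runs of letters, digits, and apostrophes as words."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def check_prompts(prompts, synthesize, transcribe, max_wer=0.0):
    """Return (prompt, transcript, wer) for each prompt whose round trip exceeds max_wer.

    synthesize(text) -> audio and transcribe(audio) -> text are supplied by the
    caller; they are placeholders here, not part of any specific library.
    """
    failures = []
    for prompt in prompts:
        transcript = transcribe(synthesize(prompt))
        wer = word_error_rate(prompt, transcript)
        if wer > max_wer:
            failures.append((prompt, transcript, wer))
    return failures
```

Recognizers often rewrite numbers, dates, and URLs (for example "555" as "five five five"), so a real pipeline would need matching text normalization on both sides, and any flagged prompt is worth listening to by hand since the error may come from the recognizer.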

Related CodeSOTA pages

Full TTS registry

Hosted APIs and open-source models with evidence tiers.

TTS models guide

Long-form guide to model families and deployment choices.

TTS intelligibility benchmark

Hard-prompt WER, entity preservation and latency.

Best TTS for real-time

Latency-first choices for voice agents.