
Voice cloning.

Voice cloning is an audio deepfake technique that uses machine learning to create a digital replica of a specific person's voice, synthesizing speech that mimics vocal characteristics such as pitch and tone. It has legitimate uses, such as generating audiobooks or restoring speech for people who have lost their voice, but it is also used maliciously, for example in scams where fraudsters impersonate a real person.

Datasets: 1 · Results: 3 · Canonical metric: WER
§ 02 · Canonical benchmark

The reference dataset.

LibriTTS test-clean (Zero-Shot TTS)

Standard zero-shot voice-cloning / TTS evaluation using speaker prompts from LibriTTS test-clean. The primary intelligibility metric is WER (word error rate) on the resynthesized utterances, measured by transcribing them with a frozen ASR model such as HuBERT-Large or Whisper; lower is better. Speaker similarity (SECS, speaker encoder cosine similarity) is a secondary metric.
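As a rough illustration of the two metrics named above, the following minimal sketch computes a corpus-level WER from (reference, transcript) pairs and a cosine similarity between two speaker embeddings. It is not the leaderboard's scoring code: the text normalization rule, the function names, and the assumption that transcripts and embeddings have already been produced by some ASR model and speaker encoder are all illustrative choices.

```python
import math
import re


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation, split into words (assumed normalization)."""
    return re.sub(r"[^a-z0-9' ]+", " ", text.lower()).split()


def word_errors(reference: str, hypothesis: str) -> tuple[int, int]:
    """Return (word-level edit distance, number of reference words)."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (or match)
            )
        prev = curr
    return prev[-1], len(ref)


def corpus_wer(pairs: list[tuple[str, str]]) -> float:
    """WER over a set of (reference, ASR transcript) pairs; lower is better."""
    errors = sum(word_errors(ref, hyp)[0] for ref, hyp in pairs)
    words = sum(word_errors(ref, hyp)[1] for ref, hyp in pairs)
    if words == 0:
        raise ValueError("references contain no words")
    return errors / words


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two speaker embeddings; higher is better."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    pairs = [
        ("The cat sat on the mat.", "the cat sat on a mat"),
        ("Hello, world!", "hello world"),
    ]
    print(f"WER: {100 * corpus_wer(pairs):.2f}%")
```

In practice the hypothesis strings would come from running the frozen ASR model on the synthesized audio, and the embeddings from a speaker encoder applied to the prompt and the synthesized utterance. Reported numbers depend on the choice of both models and on the normalization, so scores are only comparable when those are held fixed.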

Primary metric: WER
§ 03 · Top 10

Leading models.

Leading models on LibriTTS test-clean (Zero-Shot TTS).

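| # | Model | WER (%, lower is better) | Year | Source |
|---|-------|--------------------------|------|--------|
| 1 | NaturalSpeech 3 | 1.81 | 2026 | paper |
| 2 | Voicebox | 1.90 | 2026 | paper |
| 3 | VALL-E | 5.90 | 2026 | paper |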

§ 04 · All datasets

Tracked datasets.

1 dataset tracked for this task.

LibriTTS test-clean (Zero-Shot TTS)
CANONICAL
3 results · WER
Top (lowest WER): NaturalSpeech 3 1.81
§ 05 · Related tasks

Other tasks in Audio.

Audio Classification · Audio-Language Models · Automatic Speech Recognition · Text-to-speech