Help decide which AI voice actually sounds better.
You will listen to short clips labeled A, B, and C, all reading the same text. Your choices help build a public TTS ranking grounded in human preference, not just automatic scores.
The open-source baseline is audible and measurable.
I rendered Kokoro-82M voices with the same prompt, then extracted pitch, voicing, brightness, zero-crossing rate, and MFCC views. These descriptors are not a final ranking, but they point to where the human preference test should look closely.
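For readers who want to reproduce that kind of view, here is a minimal extraction sketch using librosa; the function name, the placeholder file path, and the pYIN pitch bounds are illustrative assumptions, not the exact pipeline behind these views.

```python
# Minimal descriptor sketch, assuming librosa is installed and
# "kokoro_clip.wav" is a rendered Kokoro-82M sample (placeholder path).
import numpy as np
import librosa

def describe_clip(path: str) -> dict:
    y, sr = librosa.load(path, sr=None)

    # Pitch and voicing via pYIN; f0 is NaN on unvoiced frames.
    f0, voiced_flag, _ = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
    )

    return {
        "pitch_mean_hz": float(np.nanmean(f0)),
        "pitch_std_hz": float(np.nanstd(f0)),         # pitch movement
        "voiced_ratio": float(np.mean(voiced_flag)),  # voicing
        # Spectral centroid is a common stand-in for perceived brightness.
        "brightness": float(np.mean(librosa.feature.spectral_centroid(y=y, sr=sr))),
        "zcr": float(np.mean(librosa.feature.zero_crossing_rate(y))),
        # 13 MFCCs averaged over time as a compact timbre view.
        "mfcc_mean": librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1),
    }

print(describe_clip("kokoro_clip.wav"))
```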
A concrete Kokoro sample.
This is the kind of evidence the study will pair with blind listener votes: a real clip translated into acoustic measurements that can actually be compared side by side.
Join the listener pool.
Leave your email and I will send the first listening rounds when the study opens.
Register for the TTS listening study
You will get an email when the first blind A/B/C rounds are ready.
How the study works.
This is a preference test, not a vendor demo. The same text goes through every TTS system, and the listener only sees neutral labels.
Same text
Every system receives the same prompt, so listeners compare voice quality rather than prompt choice.
Blind labels
Audio clips are labeled only A, B, and C. Provider names, model names, and prices stay hidden until after scoring; the shuffle is sketched below.
Human vote
Listeners pick the clip they prefer and can flag problems like robotic prosody, bad emphasis, noise, or unclear words.
Preference ranking
Votes are aggregated into a human preference layer that sits next to WER, latency, cost, and license data.
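For readers who want to see the shape of this pipeline, here is a hedged sketch in Python: the blind_round shuffle, the system names, and the raw win-rate model are all illustrative stand-ins, and the study may use a different statistical model (for example Bradley-Terry) for the final ranking.

```python
# Sketch of blind labeling plus vote aggregation; system names, the
# seed-based shuffle, and the win-rate model are assumptions.
import random
from collections import Counter

def blind_round(systems: list[str], seed: int) -> dict[str, str]:
    """Assign neutral labels A/B/C to systems in a random per-round order."""
    order = systems[:]
    random.Random(seed).shuffle(order)
    return dict(zip("ABC", order))

def preference_ranking(votes: list[tuple[str, list[str]]]) -> list[tuple[str, float]]:
    """Rank systems by the share of rounds in which listeners preferred them."""
    wins, rounds = Counter(), Counter()
    for winner, contenders in votes:
        wins[winner] += 1
        for system in contenders:
            rounds[system] += 1
    rates = {s: wins[s] / rounds[s] for s in rounds}
    return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)

systems = ["kokoro_82m", "vendor_x", "vendor_y"]  # hypothetical entrants
print(blind_round(systems, seed=42))              # e.g. {"A": "vendor_y", ...}
votes = [("kokoro_82m", systems), ("vendor_x", systems), ("kokoro_82m", systems)]
print(preference_ranking(votes))                  # kokoro_82m first at 2/3
```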
Expressiveness needs more than one number.
The composite score is intentionally transparent: it rewards pitch movement, avoids treating whisperiness as a failure by itself, and keeps timbre separate from intelligibility.
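To make those three rules concrete, here is one hedged way such a composite could be coded; the field names follow the descriptor sketch above, and the coefficient-of-variation scaling is an assumption, not the study's published formula.

```python
# Hedged composite sketch: pitch movement drives the score, whisperiness
# is reported rather than penalized, and timbre stays on its own axis.
def expressiveness_view(desc: dict) -> dict:
    # Coefficient of variation of f0: rewards pitch movement, not raw pitch.
    # More prosodic terms (energy variation, speaking rate) would slot in here.
    pitch_movement = desc["pitch_std_hz"] / max(desc["pitch_mean_hz"], 1.0)

    # Low voicing reads as whispery; it is surfaced, not subtracted.
    whisperiness = 1.0 - desc["voiced_ratio"]

    return {
        "expressiveness": round(pitch_movement, 3),  # the composite itself
        "whisperiness": round(whisperiness, 3),      # context, not a penalty
        "timbre_brightness": desc["brightness"],     # kept apart from intelligibility
    }
```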
Which voices should listeners compare first?
The ranking below is a triage tool for study design. It identifies voices with more acoustic variation so the blind rounds can test whether listeners actually prefer that variation.
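Combining the sketches above, that triage could look like the following; the voice IDs and file paths are placeholders for clips rendered from the same prompt, and the shortlist size is arbitrary.

```python
# Triage sketch: rank candidate voices by the composite above and send
# the most acoustically varied ones into the first blind rounds.
clips = {
    "af_bella": "renders/af_bella.wav",
    "am_adam": "renders/am_adam.wav",
    "af_sky": "renders/af_sky.wav",
}

scores = {
    voice: expressiveness_view(describe_clip(path))["expressiveness"]
    for voice, path in clips.items()
}

# Top of this list = most pitch movement; the blind rounds then test
# whether listeners actually prefer that variation.
shortlist = sorted(scores, key=scores.get, reverse=True)[:2]
print(shortlist)
```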