
Audio-Language Models.

Audio-Language Models (ALMs) extend natural language processing (NLP) to the domain of audio. By integrating audio data with language understanding, they enable computers to understand, generate, and reason about sounds and speech. Trained on audio-text data, ALMs bridge the gap between acoustic signals and linguistic meaning, supporting tasks such as zero-shot audio recognition, audio captioning, and generative audio such as text-to-audio synthesis.

Datasets: 18
Results: 0
Canonical metric: none listed
§ 02 · Canonical benchmark

The reference dataset.

Seeking canonical benchmark for this task.

§ 03 · Top 10

Leading models.

Leading models across all datasets in this task.

No results yet. Be the first to contribute.


§ 04 · All datasets

Tracked datasets.

18 datasets tracked for this task.

AudioCaps: 0 results
CMM hallucination: 0 results
Clotho-AQA: 0 results
Clotho-v2: 0 results
CochlScene: 0 results
CompA-R-test: 0 results
IEMOCAP: 0 results
LibriSQA: 0 results
LongAudioBench: 0 results
MMAR: 0 results
MMAU: 0 results
MMSU: 0 results
MuchoMusic: 0 results
Music Instruct: 0 results
MusicAVQA: 0 results
NSynth: 0 results
NonSpeech7k: 0 results
OpenAudioBench - LlamaQuestions: 0 results
§ 05 · Related tasks

Other tasks in Audio.

Audio Classification · Automatic Speech Recognition · Text-to-speech · Voice cloning