Audio & Speech

Automatic speech recognition (ASR), text-to-speech (TTS), speaker intelligence, music, sound events, audio-language understanding, and audio safety.

17 tasks, 7 datasets

Tasks in Audio & Speech

Automatic Speech Recognition

Converting spoken audio to text.

2 datasets

Multilingual ASR

Recognizing speech across languages and accents.

0 datasets

Streaming ASR

Low-latency speech recognition on live audio.

0 datasets

Speech Translation

Translating spoken audio directly to another language.

0 datasets

Text-to-Speech

Generating natural-sounding speech from text.

3 datasets

Expressive TTS

Generating speech with controllable prosody and emotion.

0 datasets

Voice Cloning

Replicating speaker characteristics from examples.

0 datasets

Speaker Verification

Verifying speaker identity from voice samples.

0 datasets

Speaker Diarization

Determining who spoke when in multi-speaker audio.

0 datasets

Speech Emotion Recognition

Classifying emotion or affect from speech.

0 datasets

Audio Classification

Classifying audio clips by event or category.

2 datasets

Sound Event Detection

Detecting sound events over time.

0 datasets

Audio Captioning

Generating text descriptions of audio clips.

0 datasets

Audio Question Answering

Answering questions about audio content.

0 datasets

Music Understanding

Analyzing musical structure, genre, or content.

0 datasets

Music Generation

Generating music from text, prompts, or examples.

0 datasets

Audio Deepfake Detection

Detecting synthetic or manipulated speech.

0 datasets

Explore Other Areas

Language & Knowledge

Language understanding, retrieval, QA, RAG, factuality, information extraction, multilingual evaluation, and knowledge-heavy reasoning.

Vision & Documents

Images, video frames, OCR, layout, tables, document parsing, detection, segmentation, and visual anomaly detection.

Multimodal Media

Cross-modal image, text, audio, video, and 3D tasks where input and output span multiple media types.

Code & Software Engineering

Code generation, completion, repair, repository understanding, tests, vulnerability work, UI code, and mobile app code generation.