arXiv:2606.05177
MCBench: A Multicontext Safety Assessment Benchmark for Omni Large Language Models
Authors pending
Abstract
MCBench comprises 1,196 scenarios spanning vision, audio, and text. Current Omni LLMs fail to integrate cross-modal cues when making safety judgments, and they perform better only when salient signals are present.
Tasks
Results
No benchmark results recorded yet.
Benchmark results referencing this paper haven't been added to the registry yet.
CodeSOTA extraction
Benchmark evidence
- MCBench cross-modal safety accuracy breakdown by modality combination and risk category
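
A breakdown like the one listed above could be computed from per-scenario predictions roughly as follows. This is a minimal sketch, not the paper's evaluation code: the record fields (`modalities`, `risk_category`, `label`, `predicted`), the category names, and the sample records are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def accuracy_breakdown(records):
    """Return {(modality_combination, risk_category): accuracy}.

    Each record is a dict with hypothetical keys: "modalities",
    "risk_category", "label", and "predicted".
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for r in records:
        # Sort modalities so that vision+text and text+vision share one key.
        key = ("+".join(sorted(r["modalities"])), r["risk_category"])
        totals[key] += 1
        correct[key] += int(r["predicted"] == r["label"])
    return {key: correct[key] / totals[key] for key in totals}


# Toy records for illustration only; these are not MCBench data or results.
records = [
    {"modalities": ["vision", "text"], "risk_category": "A",
     "label": "unsafe", "predicted": "unsafe"},
    {"modalities": ["text", "vision"], "risk_category": "A",
     "label": "unsafe", "predicted": "safe"},
    {"modalities": ["audio"], "risk_category": "B",
     "label": "safe", "predicted": "safe"},
]

breakdown = accuracy_breakdown(records)
for (mods, category), acc in sorted(breakdown.items()):
    print(f"{mods:<15} {category:<5} {acc:.2f}")
```

Keying on the sorted modality combination keeps single-modality and multi-modality scenarios in separate rows, which is the comparison the abstract's cross-modal claim depends on.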