SWE-bench Multimodal: Can AI Agents Handle Visual Issues?

arXiv:2410.03859 · Submitted Oct 4, 2024 · 0 benchmark results

John Yang, Carlos E. Jimenez, et al.

Results

No benchmark results referencing this paper have been added to the registry yet.