arXiv:2410.03859
SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains?
John Yang, Carlos E. Jimenez, et al.
Tasks
Results
No benchmark results referencing this paper have been added to the registry yet.
CodeSOTA extraction
Benchmark evidence
Link this paper to benchmark rows, datasets, model cards, and reproduced results as evidence is extracted.