SWE-bench: Can Language Models Resolve Real-World GitHub Issues?

Carlos E. Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, Karthik Narasimhan

§ 01 · Benchmark results

No benchmark results recorded for this paper yet.

Benchmark results referencing this paper haven’t been added to the registry yet. If you have a reproduction, you can submit it.

Benchmark trail

§ 03 · Datasets