2023-10-10 · ICLR 2024
Paper

SWE-bench: Can Language Models Resolve Real-World GitHub Issues?

Carlos E. Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, Karthik Narasimhan
arXiv ↗
§ 01 · Benchmark results

No benchmark results recorded for this paper yet.

Results: 0 · SOTA rows: 0 · Models: 0 · Datasets: 1

§ 03 · Datasets

1 dataset from this paper.

evaluates · Computer Code
SWE-bench
Code Generation