swe-bench-verified
Unknown
Software engineering benchmark
Total Results: 3
Models Tested: 3
Metrics: 1
Last Updated: 2025-12-19
Metric: resolve-rate (higher is better)
Description: Real-world software engineering issues (verified subset).

| Rank | Model | Score | Source |
|---|---|---|---|
| 1 | claude-35-sonnet | 49 | anthropic-blog |
| 2 | gpt-4o | 41.2 | swe-bench-leaderboard |
| 3 | deepseek-v25 | 37 | deepseek-blog |
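
The page does not define resolve-rate. The sketch below assumes the Score column is the percentage of benchmark task instances that a model's patch resolves. The function name `resolve_rate` and the counts in the usage line are hypothetical and are not taken from the table.

```python
def resolve_rate(resolved: int, total: int) -> float:
    """Return the percentage of task instances counted as resolved.

    Assumption: a score is resolved / total * 100.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= resolved <= total:
        raise ValueError("resolved must be between 0 and total")
    return 100.0 * resolved / total


# Hypothetical counts for illustration only; not data from the leaderboard.
print(round(resolve_rate(103, 250), 1))
```

The three rows cite different sources, so the scores may come from different evaluation setups and may not be directly comparable.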