Decision guide (updated April 2026)

Best agent for SWE-Bench — April 2026

No single winner. The right answer depends on whether you optimize for quality, cost, or open-source. Here is the current leaderboard, the Pareto frontier, and a flowchart to pick yours.

#1 Quality

Claude Code + Opus 4.7

~87.6% SWE-Bench Verified via Anthropic's internal harness. Highest score anywhere.

Cost: ~$6 per full eval run.

Best $ / %

Claude Code + Sonnet 4.5

77.2% at ~$1.30 per run. Roughly 4.6x cheaper than Opus 4.7's $6 run, for about 10 points less.

Most teams default here.
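Relative to the Opus 4.7 figures above, the cost math works out as follows (the dollar and score figures are the ones quoted in this guide; the ratio and gap are the only computed values):

```python
# Figures quoted above in this guide.
opus = {"verified": 87.6, "cost_usd": 6.00}    # Claude Code + Opus 4.7
sonnet = {"verified": 77.2, "cost_usd": 1.30}  # Claude Code + Sonnet 4.5

cost_ratio = opus["cost_usd"] / sonnet["cost_usd"]
score_gap = opus["verified"] - sonnet["verified"]
print(f"{cost_ratio:.1f}x cheaper, {score_gap:.1f} pts lower")
# → "4.6x cheaper, 10.4 pts lower"
```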

#1 Open-Weight

MiniMax M2.5 (self-host)

80.2% via mini-SWE-agent. First open model over 80% with no vendor lock-in.

Runs on 4xH100.

Top 15 leaderboard — all agent/model combos

Best reported pass@1

SWE-Bench Verified, April 2026

(Bars in the original chart were color-coded: closed model, open weights, agent scaffold.)

| # | Agent / model | Verified % |
|---|---|---|
| 1 | Claude Code + Opus 4.7 | 87.6% |
| 2 | Codex CLI + GPT-5.3-Codex xhigh | 85.0% |
| 3 | Claude Code + Opus 4.5 | 80.9% |
| 4 | MiniMax M2.5 (mini-SWE-agent) | 80.2% |
| 5 | GPT-5.2 (mini-SWE-agent) | 80.0% |
| 6 | Cursor Composer 2 + Opus 4.7 | 80.0% |
| 7 | GLM-5 (mini-SWE-agent) | 77.8% |
| 8 | OpenHands + Opus 4.5 | 77.6% |
| 9 | Gemini 3 Pro (mini) | 77.4% |
| 10 | Claude Code + Sonnet 4.5 | 77.2% |
| 11 | Kimi K2.5 (mini) | 76.8% |
| 12 | DeepSeek R1 (mini) | 76.3% |
| 13 | Qwen3-Max-Thinking (mini) | 75.3% |
| 14 | Cursor Composer 2 (native, multilingual) | 73.7% |
| 15 | SWE-agent + Opus 4.5 | 72.0% |

Pareto frontier — quality at the cheapest price

Points on the pink line are not strictly dominated. Every point off the line is dominated: some frontier point is at least as cheap and at least as accurate, and strictly better on one of the two axes. Pick a point on the frontier and you are making an intentional trade.
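The dominance check is a few lines of code. A minimal sketch, using two of this guide's (cost, score) pairs plus one invented, clearly-dominated point ("Model X" is hypothetical; real per-model costs vary by harness):

```python
def pareto_frontier(points):
    """Keep the points that no other point dominates.

    A point is dominated if some other point is at least as cheap AND
    at least as accurate, and strictly better on one of the two axes.
    """
    frontier = []
    for name, cost, score in points:
        dominated = any(
            other_cost <= cost and other_score >= score
            and (other_cost < cost or other_score > score)
            for other_name, other_cost, other_score in points
            if other_name != name
        )
        if not dominated:
            frontier.append(name)
    return frontier

# Two real points from this guide, one invented dominated point.
points = [
    ("Claude Code + Opus 4.7", 6.00, 87.6),
    ("Claude Code + Sonnet 4.5", 1.30, 77.2),
    ("Model X (hypothetical)", 2.00, 70.0),  # dominated by Sonnet 4.5
]
print(pareto_frontier(points))
# → ['Claude Code + Opus 4.7', 'Claude Code + Sonnet 4.5']
```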

The money visual

SWE-Bench Verified cost/perf, April 2026

X: $ per resolved issue (log scale). Y: Verified %. Pink line = Pareto frontier.

[Scatter chart: cost per resolved issue (USD, log) vs. SWE-Bench Verified %. Plotted: Claude Code + Opus 4.7, Codex + GPT-5.3-Codex, Claude Code + Opus 4.5, Claude Code + Sonnet 4.5, Claude Code + Haiku 4.5, MiniMax M2.5, DeepSeek R1, GLM-5, GPT-5.2, Gemini 3 Flash, OpenHands + Opus 4.5, Devin v1.5, Aider + DeepSeek V3.2. Legend: closed model, open weights, agent scaffold; pink line marks the Pareto frontier.]

Decision tree

Pick the question that matters most. Leaves are recommendations for April 2026.

Decision guide

Which agent fits your workflow?

What matters most?
- Quality → Willing to pay a premium?
  - Yes → #1 QUALITY: Claude Code + Opus 4.7
  - No → BALANCED: Claude Code + Opus 4.5
- Cost → BEST $/%: Claude Code + Sonnet 4.5
- Open source → Can you self-host?
  - Yes → #1 OPEN VALUE: MiniMax M2.5 (self-host)
  - No → Research or prod?
    - Prod → PROD OSS: OpenHands + Opus 4.5
    - Research → RESEARCH: SWE-agent (paper baseline)
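If you want the tree in code, it collapses to one small function. A sketch mirroring the branches of the flowchart (the function and argument names are invented here, not from any tool):

```python
def recommend(priority, pay_premium=False, can_self_host=False, setting="prod"):
    """Map the decision-tree questions to the April 2026 picks."""
    if priority == "quality":
        # Willing to pay a premium?
        return "Claude Code + Opus 4.7" if pay_premium else "Claude Code + Opus 4.5"
    if priority == "cost":
        return "Claude Code + Sonnet 4.5"
    if priority == "open-source":
        if can_self_host:
            return "MiniMax M2.5 (self-host)"
        # Research or prod?
        return "OpenHands + Opus 4.5" if setting == "prod" else "SWE-agent (paper baseline)"
    raise ValueError(f"unknown priority: {priority!r}")

print(recommend("open-source", setting="research"))
# → "SWE-agent (paper baseline)"
```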

Quick tradeoffs

| If you... | Pick this | Why |
|---|---|---|
| Care only about the score | Claude Code + Opus 4.7 | Highest reported Verified % as of April 2026 |
| Need the best $/% | Claude Code + Sonnet 4.5 | 77% at ~$1.30 is the sweet spot |
| Want all open source | MiniMax M2.5 or OpenHands + DeepSeek V3.2 | 80% open, self-hostable |
| Want hands-off overnight tickets | Devin | Only one that runs unsupervised for hours |
| Live in an IDE | Cursor Composer 2 | Best-in-class IDE integration |
| Are writing a paper | mini-SWE-agent + your model | 100 LOC, fully reproducible baseline |
| Run many cheap tasks | Codex CLI + GPT-5.2 Mini | Cheapest Codex tier that still solves real bugs |
| Want MCP tooling | Claude Code | Largest MCP ecosystem |
