Decision guide (updated April 2026)

Best agent for SWE-Bench — April 2026

No single winner. The right answer depends on whether you optimize for quality, cost, or open-source. Here is the current leaderboard, the Pareto frontier, and a flowchart to pick yours.

#1 Quality

Claude Code + Opus 4.7

~87.6% SWE-Bench Verified via Anthropic's internal harness. Highest score anywhere.

Cost: ~$6 per full eval run.

Best $ / %

Claude Code + Sonnet 4.5

77.2% at ~$1.30 per run. Roughly 4.6x cheaper than Opus 4.7's $6 run, for about 10 points less.

Most teams default here.
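Relative to the Opus 4.7 figures above, the cost math works out as follows (the dollar and score figures are the ones quoted in this guide; the ratio and gap are the only computed values):

```python
# Figures quoted above in this guide.
opus = {"verified": 87.6, "cost_usd": 6.00}    # Claude Code + Opus 4.7
sonnet = {"verified": 77.2, "cost_usd": 1.30}  # Claude Code + Sonnet 4.5

cost_ratio = opus["cost_usd"] / sonnet["cost_usd"]
score_gap = opus["verified"] - sonnet["verified"]
print(f"{cost_ratio:.1f}x cheaper, {score_gap:.1f} pts lower")
# → "4.6x cheaper, 10.4 pts lower"
```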

#1 Open-Weight

MiniMax M2.5 (self-host)

80.2% via mini-SWE-agent. First open model over 80% with no vendor lock-in.

Runs on 4xH100.

Top 15 leaderboard — all agent/model combos

Best reported pass@1

SWE-Bench Verified, April 2026

(Bars in the original chart were color-coded: closed model, open weights, agent scaffold.)

| # | Agent / model | Verified % |
|---|---|---|
| 1 | Claude Code + Opus 4.7 | 87.6% |
| 2 | Codex CLI + GPT-5.3-Codex xhigh | 85.0% |
| 3 | Claude Code + Opus 4.5 | 80.9% |
| 4 | MiniMax M2.5 (mini-SWE-agent) | 80.2% |
| 5 | GPT-5.2 (mini-SWE-agent) | 80.0% |
| 6 | Cursor Composer 2 + Opus 4.7 | 80.0% |
| 7 | GLM-5 (mini-SWE-agent) | 77.8% |
| 8 | OpenHands + Opus 4.5 | 77.6% |
| 9 | Gemini 3 Pro (mini) | 77.4% |
| 10 | Claude Code + Sonnet 4.5 | 77.2% |
| 11 | Kimi K2.5 (mini) | 76.8% |
| 12 | DeepSeek R1 (mini) | 76.3% |
| 13 | Qwen3-Max-Thinking (mini) | 75.3% |
| 14 | Cursor Composer 2 (native, multilingual) | 73.7% |
| 15 | SWE-agent + Opus 4.5 | 72.0% |

Pareto frontier — quality at the cheapest price

Points on the pink line are not strictly dominated. Every point off the line is dominated: some frontier point is at least as cheap and at least as accurate, and strictly better on one of the two axes. Pick a point on the frontier and you are making an intentional trade.
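The dominance check is a few lines of code. A minimal sketch, using two of this guide's (cost, score) pairs plus one invented, clearly-dominated point ("Model X" is hypothetical; real per-model costs vary by harness):

```python
def pareto_frontier(points):
    """Keep the points that no other point dominates.

    A point is dominated if some other point is at least as cheap AND
    at least as accurate, and strictly better on one of the two axes.
    """
    frontier = []
    for name, cost, score in points:
        dominated = any(
            other_cost <= cost and other_score >= score
            and (other_cost < cost or other_score > score)
            for other_name, other_cost, other_score in points
            if other_name != name
        )
        if not dominated:
            frontier.append(name)
    return frontier

# Two real points from this guide, one invented dominated point.
points = [
    ("Claude Code + Opus 4.7", 6.00, 87.6),
    ("Claude Code + Sonnet 4.5", 1.30, 77.2),
    ("Model X (hypothetical)", 2.00, 70.0),  # dominated by Sonnet 4.5
]
print(pareto_frontier(points))
# → ['Claude Code + Opus 4.7', 'Claude Code + Sonnet 4.5']
```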

The money visual

SWE-Bench Verified cost/perf, April 2026

X: $ per resolved issue (log scale). Y: Verified %. Pink line = Pareto frontier.

[Scatter chart: cost per resolved issue (USD, log) vs. SWE-Bench Verified %. Plotted: Claude Code + Opus 4.7, Codex + GPT-5.3-Codex, Claude Code + Opus 4.5, Claude Code + Sonnet 4.5, Claude Code + Haiku 4.5, MiniMax M2.5, DeepSeek R1, GLM-5, GPT-5.2, Gemini 3 Flash, OpenHands + Opus 4.5, Devin v1.5, Aider + DeepSeek V3.2. Legend: closed model, open weights, agent scaffold; pink line marks the Pareto frontier.]

Decision tree

Pick the question that matters most. Leaves are recommendations for April 2026.

Decision guide

Which agent fits your workflow?

What matters most?
- Quality → Willing to pay a premium?
  - Yes → #1 QUALITY: Claude Code + Opus 4.7
  - No → BALANCED: Claude Code + Opus 4.5
- Cost → BEST $/%: Claude Code + Sonnet 4.5
- Open source → Can you self-host?
  - Yes → #1 OPEN VALUE: MiniMax M2.5 (self-host)
  - No → Research or prod?
    - Prod → PROD OSS: OpenHands + Opus 4.5
    - Research → RESEARCH: SWE-agent (paper baseline)
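If you want the tree in code, it collapses to one small function. A sketch mirroring the branches of the flowchart (the function and argument names are invented here, not from any tool):

```python
def recommend(priority, pay_premium=False, can_self_host=False, setting="prod"):
    """Map the decision-tree questions to the April 2026 picks."""
    if priority == "quality":
        # Willing to pay a premium?
        return "Claude Code + Opus 4.7" if pay_premium else "Claude Code + Opus 4.5"
    if priority == "cost":
        return "Claude Code + Sonnet 4.5"
    if priority == "open-source":
        if can_self_host:
            return "MiniMax M2.5 (self-host)"
        # Research or prod?
        return "OpenHands + Opus 4.5" if setting == "prod" else "SWE-agent (paper baseline)"
    raise ValueError(f"unknown priority: {priority!r}")

print(recommend("open-source", setting="research"))
# → "SWE-agent (paper baseline)"
```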

Quick tradeoffs

| If you... | Pick this | Why |
|---|---|---|
| Care only about the score | Claude Code + Opus 4.7 | Highest reported Verified % as of April 2026 |
| Need the best $/% | Claude Code + Sonnet 4.5 | 77% at ~$1.30 is the sweet spot |
| Want all open source | MiniMax M2.5 or OpenHands + DeepSeek V3.2 | 80% open, self-hostable |
| Want hands-off overnight tickets | Devin | Only one that runs unsupervised for hours |
| Live in an IDE | Cursor Composer 2 | Best-in-class IDE integration |
| Are writing a paper | mini-SWE-agent + your model | 100 LOC, fully reproducible baseline |
| Run many cheap tasks | Codex CLI + GPT-5.2 Mini | Cheapest Codex tier that still solves real bugs |
| Want MCP tooling | Claude Code | Largest MCP ecosystem |
