Head-to-head · two frontier labs · April 2026

Claude Code vs OpenAI Codex CLI.

Both are terminal agents. Both can run unattended for half an hour. Both ship regularly. One is wrapped around Claude Opus 4.5/4.7; the other around GPT-5.2/5.3-Codex. Here is where they actually diverge.

§ 01 · Side-by-side

At a glance, row by row.

| Attribute | Claude Code | OpenAI Codex CLI |
| --- | --- | --- |
| Vendor | Anthropic | OpenAI |
| Primary model | Opus 4.5 / 4.6 / 4.7, Sonnet 4.5, Haiku 4.5 | GPT-5.2 / 5.3, Codex variants, Mini, Nano |
| Reasoning control | Soft (prompted) | Explicit: minimal / medium / high / xhigh |
| SWE-Bench Verified peak | 87.6% (Opus 4.7) | 85.0% (GPT-5.3-Codex xhigh) |
| Patch format | str_replace (surgical) | apply_patch (unified diff) |
| Sandboxing | Host shell + optional --safe | Docker sandbox by default |
| Extension mechanism | MCP servers (Linear, DB, etc.), sketched below | custom_tools JSON schema, sketched below |
| Context window | 200k / 1M (1M on Opus 4.7) | 200k (GPT-5) / 400k (Codex long) |
| Pricing signal | Pro $20/mo + API usage | Pro $20/mo + API usage |
| Best for | Multi-file refactors, long autonomy | Reasoning-heavy bugs, cheap bulk runs |
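
The extension rows are where day-to-day workflows diverge most. As a sketch of each side: a minimal MCP tool server in Python, assuming the official `mcp` SDK and its FastMCP helper (the decorator usage follows the SDK's docs, not something either CLI ships), followed by the JSON-schema shape a custom_tools entry takes (field names there are illustrative assumptions, not Codex CLI's exact config keys).

```python
# Minimal MCP tool server, the Claude Code side of the extension row.
# Assumes the official `mcp` Python SDK (pip install mcp) and its FastMCP
# helper; the decorator usage follows the SDK's docs, not either CLI.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("internal-db")

@mcp.tool()
def lookup_ticket(ticket_id: str) -> str:
    """Return the status line for a ticket (stub for a real DB query)."""
    return f"{ticket_id}: open"

if __name__ == "__main__":
    mcp.run()  # stdio transport by default; register in the CLI's MCP config

# The Codex side is declarative: a JSON-schema tool definition. Field
# names here are illustrative assumptions, not Codex CLI's exact keys.
CUSTOM_TOOL = {
    "name": "lookup_ticket",
    "description": "Return the status line for a ticket",
    "parameters": {
        "type": "object",
        "properties": {"ticket_id": {"type": "string"}},
        "required": ["ticket_id"],
    },
}
```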

April 2026, pass@1

SWE-Bench Verified — best reported by each CLI

  • Claude Code + Opus 4.7: 87.6% (Anthropic internal)
  • Codex CLI + GPT-5.3-Codex xhigh: 85.0% (OpenAI release notes)
  • Claude Code + Opus 4.5: 80.9%
  • Codex CLI + GPT-5.2-Codex: 79.0%
  • Claude Code + Sonnet 4.5: 77.2%
  • Codex CLI + GPT-5.2 Mini: 62.0%

Architecture — the two loops side-by-side

Both are plan-act-reflect agents, but the tool verbs and reasoning modes differ.
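
Stripped of vendor labels, the two loops share one skeleton. A minimal runnable sketch (every helper below is a stub standing in for a model call or tool execution; neither CLI exposes a Python API like this):

```python
# Editorial sketch of the shared plan-act-reflect skeleton. The four
# helpers are stubs standing in for model calls and tool executions.

def plan_step(task: str) -> str:
    return f"plan for: {task}"        # long-horizon plan / reasoner pass

def act(plan: str) -> str:
    return "edit-0001"                # str_replace or apply_patch

def run_tests() -> str:
    return "pass"                     # bash pytest / sandboxed exec

def reflect(plan: str, result: str) -> str:
    return plan + " (revised)"        # Claude's reflect loop / Codex's self-critique

def agent_loop(task: str, max_steps: int = 30) -> list[str]:
    transcript: list[str] = []
    plan = plan_step(task)
    for _ in range(max_steps):
        edit = act(plan)
        result = run_tests()
        transcript.append(f"{edit} -> {result}")
        if result == "pass":          # tests green: commit / emit PR-style output
            break
        plan = reflect(plan, result)  # otherwise iterate / replan
    return transcript

print(agent_loop("fix failing test in parser.py"))
```

The diagrams below fill in the vendor-specific verbs.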

Architecture · Claude Code

str_replace + bash + MCP

User prompt → Plan (long-horizon) → Read + Grep → str_replace (surgical diff) → Bash (pytest, lint) → Reflect (iterate) → MCP (custom servers) → Commit
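
The str_replace verb is the distinctive step in that loop: instead of rewriting a file, the model quotes an exact snippet and its replacement. The payload looks roughly like this (field names follow Anthropic's published text-editor tool; treat the exact shape as an assumption, and the file path is made up):

```python
# Rough shape of a surgical str_replace tool call. Field names follow
# Anthropic's published text-editor tool; exact keys are an assumption.
str_replace_call = {
    "command": "str_replace",
    "path": "src/parser.py",  # hypothetical file
    "old_str": "if token is None:\n        return None",
    "new_str": "if token is None:\n        raise ParseError('unexpected EOF')",
}
# old_str must match the file exactly and uniquely, which is what makes
# the edit surgical: no full-file rewrite, no line-number bookkeeping.
```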

Architecture · OpenAI Codex CLI

apply_patch + sandboxed shell

User prompt → Reasoner (GPT-5.3-Codex) → Sandbox shell (Docker) → apply_patch (unified diff) → Exec tests (pytest / npm test) → Self-critique (reasoning scratch, replan) → PR-style output
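
apply_patch makes the opposite bet: the model emits a diff envelope and the CLI applies it inside the sandbox. Roughly (the *** markers follow OpenAI's published apply_patch format; the details and file path are illustrative):

```python
# Rough shape of an apply_patch payload. The *** envelope markers follow
# OpenAI's published apply_patch format; treat details as assumptions.
APPLY_PATCH_INPUT = """\
*** Begin Patch
*** Update File: src/parser.py
@@ def parse(token):
-    return None
+    raise ParseError("unexpected EOF")
*** End Patch
"""
# One payload can touch many files (*** Add File / *** Delete File), which
# suits PR-style output; the trade-off is coarser edits than str_replace.
```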

Cost vs performance

Eval-run $ for a full SWE-Bench Verified pass vs resolve rate. Mini/Haiku tiers are competitive on price-per-score.

The money visual

Claude Code vs Codex CLI — price of a full Verified run

X: $ per resolved issue (log scale). Y: Verified %. Pink line = Pareto frontier.

Plotted configurations: Claude Code with Opus 4.7, Opus 4.5, Sonnet 4.5, Haiku 4.5; Codex with GPT-5.3-Codex xhigh, GPT-5.2-Codex, GPT-5.2 Mini, GPT-5.2 Nano.
§ 02 · Choose

When to pick which.

Choose Claude Code when
  • You want the single highest SWE-Bench Verified score available
  • Task spans many files or repos; long-horizon planning matters
  • You have custom tools / internal DBs — MCP is a killer feature
  • You prefer surgical str_replace edits over full-file rewrites
Choose Codex CLI when
  • Your team is on OpenAI API contracts
  • You want explicit reasoning knobs per task (see the sketch after this list)
  • You run many cheap bulk tasks — GPT-5.2 Nano is ridiculously cheap
  • You need sandboxing by default
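
The reasoning-knob point maps onto the API's reasoning-effort parameter. A minimal sketch with the OpenAI Python SDK's Responses API (the model name is this article's hypothetical; "xhigh" is taken from the comparison table, not the standard effort list):

```python
# Sketch of per-task reasoning control via the OpenAI Python SDK's
# Responses API (pip install openai). The model name is this article's
# hypothetical; "xhigh" is the table's top tier, assumed to slot in
# alongside the standard effort values.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.responses.create(
    model="gpt-5.3-codex",         # hypothetical model from this comparison
    reasoning={"effort": "high"},  # minimal / medium / high / xhigh per the table
    input="Why does test_parse_eof fail? Propose a minimal patch.",
)
print(resp.output_text)
```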
§ 03 · Method

How the numbers were sourced.

SWE-Bench Verified scores are taken from Anthropic's leaderboard runs (Claude Code) and OpenAI's release notes for the Codex variants of GPT-5.2/5.3. See our SWE-Bench hub for the full leaderboard.

Cost figures reflect a full Verified run at published API rates as of April 2026. Codex's Nano and Mini tiers are priced for bulk; the xhigh reasoning tier carries the same per-token rate but consumes more reasoning tokens.
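
To make the x-axis concrete: SWE-Bench Verified has 500 instances, so cost per resolved issue is total run cost divided by the number of issues resolved. A worked sketch with made-up run costs (the dollar figures below are placeholders, not published pricing):

```python
# Worked example of the scatter plot's x-axis. SWE-Bench Verified has 500
# instances; the run costs below are placeholders, not published pricing.
VERIFIED_INSTANCES = 500

def cost_per_resolved(total_run_usd: float, resolve_rate: float) -> float:
    resolved = VERIFIED_INSTANCES * resolve_rate
    return total_run_usd / resolved

# Hypothetical: a $2,000 full run at 87.6% vs a $150 run at 62.0%.
print(f"${cost_per_resolved(2000, 0.876):.2f} per resolved issue")  # ~$4.57
print(f"${cost_per_resolved(150, 0.620):.2f} per resolved issue")   # ~$0.48
```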

The pipeline diagrams are editorial — they capture the dominant tool verbs each CLI exposes, not every internal step.

§ 04 · Related

Adjacent comparisons.

  • Claude Code vs Cursor Composer · Terminal autonomy vs IDE velocity.
  • Devin vs Claude Code · Autonomous vs interactive.
  • Aider vs Claude Code · Open source vs closed.
  • Best agent for SWE-Bench · Current leaderboard winner.
  • SWE-Bench explained · What the scores actually mean.
  • SWE-Bench hub · Full leaderboard and methodology.
  • Coding lineage · How coding evals evolved.
  • Terminal-Bench · Bench for shell-native agents.