The Claude↔Codex Handoff Pattern I've Been Running for Months
Most people run Claude and Codex as alternatives: you ask one or the other, and you get one model's perspective, one model's blind spots, and one model both writing the implementation and reviewing it.
That last part is the problem. The model that specifies the work shouldn’t be the only one validating the result.
Here’s the coordination architecture I’ve been running.
The Core Loop: Specify → Implement → Validate
Three phases. Each model does what it’s actually good at.
Claude specifies. Reads the design docs, breaks work into atomic tasks, writes a self-contained implementation prompt: requirements, acceptance criteria, constraints. Codex starts fresh with no conversational memory, so the prompt has to carry all the context.
Codex implements. Reads AGENTS.md from the project root for conventions, constraints, and build commands. Executes in your working directory. Returns JSONL output.
Claude validates. git diff, test run, acceptance criteria check. Either commits or refines the prompt and re-delegates (circuit breaker at 3x, more below).
```
Claude: SPECIFY
   → self-contained prompt, acceptance criteria, constraints
         │
Codex: IMPLEMENT
   → reads AGENTS.md for project context
   → executes in working directory
         │
Claude: VALIDATE
   → git diff, tests, acceptance criteria
   ✓ commit  /  ✗ refine and retry (max 3x)
```
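The loop above can be sketched as a small orchestration skeleton. Everything here is hypothetical scaffolding: `specify`, `implement`, `validate`, and `refine` are stand-ins for the real steps (prompt writing, a Codex invocation, diff/test checks), not actual APIs.

```python
# Hypothetical sketch of the specify -> implement -> validate loop with a
# retry cap. The injected helper functions are stand-ins for the real
# steps: prompt writing, a Codex invocation, and diff/test validation.

MAX_ATTEMPTS = 3  # stop iterating on a broken spec after three failures

def run_task(task, specify, implement, validate, refine):
    """Drive one task through the loop; return the accepted result."""
    prompt = specify(task)                  # Claude: self-contained prompt
    for _ in range(MAX_ATTEMPTS):
        result = implement(prompt)          # Codex: execute the prompt
        ok, feedback = validate(result)     # Claude: diff, tests, criteria
        if ok:
            return result                   # commit point
        prompt = refine(prompt, feedback)   # fold findings back into the spec
    # Circuit breaker tripped: the spec, not the implementation, is suspect.
    raise RuntimeError(f"{task!r} did not converge in {MAX_ATTEMPTS} attempts")
```

The key property is that failure feedback flows into the *prompt*, not into a growing conversation: each retry is a fresh, fully specified delegation.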
Three Principles That Make It Work
1. Generator-Evaluator Separation
Codex generates. Claude evaluates. Keep the roles distinct.
I find Claude Opus 4.6 the stronger model for ideation, planning, and speccing: architectural reasoning, spec compliance, and knowing when something is wrong. Codex 5.4 excels at following instructions, quickly and efficiently. Don't ask Claude to do high-volume implementation. Don't ask Codex to review architectural compliance; it has no context for what you specified. The separation is the whole point.
2. AGENTS.md as Shared Context
Both runtimes need shared context. The problem: Claude reads CLAUDE.md, Codex reads AGENTS.md. Different files.
The solution: AGENTS.md is an open standard read natively by Codex, Cursor, Copilot, and most AI coding tools. I use it as the canonical source. Conventions, constraints, build commands live there. CLAUDE.md imports it rather than duplicating it. Keeping agent harnesses in sync is real work; making one file authoritative eliminates the sync problem.
Dynamic state (what’s done, what’s next, decisions made) lives in shared files both runtimes can read: status.md, task lists, decision logs. State lives in files, not model memory.
3. Artifact-Driven Handoff
State in files, not context windows.
- Handoffs survive `/compact` and context resets
- Either model can pick up where the other left off
- Long gaps between sessions don't lose state
- The handoff protocol is auditable (it's a markdown block in `status.md`)
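The shape of that markdown block isn't standardized; here's a hypothetical `status.md` handoff entry, illustrating the kind of state either runtime needs to resume (all names and details are made up):

```markdown
## Handoff — example entry

**Active task:** add retry logic to the sync client
**Done:** spec written, first Codex pass complete
**Blocked on:** one flaky integration test
**Next:** re-delegate with tightened acceptance criteria (attempt 2 of 3)
**Decisions:** kept exponential backoff; rejected queue-based approach
```

Anything not captured here is lost at the next context reset, so the entry should answer "what would I need to know to pick this up cold?"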
Circuit Breakers
This is the part most people skip when building autonomous loops. It’s where things go wrong.
3 consecutive Codex failures on the same task → stop. Fix inline or escalate to human. Don’t let an autonomous loop iterate on a broken approach. If you’re not converging after 3 attempts, the spec is wrong, not the implementation.
Without that bound, you burn tokens iterating on something broken. With it, you get a natural escalation path back to a human.
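A minimal per-task circuit breaker can be sketched as follows (hypothetical, not part of the plugin's actual code): after `threshold` consecutive failures on the same task, delegation stops and the task is flagged for human review.

```python
# Hypothetical per-task circuit breaker. After `threshold` consecutive
# failures on the same task, delegation is cut off instead of retried.

class CircuitBreaker:
    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = {}              # task id -> consecutive failure count

    def record_failure(self, task):
        self.failures[task] = self.failures.get(task, 0) + 1

    def record_success(self, task):
        self.failures.pop(task, None)   # success resets the count

    def tripped(self, task):
        return self.failures.get(task, 0) >= self.threshold
```

The reset-on-success detail matters: the bound is on *consecutive* failures, so a task that eventually converges doesn't stay poisoned for later delegations.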
The Setup
Two files in your project root:
AGENTS.md is your project’s conventions, constraints, build commands. Codex reads this automatically. The canonical source both runtimes work from.
CLAUDE.md imports AGENTS.md with @AGENTS.md, plus Claude’s operating instructions: how to specify tasks, validate results, maintain state files, commit with attribution.
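A minimal pair might look like this (file contents are illustrative, not the repo's actual templates; `@AGENTS.md` uses Claude Code's file-import syntax for CLAUDE.md):

```markdown
<!-- AGENTS.md — canonical project context (illustrative) -->
## Conventions
- TypeScript strict mode; no default exports
## Build & test
- Build: npm run build
- Test: npm test

<!-- CLAUDE.md — illustrative skeleton -->
@AGENTS.md

## Operating instructions (Claude-specific)
- Specify: write self-contained prompts with explicit acceptance criteria
- Validate: git diff + test run against those criteria before committing
- Maintain status.md and BACKLOG.md after every task
- Commit with attribution; stop after 3 failed delegations on one task
```

Because CLAUDE.md imports rather than copies, editing AGENTS.md updates both runtimes at once.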
Two shared state files:
status.md tracks active work, what’s done, any blockers.
BACKLOG.md holds what’s next and priorities.
Both runtimes read these. That’s the full shared context layer.
The Plugin
This is a Claude Code plugin: github.com/jessepike/claude-codex-handoff. Clone the repo, run `claude plugin install`, and `/codex-delegate` becomes a slash command inside Claude Code that implements the full specify→implement→validate loop.
You bring the project context. Fill in the AGENTS.md and CLAUDE.md templates for your project, and the loop is grounded in your actual codebase. The repo includes both templates, the codex-delegate skill, and a full architecture doc in patterns/.
Cost model: Claude Max + GPT Pro. This loop costs nothing extra. Codex runs against your GPT Pro subscription, Claude against your Claude Max subscription.
If you’re running a variation or have a different take, open a discussion on the repo.