Name: Handoffcodex
Author: wolfiesch
Execute the following task by orchestrating the OpenAI Codex CLI as a sub-agent:

**Task/Plan**: @$1

## Handoff Protocol

Follow this workflow to delegate the task to Codex while maintaining oversight:

### 0. **Assess Task Size**

Before proceeding, evaluate the task scope:

| Scope | Files | Action |
|-------|-------|--------|
| Too small | 1-2 files | Skip handoff, do directly |
| Single agent | 3-15 files, single concern | Continue with standard handoff |
| Parallel agents | More than 15 files OR multiple independent concerns | Use parallel execution (see below) |

**Independence test for parallel:** Would subtasks cause merge conflicts if run simultaneously?
- No conflicts → Split into parallel agents
- Conflicts likely → Single agent or sequential
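
The sizing table and the independence test can be approximated mechanically. The sketch below is only a rough heuristic: `classify_scope` and `files_overlap` are hypothetical helper names, and it assumes each subtask's expected files are listed one path per line in a plain text file.

```bash
# Hypothetical helper: map a file count and a number of independent
# concerns to the action from the sizing table.
classify_scope() {
  local files=$1 concerns=${2:-1}
  if [ "$files" -le 2 ]; then
    echo "direct"
  elif [ "$files" -gt 15 ] || [ "$concerns" -gt 1 ]; then
    echo "parallel"
  else
    echo "single"
  fi
}

# Hypothetical helper: succeed (exit 0) if two file lists share any path,
# meaning the subtasks would likely conflict if run simultaneously.
files_overlap() {
  local shared
  sort -u "$1" > "$1.sorted"
  sort -u "$2" > "$2.sorted"
  shared=$(comm -12 "$1.sorted" "$2.sorted")
  rm -f "$1.sorted" "$2.sorted"
  [ -n "$shared" ]
}
```

If `files_overlap` succeeds for any pair of subtasks, treat them as conflicting and fall back to a single agent or sequential runs.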

### 1. **Prepare the Handoff**

First, create a detailed task specification:

- If `$1` is a plan file, read it and extract key requirements
- If `$1` is a task description, expand it with:
  - Clear acceptance criteria
  - Files likely to be modified
  - Expected behavior/outputs
  - Testing requirements

Create a consolidated task brief at `/tmp/codex_task.md` with:
```markdown
# Task: [Brief Title]

## Objective
[What needs to be accomplished]

## Requirements
- Specific requirement 1
- Specific requirement 2
- ...

## Files to Modify/Create
- path/to/file1.ts
- path/to/file2.tsx

## Acceptance Criteria
- [ ] Criterion 1
- [ ] Criterion 2
- [ ] All tests passing

## Context
[Any relevant context about the codebase, patterns to follow, etc.]
```
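
Before launching, it may be worth confirming that the brief actually contains every section of the template. A minimal sketch, assuming the headings are written exactly as in the template above (`validate_brief` is a hypothetical helper name):

```bash
# Hypothetical helper: check that a task brief contains every section
# from the template. Prints each missing heading; returns non-zero if any.
validate_brief() {
  local brief=$1 missing=0 heading
  for heading in "# Task:" "## Objective" "## Requirements" \
                 "## Files to Modify/Create" "## Acceptance Criteria" "## Context"; do
    if ! grep -q "^${heading}" "$brief"; then
      echo "missing section: ${heading}"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `validate_brief /tmp/codex_task.md || echo "Fix the brief before launching Codex"`.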

### 2. **Launch Codex Sub-Agent**

#### Standard (Single Agent) Execution

```bash
# Record start time for tracking
CODEX_START=$(date +%s)
TASK_NAME="$(head -1 /tmp/codex_task.md | sed 's/# Task: //')"

codex exec -m gpt-5.1-codex-max --json --full-auto --skip-git-repo-check \
  "$(cat /tmp/codex_task.md)

Follow project conventions from CLAUDE.md if available.
Run tests after implementation.
Provide a structured summary of:
- Files created/modified
- Tests run and results
- Any issues encountered
- What still needs review" \
  2>/dev/null | tee /tmp/codex_handoff_output.log

CODEX_EXIT=${PIPESTATUS[0]}  # exit status of codex itself, not of tee
CODEX_END=$(date +%s)
```
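
When output is piped through `tee`, plain `$?` reports the exit status of `tee`, the last stage of the pipeline. One way to recover the first stage's status is bash's `PIPESTATUS` array. The sketch below wraps that pattern in a hypothetical `run_logged` function; it is bash-only and not part of the Codex CLI.

```bash
# Hypothetical wrapper (bash only): run a command, mirror its stdout to a
# log file, and return the command's own exit status rather than tee's.
run_logged() {
  local log=$1
  shift
  "$@" | tee "$log"
  return "${PIPESTATUS[0]}"
}
```

The first argument is the log path and the remaining arguments are the command to run, so the `codex exec` invocation above could be passed through it and `$?` read immediately afterwards.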

#### Parallel Agent Execution

For large tasks with independent subtasks, launch multiple agents:

```bash
# Create subtask specs (example: splitting by layer)
cat > /tmp/codex_task_api.md << 'EOF'
# Subtask: API Layer
[API-specific requirements from main task]
EOF

cat > /tmp/codex_task_service.md << 'EOF'
# Subtask: Service Layer
[Service-specific requirements from main task]
EOF

cat > /tmp/codex_task_tests.md << 'EOF'
# Subtask: Test Coverage
[Test-specific requirements from main task]
EOF

# Launch parallel agents
codex exec -m gpt-5.1-codex-max --json --full-auto --skip-git-repo-check \
  "$(cat /tmp/codex_task_api.md)" 2>/dev/null > /tmp/codex_api.log &
PID_API=$!

codex exec -m gpt-5.1-codex-max --json --full-auto --skip-git-repo-check \
  "$(cat /tmp/codex_task_service.md)" 2>/dev/null > /tmp/codex_service.log &
PID_SERVICE=$!

codex exec -m gpt-5.1-codex-max --json --full-auto --skip-git-repo-check \
  "$(cat /tmp/codex_task_tests.md)" 2>/dev/null > /tmp/codex_tests.log &
PID_TESTS=$!

# Wait for all agents
echo "Waiting for parallel agents: API($PID_API) Service($PID_SERVICE) Tests($PID_TESTS)"
wait $PID_API; EXIT_API=$?
wait $PID_SERVICE; EXIT_SERVICE=$?
wait $PID_TESTS; EXIT_TESTS=$?

echo "Agent results: API=$EXIT_API, Service=$EXIT_SERVICE, Tests=$EXIT_TESTS"
```
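
The three explicit `wait` calls generalize to any number of agents. A minimal sketch, assuming the PIDs were captured with `$!` as above (`wait_all` and `WAIT_FAILURES` are hypothetical names):

```bash
# Hypothetical helper: wait on each PID in turn and count how many exited
# non-zero. Sets WAIT_FAILURES and returns non-zero if any agent failed.
wait_all() {
  WAIT_FAILURES=0
  local pid
  for pid in "$@"; do
    if ! wait "$pid"; then
      echo "agent with PID $pid failed" >&2
      WAIT_FAILURES=$((WAIT_FAILURES + 1))
    fi
  done
  [ "$WAIT_FAILURES" -eq 0 ]
}
```

For example, `wait_all "$PID_API" "$PID_SERVICE" "$PID_TESTS" || echo "$WAIT_FAILURES agent(s) failed"`. Call it directly in the launching shell, since a subshell cannot wait on its parent's children.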

### 3. **Monitor Progress**

While Codex executes (it runs in its own context, which saves your tokens):

- Check progress periodically: `tail -20 /tmp/codex_handoff_output.log`
- For parallel: `tail -5 /tmp/codex_*.log`
- Watch for errors or completion signals

**DO NOT** attempt the implementation yourself; let Codex work autonomously.
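
Periodic checking can be scripted as a simple polling loop. The sketch below assumes the agent was launched in the background and its PID captured, as in the parallel example; `monitor_agent` is a hypothetical helper name and the 10-second default interval is arbitrary.

```bash
# Hypothetical helper: while a background process is alive, print the
# last line of its log at a fixed interval.
monitor_agent() {
  local pid=$1 log=$2 interval=${3:-10}
  while kill -0 "$pid" 2>/dev/null; do
    tail -n 1 "$log" 2>/dev/null
    sleep "$interval"
  done
  echo "process $pid finished"
}
```

For example, `monitor_agent "$PID_API" /tmp/codex_api.log 15` prints the newest log line every 15 seconds until that agent exits.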

### 4. **Quality Gates**

Run mandatory checks before presenting results. **All gates must pass or be explicitly acknowledged.**

```bash
echo "=== Running Quality Gates ==="
GATE_FAILURES=0

# Gate 1: Syntax/Lint Check
echo "Gate 1: Lint..."
if command -v make &>/dev/null && make lint 2>/dev/null; then
  echo "  PASS: Lint"
else
  echo "  FAIL: Lint errors detected"
  ((GATE_FAILURES++))
fi

# Gate 2: Type Check (if applicable)
echo "Gate 2: Types..."
if [ -f "tsconfig.json" ]; then
  if npx tsc --noEmit 2>/dev/null; then
    echo "  PASS: TypeScript"
  else
    echo "  FAIL: Type errors"
    ((GATE_FAILURES++))
  fi
elif [ -f "pyproject.toml" ] || [ -f "setup.py" ]; then
  if command -v mypy &>/dev/null && mypy . 2>/dev/null; then
    echo "  PASS: mypy"
  else
    echo "  WARN: mypy issues (non-blocking)"
  fi
else
  echo "  SKIP: No type system detected"
fi

# Gate 3: Tests
echo "Gate 3: Tests..."
if command -v make &>/dev/null && make test 2>/dev/null; then
  echo "  PASS: Tests"
else
  echo "  FAIL: Test failures"
  ((GATE_FAILURES++))
fi

# Gate 4: Git Status (no unintended changes)
echo "Gate 4: Scope check..."
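# Count modified, staged, and untracked files (git diff alone misses new files)
CHANGED_FILES=$(git status --porcelain --untracked-files=all | wc -l | tr -d ' ')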
if [ "$CHANGED_FILES" -gt 30 ]; then
  echo "  WARN: $CHANGED_FILES files changed (verify scope)"
else
  echo "  PASS: $CHANGED_FILES files changed"
fi

echo ""
echo "=== Quality Gate Summary ==="
if [ "$GATE_FAILURES" -eq 0 ]; then
  echo "All gates passed"
else
  echo "$GATE_FAILURES gate(s) failed - review required before proceeding"
fi
```
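
The pass/fail bookkeeping repeated in each gate can be factored into one function. This is a sketch of one possible refactoring, not a replacement for the script above; `run_gate` is a hypothetical helper name, and it silences the gate command's output the same way the original redirects stderr.

```bash
GATE_FAILURES=0

# Hypothetical helper: run one gate command, print PASS or FAIL, and
# increment GATE_FAILURES when the command exits non-zero.
run_gate() {
  local name=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "  PASS: $name"
  else
    echo "  FAIL: $name"
    GATE_FAILURES=$((GATE_FAILURES + 1))
  fi
}
```

With this in place, the blocking gates reduce to calls such as `run_gate "Lint" make lint` and `run_gate "Tests" make test`, followed by the same summary check on `GATE_FAILURES`.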

### 5. **Cost Tracking**

After Codex completes, extract and log usage:

```bash
# Extract usage from JSON output
USAGE=$(jq -s '[.[] | select(.type == "usage")] | last // empty' /tmp/codex_handoff_output.log 2>/dev/null)

if [ -n "$USAGE" ] && [ "$USAGE" != "null" ]; then
  INPUT_TOKENS=$(echo "$USAGE" | jq -r '.input_tokens // 0')
  OUTPUT_TOKENS=$(echo "$USAGE" | jq -r '.output_tokens // 0')

  # Approximate cost (codex-max rates: ~$2.5/1M input, ~$17.5/1M output)
  # Multiply before dividing so scale=4 does not truncate small token counts to zero
  COST=$(echo "scale=4; ($INPUT_TOKENS * 2.5 + $OUTPUT_TOKENS * 17.5) / 1000000" | bc 2>/dev/null || echo "N/A")
  DURATION=$((CODEX_END - CODEX_START))

  echo ""
  echo "=== Codex Usage ==="
  echo "Duration: ${DURATION}s"
  echo "Tokens: ${INPUT_TOKENS} in / ${OUTPUT_TOKENS} out"
  echo "Est. cost: \$${COST}"

  # Log to CSV for tracking
  echo "$(date -Iseconds),${TASK_NAME:-handoff},${INPUT_TOKENS},${OUTPUT_TOKENS},${COST},${DURATION},${CODEX_EXIT}" >> ~/.codex_cost_log.csv
fi
```
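
The cost formula is easier to check in isolation. The sketch below uses `awk` instead of `bc` and takes the rates as optional arguments; `estimate_cost` is a hypothetical helper name, and the defaults are just the approximate per-million-token figures quoted above, so verify them against current pricing.

```bash
# Hypothetical helper: estimate cost in USD from token counts.
# Arguments: input_tokens output_tokens [input_rate] [output_rate]
# Rates are USD per one million tokens (assumed defaults: 2.5 and 17.5).
estimate_cost() {
  local input_tokens=$1 output_tokens=$2
  local input_rate=${3:-2.5} output_rate=${4:-17.5}
  awk -v i="$input_tokens" -v o="$output_tokens" \
      -v ir="$input_rate" -v orate="$output_rate" \
      'BEGIN { printf "%.4f\n", (i * ir + o * orate) / 1000000 }'
}
```

As a worked example at those rates, 200,000 input tokens and 10,000 output tokens give (200000 × 2.5 + 10000 × 17.5) / 1,000,000 = 0.675, so `estimate_cost 200000 10000` prints `0.6750`.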

### 6. **Review Results**

Once Codex completes and quality gates run:

1. **Parse the output** for the structured summary
2. **Review changes** made by Codex:
   ```bash
   git status
   git diff --stat
   git diff  # Full diff for detailed review
   ```
3. **Check quality gate results** from step 4
4. **Create summary** for the user including:
   - What Codex successfully completed
   - Quality gate results (pass/fail/warn)
   - Token usage and cost (from step 5)
   - What needs attention/review
   - What remains to be done (if any)
   - Suggested next steps

### 7. **Integration & Cleanup**

- If all gates pass and changes look good: `git add [files]`
- If gates failed: List specific fixes needed
- Clean up temp files: `rm -f /tmp/codex_task*.md /tmp/codex_*.log`
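
The fixed `/tmp/codex_*` paths can collide if two handoffs run at the same time. One possible alternative is a private working directory that is removed automatically on exit. This is a sketch only: it assumes the paths in the earlier steps are replaced by the variables below, and `CODEX_WORKDIR`, `TASK_FILE`, and `LOG_FILE` are hypothetical names.

```bash
# Sketch: keep all handoff artifacts in one private directory and remove
# it automatically when the shell exits.
CODEX_WORKDIR=$(mktemp -d "${TMPDIR:-/tmp}/codex_handoff.XXXXXX")
trap 'rm -rf "$CODEX_WORKDIR"' EXIT

# Hypothetical replacements for /tmp/codex_task.md and
# /tmp/codex_handoff_output.log
TASK_FILE="$CODEX_WORKDIR/task.md"
LOG_FILE="$CODEX_WORKDIR/output.log"
```

Because the `trap` fires when the shell exits, copy out any logs you want to keep before the session ends.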

## Benefits of This Handoff

- **Token Efficiency**: Codex uses its own context for implementation
- **Specialized Execution**: Leverage Codex's code generation strengths
- **Parallel Work**: Multiple Codex agents for independent subtasks
- **Quality Assurance**: Automated gates catch issues before user review
- **Cost Visibility**: Track token usage and costs per task

## When to Use This Command

**Use for:**
- Implementation of detailed plans (3-15 files)
- Large-scale refactoring tasks (use parallel execution for more than 15 files)
- Code generation from specs
- Repetitive modifications across many files
- Tasks that can run autonomously without user input

**Don't use for:**
- 1-2 file changes (do directly)
- Tasks requiring frequent user clarification
- Complex architectural decisions
- Highly context-dependent debugging

## Error Handling

If Codex encounters issues:
1. Check exit code: `echo $CODEX_EXIT` (0 = success)
2. Check logs: `tail -50 /tmp/codex_handoff_output.log`
3. Determine recovery path:
   - Missing context → Provide more detail and restart
   - Quality gate failures → Fix issues manually or re-run
   - Partial completion → Review what succeeded, complete remainder
4. Report status to user with specific issues found

For parallel agent failures:
- Check individual logs: `/tmp/codex_api.log`, etc.
- One failure doesn't block others
- Report which subtasks succeeded/failed
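
Reporting per-subtask outcomes can be sketched as follows, assuming the exit codes were captured as in the parallel example and the logs follow the `/tmp/codex_<name>.log` naming used there (`report_subtasks` is a hypothetical helper name):

```bash
# Hypothetical helper: print one status line per name=exit_code pair and
# return non-zero if any subtask failed.
report_subtasks() {
  local pair name code failed=0
  for pair in "$@"; do
    name=${pair%%=*}
    code=${pair#*=}
    if [ "$code" -eq 0 ]; then
      echo "OK    $name"
    else
      echo "FAIL  $name (exit $code, see /tmp/codex_${name}.log)"
      failed=1
    fi
  done
  return "$failed"
}
```

For example, `report_subtasks "api=$EXIT_API" "service=$EXIT_SERVICE" "tests=$EXIT_TESTS"`.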

---

**Remember**: This is an orchestration pattern. You (Claude Code) remain responsible for:
- Task decomposition and sizing
- Deciding single vs. parallel execution
- Running and interpreting quality gates
- Reviewing results before presenting to user
- Cost awareness and reporting

Codex is your implementation assistant, not a replacement for your judgment.