Adapters · Head-to-Head

The Three Agents

A head-to-head comparison of the eval framework’s three target agents: ten capabilities, three adapters, one ProcessSpawner interface. Each agent needs its own adapter because no two speak the same protocol.

Capability Matrix

Capability              SubQ Code     Claude Code        Codex CLI
Headless flag           --json / -p   -p                 exec --full-auto
Real-time streaming     Yes           Yes                No
System prompt override  Env var       --system-prompt    --instructions
Built-in worktree       Side agents   --worktree         No
Budget cap              No            --max-budget-usd   No
Bare/clean mode         No            --bare             No
Session ID control      Yes           --session-id       Rollout naming
Permission bypass       N/A           bypassPermissions  --full-auto
JSONL output            stdout        stream-json        Disk only
Disk session path       ~/.subq/      ~/.claude/         ~/.codex/sessions/

Adapter Details

SubQ Code Adapter

Pi Agent

subq code --json "$PROMPT"

Emits real-time JSONL on stdout. The system prompt is overridden through the SUBQ_SYSTEM_INSTRUCTIONS_FILE environment variable, and the session ID is passed directly. Streaming enrichment parses stdout chunks with Bun.JSONL.parseChunk().
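
To illustrate what incremental JSONL consumption involves, here is a minimal sketch of a line buffer that accepts stdout chunks and returns each completed record. It is a hand-rolled stand-in, not the adapter's actual parsing path (which the text says goes through Bun.JSONL.parseChunk()). The class name JsonlLineBuffer and the event field names used in examples are hypothetical.

```typescript
// Hypothetical helper: splits a stream of stdout chunks into parsed JSONL records.
// A chunk boundary can fall in the middle of a line, so partial lines are held back.
class JsonlLineBuffer {
  private decoder = new TextDecoder();
  private pending = "";

  // Feed one stdout chunk; returns every record completed by this chunk.
  push(chunk: Uint8Array): unknown[] {
    // stream: true keeps multi-byte characters intact across chunk boundaries.
    this.pending += this.decoder.decode(chunk, { stream: true });
    const lines = this.pending.split("\n");
    // The last element is either "" or an unfinished line; keep it for the next chunk.
    this.pending = lines.pop() ?? "";
    const records: unknown[] = [];
    for (const line of lines) {
      const trimmed = line.trim();
      if (trimmed.length > 0) records.push(JSON.parse(trimmed));
    }
    return records;
  }

  // Call once the stream ends to parse a final line that had no trailing newline.
  flush(): unknown[] {
    const rest = (this.pending + this.decoder.decode()).trim();
    this.pending = "";
    return rest.length > 0 ? [JSON.parse(rest)] : [];
  }
}
```

An adapter would call push() for each chunk read from stdout and flush() once the process has exited, so enrichment can run on every record as soon as its line is complete.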

Tags: --json · real-time · env override

Claude Code Adapter

Claude

claude -p "$PROMPT" --output-format stream-json --bare --permission-mode bypassPermissions

The --bare flag is critical: it skips hooks, plugins, and CLAUDE.md, ensuring a clean evaluation. ANTHROPIC_API_KEY is stripped from the environment to force Max subscription usage. Sessions are cross-referenced post hoc via --session-id.
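
The invocation and the environment scrubbing can be sketched as one pure function. This is an illustration under assumptions, not the adapter's actual code: buildClaudeInvocation and ClaudeInvocation are hypothetical names, and passing the ID as the argument following --session-id is assumed. The flags themselves are taken from the command line above.

```typescript
// Hypothetical shape for what the adapter hands to its process spawner.
interface ClaudeInvocation {
  cmd: string[];
  env: Record<string, string>;
}

// Hypothetical helper: builds argv plus a scrubbed copy of the environment.
function buildClaudeInvocation(
  prompt: string,
  sessionId: string,
  baseEnv: Record<string, string | undefined>,
): ClaudeInvocation {
  const env: Record<string, string> = {};
  for (const [key, value] of Object.entries(baseEnv)) {
    // Drop the API key so the CLI cannot authenticate with it; skip unset entries.
    if (key === "ANTHROPIC_API_KEY" || value === undefined) continue;
    env[key] = value;
  }
  const cmd = [
    "claude", "-p", prompt,
    "--output-format", "stream-json",
    "--bare",
    "--permission-mode", "bypassPermissions",
    // Assumption: the session ID follows the flag as a separate argument.
    "--session-id", sessionId,
  ];
  return { cmd, env };
}
```

Copying the environment rather than deleting the key in place leaves the caller's process.env untouched, so other adapters running in the same process are not affected.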

Tags: --bare · stream-json · key stripped

Codex CLI Adapter

Codex

codex exec --full-auto "$PROMPT" --json

No real-time streaming: JSONL is written to ~/.codex/sessions/ post hoc. Use proc.exited (not polling) to detect completion. Phase 7c: added once the SubQ and Claude adapters prove the pattern.
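
The wait-then-read flow might look like the sketch below. It awaits the exit promise and only then reads the session log. How the adapter locates the right file under ~/.codex/sessions/ is not specified here, so the reader is injected as a callback. collectCodexRun, ExitingProcess, and readSessionLog are hypothetical names.

```typescript
// Minimal view of a spawned process: only the exit promise matters here.
interface ExitingProcess {
  exited: Promise<number>;
}

// Hypothetical helper: waits for exit, then parses the JSONL session log.
// readSessionLog is injected because the file lookup is an adapter detail.
async function collectCodexRun(
  proc: ExitingProcess,
  readSessionLog: () => Promise<string>,
): Promise<{ exitCode: number; events: unknown[] }> {
  // Awaiting the promise avoids a polling loop and cannot read a half-written log.
  const exitCode = await proc.exited;
  const raw = await readSessionLog();
  const events = raw
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as unknown);
  return { exitCode, events };
}
```

Because the log is read only after exit, enrichment for this agent is necessarily post hoc, which matches the "No" in the real-time streaming row of the matrix.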

Tags: proc.exited · disk JSONL · Phase 7c

Note on --bare: Claude Code must run with --bare during evaluation. Without it, user-installed hooks and CLAUDE.md instructions interfere with enrichment: a user’s custom pre-commit hook could cause a false failure that has nothing to do with the agent’s capabilities.

ProcessSpawner Interface

src/eval/adapters/spawner.ts (injectable interface)

interface ProcessSpawner
  spawn(cmd: string[], opts: SpawnOpts) → SpawnedProcess
    stdout  ReadableStream<Uint8Array>  incremental consumption
    exited  Promise<number>             resolves with exit code
    signal  AbortSignal                 timeout via AbortController + setTimeout

Note: Bun.spawn() is not available in Vitest; inject the spawner for testability.
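
One way the summary above could translate into TypeScript is sketched below, together with a test double of the kind the note suggests. Several details are assumptions rather than the contents of spawner.ts: the summary does not say which type carries signal, so it is placed on SpawnOpts here. The env and cwd fields, the withTimeout helper, and FakeSpawner are hypothetical.

```typescript
interface SpawnOpts {
  env?: Record<string, string>; // hypothetical field
  cwd?: string;                 // hypothetical field
  signal?: AbortSignal;         // assumed to live on the options
}

interface SpawnedProcess {
  stdout: ReadableStream<Uint8Array>; // consumed incrementally
  exited: Promise<number>;            // resolves with the exit code
}

interface ProcessSpawner {
  spawn(cmd: string[], opts: SpawnOpts): SpawnedProcess;
}

// Timeout via AbortController + setTimeout; call cancel() once the process exits.
function withTimeout(ms: number): { signal: AbortSignal; cancel: () => void } {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), ms);
  return { signal: controller.signal, cancel: () => clearTimeout(timer) };
}

// Test double: replays canned JSONL lines and records each command it receives.
class FakeSpawner implements ProcessSpawner {
  calls: string[][] = [];
  private lines: string[];
  private exitCode: number;

  constructor(lines: string[], exitCode = 0) {
    this.lines = lines;
    this.exitCode = exitCode;
  }

  spawn(cmd: string[], _opts: SpawnOpts): SpawnedProcess {
    this.calls.push(cmd);
    const bytes = new TextEncoder().encode(this.lines.join("\n") + "\n");
    const stdout = new ReadableStream<Uint8Array>({
      start(controller) {
        controller.enqueue(bytes);
        controller.close();
      },
    });
    return { stdout, exited: Promise.resolve(this.exitCode) };
  }
}
```

In a Vitest suite an adapter would receive a FakeSpawner through its constructor, while production code would presumably pass an implementation that wraps Bun.spawn(). The adapter itself never references Bun directly, which is what keeps it testable.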