Deep Dives
Data pipeline from task YAML to comparison report. Orchestrator, adapters, enrichers, normalizer—plus the full type lattice.
Head-to-head capability matrix. 10 features, 3 agents, verdict badges. Why each adapter exists.
Solve et coagula. Five-stage self-evolution cycle: execute, analyze, build, evaluate, select. The hero page.
EvoSkill, Hermes, GEPA, LangChain harness engineering, and 7 more projects that shaped this framework.
The complete 1,182-line specification: architecture, types, CLI commands, security model, and all 12 reviewer findings.
The Build Sequence
Seven phases, 14 modules, 15+ types, 3 agents—from foundation types to self-evolving prompt optimization. Phase 1 is built and tested. The remaining six phases lay a path from adapter integration through the Hermetic Loop.
Phase 1 · Foundation
Types, Tasks, Worktrees, Cost
Core type system with TypeBox validation. Task loader for YAML definitions. Git worktree allocator with sequential creation. Per-agent cost tracking.
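As a rough illustration of per-agent cost tracking, the sketch below accumulates spend from token counts. The `CostTracker` class, the `Usage` and `Price` shapes, the agent name, and the prices are all hypothetical; the framework's actual types are not shown here.

```typescript
// Hypothetical shapes: the framework's real cost types are not shown in the source.
interface Usage {
  inputTokens: number;
  outputTokens: number;
}

interface Price {
  inputPerMTok: number; // USD per million input tokens (assumed unit)
  outputPerMTok: number; // USD per million output tokens (assumed unit)
}

class CostTracker {
  private readonly prices: Record<string, Price>;
  private readonly totals = new Map<string, number>();

  constructor(prices: Record<string, Price>) {
    this.prices = prices;
  }

  // Record one model call for an agent and return the cost of that call.
  record(agent: string, usage: Usage): number {
    const price = this.prices[agent];
    if (!price) throw new Error(`no price configured for agent "${agent}"`);
    const cost =
      (usage.inputTokens / 1_000_000) * price.inputPerMTok +
      (usage.outputTokens / 1_000_000) * price.outputPerMTok;
    this.totals.set(agent, (this.totals.get(agent) ?? 0) + cost);
    return cost;
  }

  // Running total for one agent; 0 if nothing has been recorded yet.
  total(agent: string): number {
    return this.totals.get(agent) ?? 0;
  }
}
```

Keeping totals keyed by agent means a comparison report can read each agent's spend without re-deriving it from raw transcripts.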
Phase 2–3 · Adapters & Enrichment
Process Spawning & JSONL Parsing
One adapter per agent manages process lifecycle. Enrichers parse RawAgentOutput into canonical EnrichedEvent[] via Bun.JSONL.parseChunk().
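The enrichment step can be pictured as incremental JSONL parsing: complete lines become events, and an unfinished trailing line is carried into the next chunk. The sketch below is a plain-TypeScript stand-in written for illustration; it is not the `Bun.JSONL.parseChunk()` API, and the `EnrichedEvent` fields shown are assumptions.

```typescript
// Hypothetical event shape; the real EnrichedEvent type is not shown in the source.
interface EnrichedEvent {
  agent: string;
  type: string;
  payload: unknown;
}

interface ChunkResult {
  events: EnrichedEvent[];
  rest: string; // incomplete trailing line, to be passed back in with the next chunk
}

// Parse every complete line in `buffer + chunk`; keep the unfinished tail for later.
function parseJsonlChunk(agent: string, buffer: string, chunk: string): ChunkResult {
  const lines = (buffer + chunk).split("\n");
  const rest = lines.pop() ?? "";
  const events: EnrichedEvent[] = [];
  for (const line of lines) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    try {
      const raw: unknown = JSON.parse(trimmed);
      const maybeType =
        typeof raw === "object" && raw !== null
          ? (raw as { type?: unknown }).type
          : undefined;
      events.push({
        agent,
        type: typeof maybeType === "string" ? maybeType : "unknown",
        payload: raw,
      });
    } catch {
      // Keep malformed lines visible instead of silently dropping them.
      events.push({ agent, type: "parse_error", payload: trimmed });
    }
  }
  return { events, rest };
}
```

A caller would feed each stdout chunk through this function and thread `rest` from one call into the next, so a JSON object split across two chunks is still parsed exactly once.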
Phase 4–6 · Orchestration & Analysis
CLI, Comparison, Reporting
Commander.js wiring into TOP_LEVEL_COMMANDS. Milestone alignment, quality rubric, LLM judge. EvalReport with --robot JSON output.
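One way to think about the `--robot` flag is as a switch between a machine-readable and a human-readable rendering of the same report. The sketch below assumes a simplified `EvalReport` with hypothetical fields (`task`, `results`, `score`, `costUsd`) and leaves out the Commander.js wiring.

```typescript
// Hypothetical, simplified report shape; the real EvalReport is not shown in the source.
interface AgentResult {
  agent: string;
  score: number;
  costUsd: number;
}

interface EvalReport {
  task: string;
  results: AgentResult[];
}

// robot=true: single-line JSON for scripts. robot=false: ranked plain-text table.
function renderReport(report: EvalReport, robot: boolean): string {
  if (robot) return JSON.stringify(report);
  const ranked = [...report.results].sort((a, b) => b.score - a.score);
  const lines = [`Task: ${report.task}`];
  for (const r of ranked) {
    lines.push(
      `${r.agent.padEnd(12)} score=${r.score.toFixed(2)} cost=$${r.costUsd.toFixed(2)}`
    );
  }
  return lines.join("\n");
}
```

Because both renderings come from one report object, the JSON output cannot drift from what the human-readable table shows.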
Phase 7a–c · Self-Evolution
The Hermetic Loop
EvoSkill three-agent loop + GEPA prompt mutation. Frontier-based selection on git branches. Tiered optimization: skills → tools → prompts.
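Frontier-based selection can be sketched as keeping only the candidates that no other candidate beats on every axis. The example below assumes the frontier is a Pareto frontier over score (higher is better) and cost (lower is better); the `Candidate` shape, the branch names, and the choice of axes are assumptions made for illustration.

```typescript
// Hypothetical candidate record: one evolved variant living on its own git branch.
interface Candidate {
  branch: string;
  score: number; // higher is better
  costUsd: number; // lower is better
}

// a dominates b if a is at least as good on both axes and strictly better on one.
function dominates(a: Candidate, b: Candidate): boolean {
  return (
    a.score >= b.score &&
    a.costUsd <= b.costUsd &&
    (a.score > b.score || a.costUsd < b.costUsd)
  );
}

// Keep every candidate that no other candidate dominates.
function frontier(candidates: Candidate[]): Candidate[] {
  return candidates.filter((c) => !candidates.some((o) => dominates(o, c)));
}
```

Candidates that survive the filter would be the branches carried into the next cycle, while dominated branches can be discarded.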
Sample Run
CLI Commands
Design Principles
Three-Knob Model
System prompt, tools, and middleware—the only levers that matter. Each tunable independently, each measurable.
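To make "each tunable independently" concrete, the sketch below models a harness configuration with exactly three fields and derives variants that differ from a baseline in a single knob. The `HarnessConfig` type and helper names are hypothetical.

```typescript
// Hypothetical config type with one field per knob.
interface HarnessConfig {
  systemPrompt: string;
  tools: string[];
  middleware: string[];
}

type Knob = keyof HarnessConfig;

// Return a copy of `base` with exactly one knob replaced; `base` is left untouched.
function withKnob<K extends Knob>(
  base: HarnessConfig,
  knob: K,
  value: HarnessConfig[K]
): HarnessConfig {
  return { ...base, [knob]: value };
}

// List the knobs on which two configs differ (structural comparison via JSON).
function changedKnobs(a: HarnessConfig, b: HarnessConfig): Knob[] {
  const knobs: Knob[] = ["systemPrompt", "tools", "middleware"];
  return knobs.filter((k) => JSON.stringify(a[k]) !== JSON.stringify(b[k]));
}
```

If `changedKnobs(baseline, variant)` returns a single entry, any score difference between the two runs can be attributed to that knob alone.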
Worktree Isolation
Every agent gets a dedicated git worktree. Sequential allocation avoids the .git/config.lock race. Clean state per run.
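A minimal sketch of sequential allocation follows: worktrees are planned up front, then created one at a time by awaiting each `git worktree add` before starting the next. The directory layout, branch naming, and the injected `runGit` callback are assumptions; only the `git worktree add -b <branch> <path> <commit-ish>` form comes from git itself.

```typescript
interface WorktreePlan {
  agent: string;
  path: string;
  branch: string;
  args: string[]; // arguments to pass to `git`
}

// Build one plan per agent. Paths and branch names are a hypothetical layout.
function planWorktrees(runId: string, agents: string[], baseRef: string): WorktreePlan[] {
  return agents.map((agent) => {
    const path = `.worktrees/${runId}/${agent}`;
    const branch = `eval/${runId}/${agent}`;
    return { agent, path, branch, args: ["worktree", "add", "-b", branch, path, baseRef] };
  });
}

// Create worktrees strictly one after another. `runGit` is injected so the
// caller decides how git is spawned; it is not a framework API.
async function allocateSequentially(
  plans: WorktreePlan[],
  runGit: (args: string[]) => Promise<void>
): Promise<string[]> {
  const created: string[] = [];
  for (const plan of plans) {
    await runGit(plan.args); // awaited inside the loop, so no two git calls overlap
    created.push(plan.path);
  }
  return created;
}
```

Mapping the plans through `Promise.all` would start every git call at once, which is the concurrent pattern the sequential loop is meant to avoid.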
Trace Auditing
Every eval run captures full JSONL transcripts. Anti-gaming heuristics detect benchmark cheating. Scores without traces are worthless.
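The idea that a score is only as trustworthy as its trace can be sketched as an audit pass over the JSONL transcript. The two heuristics below (an empty trace, and a command that mentions a protected path) are hypothetical examples, not the framework's actual anti-gaming rules, and the `command` field is an assumed event property.

```typescript
interface Audit {
  trusted: boolean;
  flags: string[];
}

// Scan a JSONL transcript and flag conditions that should void a score.
function auditTrace(jsonl: string, protectedPaths: string[]): Audit {
  const flags: string[] = [];
  let eventCount = 0;
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    let event: unknown;
    try {
      event = JSON.parse(line);
    } catch {
      flags.push("unparseable-line");
      continue;
    }
    eventCount += 1;
    if (typeof event !== "object" || event === null) continue;
    const command = (event as { command?: unknown }).command;
    if (typeof command !== "string") continue;
    // Illustrative heuristic: the agent touched files it should never read.
    for (const p of protectedPaths) {
      if (command.includes(p)) flags.push(`touched:${p}`);
    }
  }
  if (eventCount === 0) flags.push("empty-trace");
  return { trusted: flags.length === 0, flags };
}
```

A report generator could then refuse to publish any score whose audit comes back with `trusted: false`.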
Env Allowlist
Child processes receive only approved environment variables. No API key leakage across agent boundaries. TypeBox validates all input.
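An environment allowlist can be as small as a function that copies approved keys and ignores everything else. The sketch below is an assumption about how such a filter might look; the variable names in the usage note are hypothetical.

```typescript
// Build a child environment containing only allowlisted variables that are set.
function filterEnv(
  env: Record<string, string | undefined>,
  allowlist: readonly string[]
): Record<string, string> {
  const out: Record<string, string> = {};
  for (const key of allowlist) {
    const value = env[key];
    if (value !== undefined) out[key] = value;
  }
  return out;
}
```

For example, calling `filterEnv(process.env, ["PATH", "AGENT_A_API_KEY"])` yields an object that omits any other agent's key, and the result can be passed as the `env` option when spawning the child process.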