The Loop
Five stages, endlessly cycling. The eval framework runs agents on tasks; analyzes their failures and proposes improvements; builds new skills or prompt patches; evaluates the result; and selects winners for the frontier. Each iteration dissolves a weakness and coagulates a strength.
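A minimal sketch of one way to wire the loop, assuming failure analysis and proposal are treated as a single stage so that there are five. The `LoopStages` container, the dict-shaped config, and the toy stage functions are hypothetical stand-ins for the real Executor, Proposer, and Skill Builder, not the framework's API.

```python
from dataclasses import dataclass
from typing import Callable

# A "config" is modeled as a plain dict (assumption for illustration).
Config = dict
Scored = list[tuple[float, Config]]


@dataclass
class LoopStages:
    """Hypothetical stage interface; each stage is a caller-supplied function."""
    run: Callable[[Config], list[str]]                  # 1. run agents on tasks, return failure notes
    propose: Callable[[list[str]], list[str]]           # 2. analyze failures, propose improvements
    build: Callable[[Config, list[str]], list[Config]]  # 3. build skills / prompt patches as new configs
    evaluate: Callable[[Config], float]                 # 4. score a config on the eval tasks
    select: Callable[[Scored], Config]                  # 5. pick the winner for the frontier


def evolve(base: Config, stages: LoopStages, iterations: int) -> Config:
    current = base
    for _ in range(iterations):
        failures = stages.run(current)
        proposals = stages.propose(failures)
        candidates = stages.build(current, proposals)
        # The incumbent competes too, so a max-score selector never
        # replaces it with a worse patch.
        scored = [(stages.evaluate(c), c) for c in candidates + [current]]
        current = stages.select(scored)
    return current


# Toy usage: one missing skill, fixed in the first iteration.
stages = LoopStages(
    run=lambda cfg: [] if "search-persistence" in cfg["skills"] else ["gave up searching"],
    propose=lambda failures: ["search-persistence"] if failures else [],
    build=lambda cfg, props: [{"skills": cfg["skills"] + [p]} for p in props],
    evaluate=lambda cfg: 0.71 if "search-persistence" in cfg["skills"] else 0.62,
    select=lambda scored: max(scored, key=lambda pair: pair[0])[1],
)
result = evolve({"skills": []}, stages, iterations=3)
```

Keeping the incumbent in the scored pool is a design choice: an iteration whose proposals all regress simply leaves the configuration unchanged.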
Tiered Optimization
Four tiers of optimization targets, ordered by risk. The loop starts with the safest tier and escalates only when lower tiers plateau.
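The escalation rule can be sketched as follows. The plateau test (best score in the last `window` iterations improves on the earlier best by less than `min_gain`) and both threshold values are assumptions chosen for illustration; `scores` is taken to be the score history gathered while working on the current tier.

```python
from enum import IntEnum


class Tier(IntEnum):
    SKILLS = 1           # lowest risk
    TOOL_LISTS = 2
    PROMPT_SECTIONS = 3
    PROMPT_ORDERING = 4  # highest risk


def has_plateaued(scores: list[float], window: int = 3, min_gain: float = 0.01) -> bool:
    """True if the best recent score beats the best earlier score by < min_gain.
    window and min_gain are hypothetical defaults."""
    if len(scores) <= window:
        return False  # not enough history to judge
    before = max(scores[:-window])
    recent = max(scores[-window:])
    return recent - before < min_gain


def next_tier(current: Tier, scores: list[float]) -> Tier:
    """Escalate one tier, and only when the current tier has plateaued."""
    if has_plateaued(scores) and current < Tier.PROMPT_ORDERING:
        return Tier(current + 1)
    return current
```

For example, `next_tier(Tier.SKILLS, [0.62, 0.71, 0.71, 0.71, 0.71])` escalates to `Tier.TOOL_LISTS`, while a history that is still improving stays on skills.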
Tier 1 — Skills
Lowest risk, purely additive. New SKILL.md files teach the agent patterns such as search persistence, verification-first, and error recovery. No existing config is modified.
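A sketch of what "purely additive" can mean in code: a new skill is written only if no file exists at that path. The `skills/<name>/SKILL.md` layout and the `name`/`description` front matter are assumptions modeled on common agent-skill conventions, not a format confirmed here.

```python
from pathlib import Path


def add_skill(skills_dir: Path, name: str, description: str, body: str) -> Path:
    """Create skills_dir/<name>/SKILL.md without ever touching an existing skill.
    The directory layout and front-matter fields are assumptions."""
    skill_path = skills_dir / name / "SKILL.md"
    if skill_path.exists():
        raise FileExistsError(f"skill {name!r} already exists; Tier 1 only adds files")
    skill_path.parent.mkdir(parents=True, exist_ok=True)
    content = f"---\nname: {name}\ndescription: {description}\n---\n\n{body}\n"
    # Mode "x" fails if the file appeared between the check above and this write.
    with skill_path.open("x", encoding="utf-8") as f:
        f.write(content)
    return skill_path
```

Opening with mode `"x"` rather than `"w"` makes the no-overwrite guarantee hold even under a race, which is what keeps this tier free of side effects on existing config.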
Tier 2 — Tool Allow/Deny Lists
Reversible. Constrain or expand which tools the agent can use. Enable grep for search-heavy tasks, deny write for analysis-only tasks.
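One way to keep tool-list changes reversible is to treat them as a patch over an untouched base set, as in this sketch. The `enable`/`deny` parameters, the deny-wins rule, and the base tool names other than `grep` and `write` are assumptions for illustration.

```python
from typing import AbstractSet


def apply_tool_policy(
    base_tools: AbstractSet[str],
    enable: AbstractSet[str] = frozenset(),
    deny: AbstractSet[str] = frozenset(),
) -> frozenset[str]:
    """Effective tools = (base + enabled) - denied; deny wins over enable.

    base_tools is never mutated, so reverting a change means dropping the
    enable/deny patch and recomputing.
    """
    return frozenset((set(base_tools) | set(enable)) - set(deny))


base = frozenset({"read", "write", "bash"})  # hypothetical default tool set
search_heavy = apply_tool_policy(base, enable={"grep"})
analysis_only = apply_tool_policy(base, deny={"write"})
```

Because the result is recomputed from the base every time, rolling back is just removing the patch, with no history of mutations to undo.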
Tier 3 — System Prompt Sections
Moderate risk. Modify system prompt segments. Prompt changes invalidate cached prefixes, so each change must prove its value against the cache cost.
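The cache cost of a section edit can be estimated with a sketch like this, assuming prefix caching: everything after the first changed section must be reprocessed. Character counts stand in for tokens, and the exchange rate in `worth_it` (`value_per_point`, `cost_per_char`) is a hypothetical tuning knob.

```python
def cached_prefix_sections(old: list[str], new: list[str]) -> int:
    """Number of leading sections that are identical, i.e. still served from cache."""
    n = 0
    for a, b in zip(old, new):
        if a != b:
            break
        n += 1
    return n


def invalidated_chars(old: list[str], new: list[str]) -> int:
    """Size of the new prompt (chars as a rough proxy for tokens) that must be
    reprocessed because it follows the first changed section."""
    keep = cached_prefix_sections(old, new)
    return sum(len(s) for s in new[keep:])


def worth_it(score_gain: float, invalidated: int,
             cost_per_char: float, value_per_point: float) -> bool:
    """Accept the change only if its score gain outweighs its cache cost
    (both weights are hypothetical)."""
    return score_gain * value_per_point > invalidated * cost_per_char


old = ["identity", "tools", "recovery"]
tail_edit = ["identity", "tools", "recovery v2"]   # only the last section changes
head_edit = ["identity v2", "tools", "recovery"]   # first section changes
```

The same wording change costs far more near the top of the prompt than near the bottom, since it invalidates every section that follows it.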
Tier 4 — Prompt Ordering
Highest risk. Reorder prompt sections for cache efficiency. A single wrong reordering can degrade cache hit rate and increase latency across all tasks.
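To see why a single reordering matters so much, consider this sketch. It assumes each section changes independently between requests with a known rate (a simplifying assumption) and computes the expected cache-reusable prefix for a given order. Section names and rates are hypothetical.

```python
def order_for_cache(change_rate: dict[str, float]) -> list[str]:
    """Stable-first ordering: least frequently changing sections go first.
    sorted() is stable, so ties keep their original order."""
    return sorted(change_rate, key=change_rate.get)


def expected_cached_chars(order: list[str], sections: dict[str, str],
                          change_rate: dict[str, float]) -> float:
    """Expected reusable prefix size, assuming section `name` changes
    independently with probability change_rate[name] between requests."""
    survive = 1.0  # probability that every section so far is unchanged
    total = 0.0
    for name in order:
        survive *= 1.0 - change_rate[name]
        total += survive * len(sections[name])
    return total


sections = {"task_context": "x" * 100, "identity": "y" * 100, "tools": "z" * 100}
rates = {"task_context": 1.0, "identity": 0.0, "tools": 0.1}

good = order_for_cache(rates)                  # ["identity", "tools", "task_context"]
bad = ["task_context", "identity", "tools"]    # volatile section first
```

Under these assumptions the stable-first order keeps about 190 of 300 characters cacheable, while putting the always-changing section first drops the expected reuse to zero for every task.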
Frontier Visualization
Each evolution iteration produces a configuration. Winners are selected by the frontier mechanism and persisted as git branches. The lineage traces back through parent configurations.
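A minimal sketch of frontier bookkeeping, assuming a simple top-k frontier by score (the actual frontier mechanism may differ) and one git branch per winner. The `Candidate` fields, commit ids, and the `evo/` branch prefix are hypothetical; the returned argv follows the standard `git branch <branchname> <start-point>` form.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    name: str              # hypothetical config identifier
    parent: Optional[str]  # parent config name (None for the baseline)
    score: float
    commit: str            # commit holding this config (hypothetical)


def update_frontier(frontier: list[Candidate], new: list[Candidate], k: int = 3) -> list[Candidate]:
    """Simple top-k frontier: keep the k best configs seen so far."""
    return sorted(frontier + new, key=lambda c: c.score, reverse=True)[:k]


def persist_command(c: Candidate) -> list[str]:
    """argv for `git branch <name> <start-point>`; the evo/ prefix is hypothetical."""
    return ["git", "branch", f"evo/{c.name}", c.commit]


baseline = Candidate("baseline", None, 0.62, "a1b2c3")
search = Candidate("search-persistence", "baseline", 0.71, "d4e5f6")
recovery = Candidate("error-recovery", "search-persistence", 0.78, "abc123")
frontier = update_frontier([baseline], [search, recovery], k=2)
```

The argv can be passed to `subprocess.run(..., check=True)` inside the repository; keeping `parent` on each candidate is what lets the lineage be traced back later.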
Iteration 1
Baseline
Original configuration, no modifications. Score: 0.62
Iteration 3
Search Persistence
Added the search persistence skill. Score: 0.71 (+0.09)
Iteration 5
Prompt Recovery
Improved error recovery prompt. Score: 0.78 (+0.07)
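The three winners above can be checked as a lineage. This sketch assumes each listed winner is the direct parent of the next (iterations 2 and 4 are not shown, so that link is an assumption) and recomputes the per-step gains from the listed scores.

```python
# Scores from the three iterations listed above; parent links are assumed.
configs = {
    "iter-1": {"label": "Baseline", "score": 0.62, "parent": None},
    "iter-3": {"label": "Search Persistence", "score": 0.71, "parent": "iter-1"},
    "iter-5": {"label": "Prompt Recovery", "score": 0.78, "parent": "iter-3"},
}


def lineage(configs: dict[str, dict], cid: str) -> list[str]:
    """Follow parent links back to the root, returned oldest first."""
    chain = []
    node = cid
    while node is not None:
        chain.append(node)
        node = configs[node]["parent"]
    return chain[::-1]


def score_deltas(configs: dict[str, dict], chain: list[str]) -> list[float]:
    """Gain at each step, rounded to the two decimals used on this page."""
    return [round(configs[b]["score"] - configs[a]["score"], 2)
            for a, b in zip(chain, chain[1:])]


chain = lineage(configs, "iter-5")
deltas = score_deltas(configs, chain)
```

The steps are +0.09 and +0.07, for a total of +0.16 over the baseline (roughly a 26% relative improvement on 0.62).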
Feedback Memory
The Proposer remembers what was tried before. Circular proposals, which revisit an approach already tried, are rejected. Only novel approaches or extensions of successful ones proceed.
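A sketch of the acceptance rule, assuming proposals are short text descriptions and that the Proposer declares which earlier proposal (if any) a new one extends. Exact matching after lowercasing and whitespace collapsing is a deliberate simplification; a real system might use semantic similarity instead. `ProposalMemory` and the `extends` argument are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


def normalize(proposal: str) -> str:
    """Crude canonical form: lowercase and collapse whitespace."""
    return " ".join(proposal.lower().split())


@dataclass
class ProposalMemory:
    # normalized proposal -> did it improve the score?
    tried: dict[str, bool] = field(default_factory=dict)

    def record(self, proposal: str, succeeded: bool) -> None:
        self.tried[normalize(proposal)] = succeeded

    def accept(self, proposal: str, extends: Optional[str] = None) -> bool:
        if normalize(proposal) in self.tried:
            return False  # circular: this was already tried
        if extends is None:
            return True   # novel approach
        # Extensions proceed only if what they extend succeeded.
        return self.tried.get(normalize(extends), False)


memory = ProposalMemory()
memory.record("Add search persistence skill", succeeded=True)
memory.record("Rewrite tool list", succeeded=False)
```

With this memory, `"add  search persistence SKILL"` is rejected as circular, a fresh idea such as `"Add verification-first skill"` proceeds, and an extension of the failed tool-list rewrite is rejected.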
Guardrails
Lineage
EvoSkill V1
Sentient AGI
Three-agent evolutionary loop: Executor, Proposer, Skill Builder. Frontier-based selection on git branches. Cross-agent skill transferability—skills evolved for one agent transfer zero-shot to others.
Hermes Self-Evolution
Nous Research
DSPy + GEPA (Genetic-Pareto Prompt Evolution, ICLR 2026 Oral) for reflective prompt mutation. Reads execution traces to understand why things fail. Four-tier optimization targets.
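The "Pareto" in GEPA refers to Pareto-based selection. The sketch below shows only the generic notion of a Pareto frontier over per-task scores, not GEPA's actual implementation; candidate names and scores are made up.

```python
def dominates(a: list[float], b: list[float]) -> bool:
    """a dominates b: at least as good on every task, strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates: dict[str, list[float]]) -> list[str]:
    """Names of candidates that no other candidate dominates."""
    return [
        name for name, s in candidates.items()
        if not any(dominates(o, s) for other, o in candidates.items() if other != name)
    ]


# Per-task scores for three hypothetical prompt variants on two tasks.
scores = {
    "A": [0.9, 0.2],  # strong on task 1
    "B": [0.3, 0.8],  # strong on task 2
    "C": [0.2, 0.1],  # worse than A on both tasks
}
front = pareto_front(scores)
```

Unlike a single averaged score, this keeps `B` alive even though `A` has the higher mean, preserving a variant that solves tasks the current leader fails on.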