Fix
Intent
Make risky or unclear code safe with the smallest sound, validated change, then keep listening to the final self-review until no actionable self-review change remains. When the validated changeset survives the post-self-review rerun, $fix stops at that boundary; broader architecture, product, or roadmap analysis belongs to another skill.
Double Diamond fit
fix spans the full Double Diamond, but keeps divergence mostly internal to stay autonomous:
-
Discover: reproduce/characterize + collect counterexamples.
-
Define: state the contract and invariants (before/after).
-
Develop: (internal) consider 2-3 plausible fix strategies; pick the smallest sound one per default policy.
-
Deliver: implement + prove with a real validation signal.
If a fix requires a product-sensitive choice (cannot be derived or characterized safely), stop and ask (or invoke $creative-problem-solver when multiple viable strategies exist).
Skill composition (required)
-
$invariant-ace : default invariant/protocol engine for fix ; run it before invariant-affecting edits.
-
$complexity-mitigator : default complexity analysis engine; use it for risk-driven complexity judgments before reshaping code.
-
$refine : default path when the task is to improve fix itself (or other skills); apply $refine workflow and quick_validate .
-
Delegation order when both are needed: $invariant-ace first, then $complexity-mitigator .
Delegation triggers (deterministic):
-
Run $invariant-ace when the slice touches stateful boundaries (parse/construct/API/DB/lock/txn/retry ordering), ownership/lifetime, or any "should never happen" claim.
-
Run $complexity-mitigator when auditability is at risk (for example: branch soup, deep nesting, cross-file reasoning, or unclear ownership/control flow).
-
Run $refine when the requested target is a skill artifact (for example SKILL.md , agents/openai.yaml , skill scripts/references/assets).
-
If both invariant and complexity triggers fire, run $invariant-ace first, then $complexity-mitigator .
Inputs
-
User request text.
-
Repo state (code + tests + scripts).
-
Validation signal (failing test/repro/log) OR a proof hook you create.
Outputs (chat)
-
Emit the exact sections in Deliverable format (chat) .
-
Use section headings verbatim (for example, Findings (severity order) , not Findings ).
-
During execution, emit one-line pass progress updates at pass start and pass end using: Pass <n>/<total_planned>: <name> — <start|done>; edits=<yes|no|n/a>; signal=<cmd|n/a>; result=<ok|fail|n/a> .
-
In Validation , include a machine-checkable JSON object with keys: baseline_cmd , baseline_result , proof_hook , final_cmd , final_result .
-
Keep self-review transcript shapes aligned with references/self_review_loop_examples.md when editing this contract.
Embedded mode (when $fix is invoked inside another skill)
-
You may emit a compact Fix Record instead of the full deliverable.
-
If another skill requires a primary artifact (for example patch-only diff), append Fix Record immediately after that artifact in the same assistant message.
-
Do not mention “Using $fix” without emitting either the full deliverable or a Fix Record.
Hard rules (MUST / MUST NOT)
-
MUST default to review + implement (unless review-only is requested).
-
MUST triage and present findings in severity order: security > crash > corruption > logic.
-
MUST NOT claim done without a passing validation signal.
-
MUST NOT edit when no local signal/proof hook can be found or created under Validation signal selection ; fail fast and report blocker.
-
MUST NOT do product/feature work.
-
MUST NOT do intentional semantic changes without clarifying, except correctness tightening.
-
MUST resolve trade-offs using Default policy (non-interactive) .
-
MUST emit pass progress updates while running the multi-pass loop.
-
MUST include a final Pass trace section in the deliverable/Fix Record with executed pass count and per-pass outcomes.
-
MUST include machine-checkable validation evidence keys in Validation : baseline_cmd , baseline_result , proof_hook , final_cmd , final_result .
-
MUST use the exact heading names from Deliverable format (chat) / Fix Record ; do not alias or shorten heading labels.
-
MUST produce a complete finding record for every acted-on issue:
-
counterexample
-
invariant_before
-
invariant_after
-
fix
-
proof
-
proof_strength
-
compatibility_impact
-
MUST follow the delegation contract in Skill composition (required) (including order: $invariant-ace -> $complexity-mitigator ).
-
MUST route skill-self edits (for example codex/skills/fix ) through $refine and run quick_validate .
-
MUST NOT put fixable items in Residual risks / open questions ; if it is fixable under the autonomy gate + guardrails, treat it as a finding and fix it.
-
MUST include a final Self-review loop trace section in the deliverable/Fix Record.
-
MUST run the final agent-directed self-review loop only after at least one change is applied and validation is passing.
-
MUST run the self-review loop against the final validated changeset only.
-
MUST invalidate and rerun the self-review loop if any non-self-review edit occurs after a self-review round.
-
MUST ask internally exactly: If you could change one thing about this changeset what would you change?
-
MUST treat the final agent-directed self-review phase as current-worktree scoped once the first validated changeset exists; do not reject a self-review suggestion solely because it broadens the diff.
-
MUST treat a self-review answer as actionable whenever it identifies a concrete compatible/provable improvement on the current validated changeset, even if the improvement is ergonomic, structural, or API-shaping rather than a baseline bug fix.
-
MUST answer that question internally, apply at most one new actionable self-review change per self-round, re-run validation, and repeat until a self-round yields no new actionable self-review change or only blocked changes.
-
MUST NOT reject a self-review suggestion solely because the baseline is already green, the concern sounds architectural, or the change would reshape a public/API seam; if it can stay backward-compatible and be revalidated locally, implement it.
-
MUST, when a self-review critique is broader than the smallest bug repair, apply the narrowest compatible/provable slice that materially addresses the critique before considering broader follow-up skills.
-
MUST rerun the non-self-review $fix passes once after the self-review loop reaches no_new_actionable_changes or blocked .
-
MUST NOT use scope_guardrail as the reason to reject a self-review suggestion once the self-review phase has started.
-
MUST NOT report a self-review answer that was already applied before the final self-review round; record finding=none only when the current final validated changeset yields no concrete compatible/provable self-review change.
-
MUST NOT emit If you could change one thing about this changeset what would you change? as a user-facing terminal line during normal successful completion.
-
MUST stop $fix once no new actionable self-review change remains and the post-self-review rerun is clean; do not continue under $fix into broader architecture, product, roadmap, or conceptual analysis.
-
MUST, if the user asks for broader or bolder analysis after a clean or closed $fix pass, close the $fix deliverable first and recommend the next skill explicitly ($grill-me , $parse , $plan , or $creative-problem-solver ) instead of continuing under $fix .
-
When paired with $tk in wave execution, MUST treat $fix as the final mutating pass before artifactization:
-
commit_first : hand off immediately to $commit after passing validation.
-
patch_first : hand off immediately to $patch after passing validation.
Default policy (non-interactive)
Use these defaults to maximize autonomy (avoid asking).
Priority order (highest first):
-
correctness + data safety + security
-
compatibility for PROVEN_USED behavior
-
performance
Definitions:
-
PROVEN_USED = behavior with evidence (see checklist below).
-
NON_PROVEN_USED = everything else.
PROVEN_USED evidence checklist (deterministic)
Goal: classify behavior as PROVEN_USED vs NON_PROVEN_USED without asking.
Algorithm:
-
Enumerate affected behavior tokens:
-
API symbols (functions/types/methods).
-
CLI flags/commands.
-
config keys / env vars.
-
file/wire formats (field names, JSON keys).
-
error codes/messages consumed by callers.
-
Collect evidence in this order (stop at first match):
-
User repro/signal references the token.
-
Tests assert on the token/behavior.
-
Docs/README/examples/CHANGELOG/config examples describe or demonstrate the token/behavior.
-
Repo callsites use the token (non-test, non-doc).
-
IF any evidence exists, THEN mark PROVEN_USED; ELSE mark NON_PROVEN_USED.
Evidence scan procedure (repo-local):
-
Tests: search tests/ , test/ , tests/ , spec/ , and files matching test / .spec. .
-
Docs/examples: search README* , docs/ , examples/ , CHANGELOG* , config.example* , and *.md .
-
Callsites: search remaining source files.
-
CI/config surfaces: also scan .github/workflows/ for env vars, flags, and script entrypoints.
Tooling hint: prefer rg -n "<token>" with --glob filters. Deterministic scan command template (run in this order):
-
rg -n "<token>" tests test tests spec --glob 'test' --glob '.spec.'
-
rg -n "<token>" README* docs examples CHANGELOG* config.example* --glob '*.md'
-
rg -n "<token>" .github/workflows
-
rg -n "<token>" .
Externally-used surface checklist (deterministic)
Treat a surface as external if ANY is true:
-
Token appears in docs/README/examples.
-
Token is a CLI flag/command, config key, env var, or file format field.
-
Token is exported/public API (examples: Rust pub , TS/JS export , Python in all , Go exported identifier).
If external use is plausible but uncertain:
-
Prefer an additive compatibility path (wrapper/alias/adapter) if small.
-
Ask only if a breaking change is unavoidable.
Compatibility rules:
-
MUST preserve PROVEN_USED behavior unless it is unsafe (crash/corruption/security). If unsafe, tighten and return a clear error.
-
For NON_PROVEN_USED behavior, treat it as undefined; tightening is allowed.
API/migration rules:
-
IF a fix touches an externally-used surface (use Externally-used surface checklist ), THEN prefer additive/backward-compatible changes (new option/new function/adapter) over breaking changes.
-
IF a breaking change is unavoidable, THEN stop and ask.
Performance rules:
-
MUST avoid obvious asymptotic regressions in hot paths.
-
IF performance impact is plausible but unmeasurable locally, THEN choose the smallest correctness-first fix and record the risk in Residual risks / open questions (do not ask).
Autonomy gate (conviction)
Proceed without asking only if ALL are true:
-
SIGNAL: you have (or can create) a local repro/signal without product ambiguity.
-
CONTRACT: you can derive contract from repo evidence OR characterize current behavior without product ambiguity.
-
INVARIANT: you can state invariant_before and invariant_after.
-
DIFF: the change is localized and reviewable for the chosen strategy.
-
PROOF: at least one validation signal passes after changes.
If ANY gate fails:
-
Apply Defaults + contract derivation.
-
Ask only if still blocked.
Correctness tightening (default)
Definition: tightening = rejecting invalid/ambiguous states earlier, removing silent failure paths, or making undefined behavior explicit.
Rule:
-
IF tightening prevents crash/corruption/security issue, THEN apply it (even if PROVEN_USED), return a clear error, and record the compatibility impact.
-
ELSE IF tightening affects only NON_PROVEN_USED inputs/states, THEN apply it without asking.
-
ELSE (tightening might change PROVEN_USED behavior):
-
prefer a backward-compatible shape (adapter/additive API), OR
-
lock current behavior with a characterization test,
-
ask only if compatibility cannot be preserved.
Allowed tightening moves (examples only):
-
Parse/coerce: implicit coercion/partial parse -> explicit parse + error (JS implicit string->number; Rust parse().ok() fallback).
-
Fallbacks: warn+fallback/default-without-signal -> explicit error OR explicit "unknown"/"invalid".
-
Lossy conversion: truncation/clamp/wrap -> checked conversion + error (Rust as ; C casts; Go int conversions).
-
Partial updates: multi-step mutation -> atomic/transactional boundary OR explicit partial result.
-
Ignored errors: ignore -> propagate with context; if intentionally ignored -> compensating signal (assert/test/metric/log).
-
Sentinels: -1 /0 /"" /None -> richer returns.
Defaults (use only when semantics-preserving for valid inputs)
Ownership / lifetimes
-
MUST inventory in-slice resources: allocations, handles, sockets, locks, txns, temp files.
-
For each resource, record: acquire_site, owner, release_action.
-
MUST ensure every exit releases exactly once.
-
MUST free/release ONLY via the proven owner/allocator/handle.
-
Never assume a "free" is safe without proving allocator/ownership.
-
Arena/bump allocators: per-object free is safe only if explicitly documented as permitted/no-op.
Validation signal selection
Algorithm:
-
IF the user provided a command, use it.
-
ELSE choose the cheapest local signal (no network) in this order:
-
README/QUICKSTART
-
scripts/check / scripts/test
-
Makefile / justfile / Taskfile.yml
-
ELSE create a proof hook in this order:
-
focused regression/characterization test
-
boundary assertion that fails loudly
-
scoped + rate-limited diagnostic log tied to ONE invariant (never log secrets/PII)
No-signal fail-fast (deterministic):
-
If no signal/proof hook is available after executing the selection algorithm once, stop before editing.
-
Return one blocker with blocked_by=no_repro_or_proof and include the exact commands you attempted.
PR/diff scope guardrail (default in review mode)
-
Default slice = changed lines/paths (git diff/PR diff) + at most one boundary seam (parse/construct/API edge) required to make the fix sound.
-
Do not fix pre-existing issues outside the slice unless:
-
severity is security/crash/corruption AND
-
the fix is localized and provable without widening the slice.
-
Otherwise: record as Residual risks / open questions .
Scope widening trigger (deterministic):
-
Widen beyond the diff only when ALL are true:
-
severity is security , crash , or corruption
-
you can show a concrete causal chain from a diff token to the out-of-slice location
-
the widening is limited to one adjacent seam (at most one additional file/module boundary)
-
one local validation signal can prove the widened fix
-
If any check fails, keep scope fixed and record blocked_by=scope_guardrail .
-
This guardrail applies to the core review passes.
-
Once the final agent-directed self-review loop starts, scope expands to the current repo/worktree and scope_guardrail is not a valid reason to reject the self-review suggestion.
Generated / third-party code guardrail
-
If a file appears generated or is under third-party/build output, do not edit it directly.
-
Prefer to locate the source-of-truth + regeneration step; apply fixes there and regenerate as needed.
-
Common no-edit zones (examples): dist/ , build/ , vendor/ , node_modules/ .
Proof discipline for passing baselines
When the baseline signal is ok:
-
For every acted-on finding, prefer a proof hook that fails before the fix (focused regression/characterization test).
-
If you cannot produce a failing proof hook, do not edit; attempt to create one first by:
-
turning the counterexample into a focused regression/characterization test
-
enforcing the invariant at a single boundary seam (parse/refine once) and testing the new error
-
reducing effects to a pure helper and unit-testing it
-
using an existing fuzzer/property test harness if present
-
Only if you still cannot create a proof hook without product ambiguity, record the blocker as residual risk.
-
Self-review exception: when the self-review finding is a concrete compatible simplification or auditability improvement on an already-green changeset, a fresh failing hook is preferred but not mandatory; the primary validation bundle may serve as the proof hook if it still proves the improved invariant after the change.
Residual risks / open questions policy (last resort)
Residual risks are a record of what you could not safely fix (not a to-do list).
Rules:
-
Before emitting a residual item, attempt to convert it into an actionable finding:
-
produce a concrete counterexample
-
attach a proof hook (test/assert) that fails before the fix
-
apply the smallest localized fix within guardrails
-
re-run the validation signal
-
Only emit residual items that are truly blocked by one of these blockers:
-
product_ambiguity (semantics cannot be derived/characterized safely)
-
breaking_change (no additive path; fix would be breaking)
-
no_repro_or_proof (cannot create a repro/proof hook locally)
-
scope_guardrail (outside diff slice and not severe enough to widen)
-
generated_output (generated/third-party output; need source-of-truth + regen)
-
external_dependency (needs network/creds/services/hardware)
-
perf_unmeasurable (impact plausible; no local measurement)
-
For self-review-originated changes, do not emit blocked_by=scope_guardrail ; if scope is the only blocker, widen and continue.
-
Every residual bullet MUST include: a location or token, blocked_by=<product_ambiguity|breaking_change|no_repro_or_proof|scope_guardrail|generated_output|external_dependency|perf_unmeasurable> , and next=<one action> .
-
If there are no residual items, output - None .
Multi-pass loop (default)
Goal: reduce missed issues in PR/diff reviews without widening scope.
Run 3 core passes (mandatory). Run 2 additional delta passes only if pass 3 edits code.
Pass 1) Safety (highest severity)
-
Scope: diff-driven slice + required boundary seams.
-
Focus: security/crash/corruption hazards, unsafe tightening, missing error propagation.
-
Change budget: smallest sound fix only.
Pass 2) Surface (compat + misuse)
-
Scope: externally-used surfaces touched by the diff (exports/CLI/config/format/docs).
-
Focus: additive compatibility, defuse top footguns, clearer errors.
-
Change budget: additive/wrapper/adapter preferred; breaking change => stop and ask.
Pass 3) Audit (invariants + ownership + proof quality)
-
Scope: final diff slice.
-
Focus: invariants enforced at strongest cheap boundary; ownership release-on-all-paths; proof strength.
-
Delegation: run $invariant-ace first for invariant framing, then $complexity-mitigator for complexity verdicts that affect auditability.
-
Change budget: no refactors unless they directly reduce risk/auditability of invariants.
Early exit (stop after pass 3):
- If pass 3 applies no edits, stop (after running/confirming the primary validation signal).
Delta passes (only if pass 3 applied edits)
Pass 4) Safety delta rescan
-
Scope: ONLY lines/paths changed by passes 1-3 and their immediate boundaries.
-
Focus: new security/crash/corruption hazards introduced by the fixes.
Pass 5) Surface + proof delta rescan
-
Re-enumerate behavior tokens from the FINAL diff (including newly introduced errors/flags/exports/config keys).
-
Re-run PROVEN_USED/external-surface checks for newly introduced tokens.
-
Ensure proof is still strong for the final diff.
Rules:
-
After any pass that edits code, run a local signal before continuing.
-
Do not edit on suspicion: every edit must have a concrete counterexample, invariant_before/after, and a proof hook.
-
During the self-review phase, the counterexample may be an auditability or responsibility-split failure in the current validated changeset; if the change is compatible and locally validated, that is actionable under $fix even when the baseline was already green.
-
Merge/de-duplicate findings across passes; final output format stays unchanged.
-
Emit pass progress updates in real time at pass start/end; these updates are required and do not replace the final Pass trace section.
Clarify before changes
Stop and ask ONLY if any is true:
-
Behavior is contradictory/product-sensitive AND cannot be derived from tests/docs/callsites AND cannot be characterized safely.
-
A breaking API change or irreversible migration is unavoidable (no backward-compatible path).
-
No local signal/proof hook can be found or created without product ambiguity.
Finding record schema (internal)
For every issue you act on, construct this record before editing:
-
id: F<number>
-
location: file:line
-
severity: security|crash|corruption|logic
-
issue:
-
tokens: <affected behavior tokens; empty if none>
-
proven_used: <yes/no + evidence if yes>
-
external_surface: <yes/no + why>
-
diff_touch: <yes/no>
-
counterexample: <input/timeline>
-
invariant_before: <what is allowed/assumed today>
-
invariant_after: <what becomes guaranteed/rejected>
-
fix:
-
proof: <test/assert/log + validation command + result>
-
proof_strength: characterization|targeted_regression|property_or_fuzz
-
compatibility_impact: none|tightening|additive|breaking
Canonical finding example (fully-filled)
Use this as the reference shape when writing findings.
F1 src/config_loader.py:88 — crash — untrusted config type causes uncaught attribute access
- Surface: Tokens=config.path; PROVEN_USED=yes (tests/config/test_loader.py::test_reads_path); External=yes; Diff_touch=yes
- Counterexample:
{"path":null}reachescfg.path.strip()and raisesAttributeError - Invariant (before): loader assumes
pathis a non-empty string - Invariant (after): loader accepts only non-empty string
path; invalid type returns explicit error - Fix: add boundary validation in
parse_config()and reject invalidpathbefore use - Proof:
uv run pytest tests/config/test_loader.py::test_rejects_null_path-> ok - Proof strength:
targeted_regression - Compatibility impact:
tightening
Residual risks / open questions
vendor/generated/config_schema.py— blocked_by=generated_output — next=edit source schema and regenerate artifact
Workflow (algorithm)
- Preflight
-
Determine mode: review-only vs fix.
-
If the requested target is this skill (or another skill), route through $refine :
-
follow $refine Discover -> Define -> Develop -> Deliver,
-
apply minimal skill diffs,
-
run uv run --with pyyaml -- python3 codex/skills/.system/skill-creator/scripts/quick_validate.py codex/skills/<skill-name> .
-
if codex/skills/fix/SKILL.md was touched, also run uv run python codex/skills/fix/scripts/lint_fix_skill_contract.py codex/skills/fix/SKILL.md .
-
Return to the rest of fix only if the request also includes code/runtime defects.
-
Define slice:
-
If in a git repo and there is a diff, derive slice/tokens from the diff (paths, changed symbols/strings).
-
Otherwise: entrypoint, inputs, outputs, state.
-
If the request is PR-scoped (for example "$fix this PR ", "$fix current branch ", CI failure on a PR), anchor the slice to PR diff/base and keep the work PR-local unless severity widening is required.
-
Apply PR/diff scope guardrail for review mode.
-
Apply Generated / third-party code guardrail before editing.
-
Select validation signal (or create proof hook).
-
If signal/proof creation fails, stop before editing and return blocked_by=no_repro_or_proof with attempted commands.
- Contract + baseline
-
Determine PROVEN_USED behavior:
-
Apply PROVEN_USED evidence checklist to the behavior tokens affected by the slice.
-
Derive contract without asking (in this order):
-
tests that exercise the slice
-
callsites in the repo
-
docs/README/examples
-
if none: add a characterization test for current behavior
-
Write contract (1 sentence): "Working means …".
-
Run baseline signal once; record result.
-
If baseline signal is ok, apply Proof discipline for passing baselines .
- Create initial findings
-
Enumerate candidate failure modes for the slice.
-
Rank security > crash > corruption > logic.
-
For each issue you will act on, create a finding record.
- Mandatory scans (multi-pass)
Run the Multi-pass loop (default) for the touched slice.
3a) Unsoundness scan
For each hazard class that applies:
-
Identify the first failure point.
-
Provide a concrete counterexample (input/timeline).
-
Specify the smallest sound fix that removes the bug-class.
Hazard classes:
-
Security boundaries: authz/authn, injection, path traversal, SSRF, unsafe deserialization, secrets/PII handling.
-
Nullability/uninitialized state; sentinel values.
-
Ownership/lifetime: leaks, double-free, use-after-close, lock not released.
-
Concurrency/order/time: races, lock ordering, reentrancy, timeouts/retries, TOCTOU.
-
Bounds/arithmetic: indexing/slicing, overflow/underflow, signedness.
-
Persistence/atomicity: partial writes, torn updates, non-transactional multi-step updates.
-
Encoding/units: bytes vs chars, timezone/locale, unit mismatches, lossy normalization.
-
Error handling: swallowed errors, missing context, silent fallback.
Language cues (examples only):
-
Rust: unsafe , unwrap/expect , as casts.
-
JS/TS: any , implicit coercions, undefined flows.
-
Python: bare except , truthiness-based branching.
-
C/C++: raw pointers, unchecked casts, manual malloc/free .
3b) Invariant strengthening scan
Delegate to $invariant-ace and import its artifacts into fix .
Default execution:
-
Run $invariant-ace Compact Mode first.
-
Capture at minimum:
-
Counterexample
-
Invariants
-
Owner and Scope
-
Enforcement Boundary
-
Seam (Before -> After)
-
Verification
-
Map these to fix finding fields:
-
invariant_before from broken trace + prior scope,
-
invariant_after from chosen predicate(s) + holds scope,
-
fix from seam,
-
proof from verification signal.
-
Escalate to full $invariant-ace protocol if Compact Mode does not yield an inductive predicate.
3c) Footgun scan + defusal
Trigger: you touched an API surface OR a caller can plausibly misuse the code.
For top-ranked misuse paths:
-
Provide minimal misuse snippet + surprising behavior.
-
Defuse by changing the surface:
-
options structs / named params
-
split booleans into enums or separate functions
-
explicit units/encodings
-
richer returns (no sentinels)
-
separate pure computation from effects
-
Lock with a regression test or boundary assertion.
3d) Complexity scan (risk-driven)
Delegate to $complexity-mitigator for analysis; implement only via fix .
Rule: reshape only when it reduces risk and improves auditability of invariants/ownership.
Algorithm:
-
Run $complexity-mitigator on the touched slice (heat read + essential/incidental verdict + ranked options + TRACE).
-
Select the smallest viable cut that directly supports a finding (safety/surface/invariant ownership/proof quality).
-
If simplification depends on an unstated invariant, run/refresh $invariant-ace first.
-
Keep complexity-only cleanup out of scope unless it closes a concrete risk.
- Implement fixes (per finding)
For findings in severity order:
-
Implement the smallest sound fix that removes the bug-class.
-
Apply correctness tightening when allowed.
-
Ensure invariants + ownership hold on all paths.
-
Keep diff reviewable; avoid drive-by refactors.
- Close the loop (required)
-
Run the chosen validation signal.
-
IF it fails:
-
update findings with the new counterexample,
-
apply the smallest additional fix,
-
re-run the SAME signal.
-
Repeat until the signal passes.
-
If you cannot make progress after 3 repair cycles, stop and ask with:
-
the last failing output (key lines/stack),
-
what you tried,
-
the smallest remaining decision that blocks you.
- Residual risk sweep (required)
-
Re-check Residual risks / open questions policy .
-
Any item without a valid blocker becomes a finding and is fixed (with proof) or dropped.
- Agent-directed self-review loop (required final step when a changeset exists)
-
Precondition gate: run this step only when Changes applied is not None and the latest validation signal result is ok .
-
Freeze the self-review baseline as the latest validated changeset.
-
Ask internally exactly: If you could change one thing about this changeset what would you change?
-
Summarize the self-round in Self-review loop trace with one delta row.
-
If the answer yields one new actionable self-review change:
-
convert it into one concrete finding,
-
treat compatible ergonomic/structural/API-shaping improvements as actionable when they materially improve the current validated changeset and can be proven locally,
-
widen anywhere in the current repo/worktree as needed,
-
apply the smallest sound change that materially addresses the critique,
-
re-run the chosen validation signal and update findings/proof,
-
record the (validated_changeset_fingerprint, normalized_answer_summary) pair so repeated suggestions cannot loop forever,
-
repeat from step 2.
-
If the answer yields only blocked changes, record stop_reason=blocked , carry blockers to Residual risks / open questions , and continue to step 8.
-
If the answer yields no new actionable self-review change, record stop_reason=no_new_actionable_changes and continue to step 8.
-
This check is against the current final validated changeset, not an earlier pre-delta review state.
-
already green , architectural , not fix-shaped , or no failing proof hook are not sufficient reasons by themselves when a concrete compatible/provable improvement still exists.
-
Run the non-self-review $fix passes once against the full resulting changeset.
-
If that rerun edits code, discard the stale self-review state, revalidate, and restart from step 2 against the new final changeset.
-
If that rerun applies no edits, exit the loop and proceed to output.
-
Skip gate: if Changes applied is None or the run is blocked before edits, output - None (skip_gate) in Self-review loop trace and proceed to output.
7a) Post-fix boundary (required)
-
If a clean or closed $fix pass surfaces broader non-fix opportunities, do not continue exploring them under $fix .
-
If the user explicitly asks for broader or deeper follow-up after closure, recommend the next skill explicitly:
-
$parse for architecture or purpose analysis
-
$grill-me for interrogation, pressure-testing, or narrowing the next move
-
$plan for a decision-complete roadmap
-
$creative-problem-solver for wider option generation
-
Do not place those broader opportunities in Residual risks / open questions unless a valid blocker from the allowed set applies.
- Output lock (required)
Before sending the final message, verify all are true:
-
Heading set is exact and complete (Contract , Findings (severity order) , Changes applied , Pass trace , Validation , Self-review loop trace , Residual risks / open questions ).
-
Pass trace includes planned/executed counts and P1/P2/P3 lines (plus P4/P5 when executed).
-
Runtime pass updates (Pass <n>/<total_planned>: ... ) were emitted during execution.
-
If embedded mode was used, include Fix Record in the same assistant message (after any required artifact).
-
Validation includes machine-checkable keys: baseline_cmd , baseline_result , proof_hook , final_cmd , final_result .
-
Every acted-on finding includes Proof strength and Compatibility impact using the allowed enums.
-
Every residual blocked_by uses only: product_ambiguity|breaking_change|no_repro_or_proof|scope_guardrail|generated_output|external_dependency|perf_unmeasurable .
-
If Changes applied is not None AND the latest validation result is ok , Self-review loop trace includes a terminal row with stop_reason=<no_new_actionable_changes|blocked> and the user-facing final line is not the question.
Deliverable format (chat)
Output exactly these sections in this order.
If no findings:
-
Findings (severity order): None .
-
Changes applied: None .
-
Self-review loop trace: - None (skip_gate) .
-
Residual risks / open questions: - None .
-
Still include Pass trace and Validation.
Contract
Findings (severity order) For each finding:
-
F# <file:line> — <security|crash|corruption|logic> —
-
Surface: Tokens=<...>; PROVEN_USED=<yes/no + evidence>; External=<yes/no>; Diff_touch=<yes/no>
-
Counterexample: <input/timeline>
-
Invariant (before): <what was assumed/allowed>
-
Invariant (after): <what is now guaranteed/rejected>
-
Fix:
-
Proof: <test/assert/log + validation command> -> <ok/fail>
-
Proof strength: <characterization|targeted_regression|property_or_fuzz>
-
Compatibility impact: <none|tightening|additive|breaking>
Changes applied
- —
Pass trace
-
Core passes planned: 3 ; core passes executed: <3>
-
Delta passes planned: <0|2> ; delta passes executed: <0|2>
-
Total passes executed: <3|5>
-
P1 Safety -> <done> ; edits=<yes|no> ; signal=<cmd|n/a> ; result=<ok|fail|n/a>
-
P2 Surface -> <done> ; edits=<yes|no> ; signal=<cmd|n/a> ; result=<ok|fail|n/a>
-
P3 Audit -> <done> ; edits=<yes|no> ; signal=<cmd|n/a> ; result=<ok|fail|n/a>
-
If delta passes executed, also include P4 and P5 lines in the same format.
Validation
-
-> <ok/fail>
-
{"baseline_cmd":"<cmd|n/a>","baseline_result":"<ok|fail|n/a>","proof_hook":"<test/assert/log|n/a>","final_cmd":"<cmd>","final_result":"<ok|fail>"} (single-line JSON)
Self-review loop trace
-
If none: - None (skip_gate)
-
Otherwise: S# prompt=If you could change one thing about this changeset what would you change? ; answer_summary=<...>; finding=<F#|none> ; change_applied=<yes|no> ; proof=<cmd|n/a> ; result=<ok|fail|n/a> ; stop_reason=<continue|no_new_actionable_changes|blocked>
Residual risks / open questions
-
If none: - None
-
Otherwise: - <file:line or token> — blocked_by=<product_ambiguity|breaking_change|no_repro_or_proof|scope_guardrail|generated_output|external_dependency|perf_unmeasurable> — next=<one action>
Fix Record (embedded mode only)
Use only when $fix is invoked inside another skill.
Findings (severity order) For each finding:
-
F# <file:line> — <security|crash|corruption|logic> —
-
Surface: Tokens=<...>; PROVEN_USED=<yes/no + evidence>; External=<yes/no>; Diff_touch=<yes/no>
-
Counterexample: <input/timeline>
-
Invariant (before): <what was assumed/allowed>
-
Invariant (after): <what is now guaranteed/rejected>
-
Fix:
-
Proof: <test/assert/log + validation command> -> <ok/fail>
-
Proof strength: <characterization|targeted_regression|property_or_fuzz>
-
Compatibility impact: <none|tightening|additive|breaking>
Changes applied
- —
Pass trace
-
Core passes planned: 3 ; core passes executed: <3>
-
Delta passes planned: <0|2> ; delta passes executed: <0|2>
-
Total passes executed: <3|5>
-
P1 Safety -> <done> ; edits=<yes|no> ; signal=<cmd|n/a> ; result=<ok|fail|n/a>
-
P2 Surface -> <done> ; edits=<yes|no> ; signal=<cmd|n/a> ; result=<ok|fail|n/a>
-
P3 Audit -> <done> ; edits=<yes|no> ; signal=<cmd|n/a> ; result=<ok|fail|n/a>
-
If delta passes executed, also include P4 and P5 lines in the same format.
Validation
-
-> <ok/fail>
-
{"baseline_cmd":"<cmd|n/a>","baseline_result":"<ok|fail|n/a>","proof_hook":"<test/assert/log|n/a>","final_cmd":"<cmd>","final_result":"<ok|fail>"} (single-line JSON)
Self-review loop trace
-
If none: - None (skip_gate)
-
Otherwise: S# prompt=If you could change one thing about this changeset what would you change? ; answer_summary=<...>; finding=<F#|none> ; change_applied=<yes|no> ; proof=<cmd|n/a> ; result=<ok|fail|n/a> ; stop_reason=<continue|no_new_actionable_changes|blocked>
Residual risks / open questions
-
If none: - None
-
Otherwise: - <file:line or token> — blocked_by=<product_ambiguity|breaking_change|no_repro_or_proof|scope_guardrail|generated_output|external_dependency|perf_unmeasurable> — next=<one action>
Pitfalls
-
Vague advice without locations.
-
"Nice-to-have" refactors that don’t reduce risk.
-
Feature creep.
-
Continuing broad repo critique or planning under a completed $fix label.
-
Asking for risk tolerance instead of applying the default policy.
-
Tightening presented as optional.
-
Ownership fixes without proven allocator/owner.
-
Editing generated/third-party outputs instead of source-of-truth.
-
Code edits without a failing proof hook when baseline was ok.
-
Using Residual risks / open questions as a substitute for fixable findings.
Activation cues
-
"resolve" / "fix" / "crash" / "data corruption"
-
"$fix this PR" / "$fix current branch" / "fix this branch"
-
"CI failed" / "fix the red checks" / "repair failing PR"
-
"footgun" / "misuse" / "should never happen"
-
"invariant" / "lifetime" / "nullable surprise"
-
"too complex" / "branch soup" / "cross-file hops"
-
"refine $fix" / "improve fix skill workflow" / "update fix skill docs"