fix

Fix

Intent

Make risky or unclear code safe with the smallest sound, validated change, then keep listening to the final self-review until no actionable self-review change remains. When the validated changeset survives the post-self-review rerun, $fix stops at that boundary; broader architecture, product, or roadmap analysis belongs to another skill.

Double Diamond fit

fix spans the full Double Diamond, but keeps divergence mostly internal to stay autonomous:

Discover: reproduce/characterize + collect counterexamples.
Define: state the contract and invariants (before/after).
Develop: (internal) consider 2-3 plausible fix strategies; pick the smallest sound one per default policy.
Deliver: implement + prove with a real validation signal.

If a fix requires a product-sensitive choice (cannot be derived or characterized safely), stop and ask (or invoke $creative-problem-solver when multiple viable strategies exist).

Skill composition (required)

$invariant-ace : default invariant/protocol engine for fix ; run it before invariant-affecting edits.
$complexity-mitigator : default complexity analysis engine; use it for risk-driven complexity judgments before reshaping code.
$refine : default path when the task is to improve fix itself (or other skills); apply $refine workflow and quick_validate .
Delegation order when both are needed: $invariant-ace first, then $complexity-mitigator .

Delegation triggers (deterministic):

Run $invariant-ace when the slice touches stateful boundaries (parse/construct/API/DB/lock/txn/retry ordering), ownership/lifetime, or any "should never happen" claim.
Run $complexity-mitigator when auditability is at risk (for example: branch soup, deep nesting, cross-file reasoning, or unclear ownership/control flow).
Run $refine when the requested target is a skill artifact (for example SKILL.md , agents/openai.yaml , skill scripts/references/assets).
If both invariant and complexity triggers fire, run $invariant-ace first, then $complexity-mitigator .

Inputs

User request text.
Repo state (code + tests + scripts).
Validation signal (failing test/repro/log) OR a proof hook you create.

Outputs (chat)

Emit the exact sections in Deliverable format (chat) .
Use section headings verbatim (for example, Findings (severity order) , not Findings ).
During execution, emit one-line pass progress updates at pass start and pass end using: Pass <n>/<total_planned>: <name> — <start|done>; edits=<yes|no|n/a>; signal=<cmd|n/a>; result=<ok|fail|n/a> .
In Validation , include a machine-checkable JSON object with keys: baseline_cmd , baseline_result , proof_hook , final_cmd , final_result .
Keep self-review transcript shapes aligned with references/self_review_loop_examples.md when editing this contract.

Embedded mode (when $fix is invoked inside another skill)

You may emit a compact Fix Record instead of the full deliverable.
If another skill requires a primary artifact (for example patch-only diff), append Fix Record immediately after that artifact in the same assistant message.
Do not mention “Using $fix” without emitting either the full deliverable or a Fix Record.

Hard rules (MUST / MUST NOT)

MUST default to review + implement (unless review-only is requested).
MUST triage and present findings in severity order: security > crash > corruption > logic.
MUST NOT claim done without a passing validation signal.
MUST NOT edit when no local signal/proof hook can be found or created under Validation signal selection ; fail fast and report blocker.
MUST NOT do product/feature work.
MUST NOT do intentional semantic changes without clarifying, except correctness tightening.
MUST resolve trade-offs using Default policy (non-interactive) .
MUST emit pass progress updates while running the multi-pass loop.
MUST include a final Pass trace section in the deliverable/Fix Record with executed pass count and per-pass outcomes.
MUST include machine-checkable validation evidence keys in Validation : baseline_cmd , baseline_result , proof_hook , final_cmd , final_result .
MUST use the exact heading names from Deliverable format (chat) / Fix Record ; do not alias or shorten heading labels.
MUST produce a complete finding record for every acted-on issue:
counterexample
invariant_before
invariant_after
fix
proof
proof_strength
compatibility_impact
MUST follow the delegation contract in Skill composition (required) (including order: $invariant-ace -> $complexity-mitigator ).
MUST route skill-self edits (for example codex/skills/fix ) through $refine and run quick_validate .
MUST NOT put fixable items in Residual risks / open questions ; if it is fixable under the autonomy gate + guardrails, treat it as a finding and fix it.
MUST include a final Self-review loop trace section in the deliverable/Fix Record.
MUST run the final agent-directed self-review loop only after at least one change is applied and validation is passing.
MUST run the self-review loop against the final validated changeset only.
MUST invalidate and rerun the self-review loop if any non-self-review edit occurs after a self-review round.
MUST ask internally exactly: If you could change one thing about this changeset what would you change?
MUST treat the final agent-directed self-review phase as current-worktree scoped once the first validated changeset exists; do not reject a self-review suggestion solely because it broadens the diff.
MUST treat a self-review answer as actionable whenever it identifies a concrete compatible/provable improvement on the current validated changeset, even if the improvement is ergonomic, structural, or API-shaping rather than a baseline bug fix.
MUST answer that question internally, apply at most one new actionable self-review change per self-round, re-run validation, and repeat until a self-round yields no new actionable self-review change or only blocked changes.
MUST NOT reject a self-review suggestion solely because the baseline is already green, the concern sounds architectural, or the change would reshape a public/API seam; if it can stay backward-compatible and be revalidated locally, implement it.
MUST, when a self-review critique is broader than the smallest bug repair, apply the narrowest compatible/provable slice that materially addresses the critique before considering broader follow-up skills.
MUST rerun the non-self-review $fix passes once after the self-review loop reaches no_new_actionable_changes or blocked .
MUST NOT use scope_guardrail as the reason to reject a self-review suggestion once the self-review phase has started.
MUST NOT report a self-review answer that was already applied before the final self-review round; record finding=none only when the current final validated changeset yields no concrete compatible/provable self-review change.
MUST NOT emit If you could change one thing about this changeset what would you change? as a user-facing terminal line during normal successful completion.
MUST stop $fix once no new actionable self-review change remains and the post-self-review rerun is clean; do not continue under $fix into broader architecture, product, roadmap, or conceptual analysis.
MUST, if the user asks for broader or bolder analysis after a clean or closed $fix pass, close the $fix deliverable first and recommend the next skill explicitly ($grill-me , $parse , $plan , or $creative-problem-solver ) instead of continuing under $fix .
When paired with $tk in wave execution, MUST treat $fix as the final mutating pass before artifactization:
commit_first : hand off immediately to $commit after passing validation.
patch_first : hand off immediately to $patch after passing validation.

Default policy (non-interactive)

Use these defaults to maximize autonomy (avoid asking).

Priority order (highest first):

correctness + data safety + security
compatibility for PROVEN_USED behavior
performance

Definitions:

PROVEN_USED = behavior with evidence (see checklist below).
NON_PROVEN_USED = everything else.

PROVEN_USED evidence checklist (deterministic)

Goal: classify behavior as PROVEN_USED vs NON_PROVEN_USED without asking.

Algorithm:

Enumerate affected behavior tokens:
API symbols (functions/types/methods).
CLI flags/commands.
config keys / env vars.
file/wire formats (field names, JSON keys).
error codes/messages consumed by callers.
Collect evidence in this order (stop at first match):
User repro/signal references the token.
Tests assert on the token/behavior.
Docs/README/examples/CHANGELOG/config examples describe or demonstrate the token/behavior.
Repo callsites use the token (non-test, non-doc).
IF any evidence exists, THEN mark PROVEN_USED; ELSE mark NON_PROVEN_USED.

Evidence scan procedure (repo-local):

Tests: search tests/ , test/ , tests/ , spec/ , and files matching test / .spec. .
Docs/examples: search README* , docs/ , examples/ , CHANGELOG* , config.example* , and *.md .
Callsites: search remaining source files.
CI/config surfaces: also scan .github/workflows/ for env vars, flags, and script entrypoints.

Tooling hint: prefer rg -n "<token>" with --glob filters. Deterministic scan command template (run in this order):

rg -n "<token>" tests test tests spec --glob 'test' --glob '.spec.'
rg -n "<token>" README* docs examples CHANGELOG* config.example* --glob '*.md'
rg -n "<token>" .github/workflows
rg -n "<token>" .

Externally-used surface checklist (deterministic)

Treat a surface as external if ANY is true:

Token appears in docs/README/examples.
Token is a CLI flag/command, config key, env var, or file format field.
Token is exported/public API (examples: Rust pub , TS/JS export , Python in all , Go exported identifier).

If external use is plausible but uncertain:

Prefer an additive compatibility path (wrapper/alias/adapter) if small.
Ask only if a breaking change is unavoidable.

Compatibility rules:

MUST preserve PROVEN_USED behavior unless it is unsafe (crash/corruption/security). If unsafe, tighten and return a clear error.
For NON_PROVEN_USED behavior, treat it as undefined; tightening is allowed.

API/migration rules:

IF a fix touches an externally-used surface (use Externally-used surface checklist ), THEN prefer additive/backward-compatible changes (new option/new function/adapter) over breaking changes.
IF a breaking change is unavoidable, THEN stop and ask.

Performance rules:

MUST avoid obvious asymptotic regressions in hot paths.
IF performance impact is plausible but unmeasurable locally, THEN choose the smallest correctness-first fix and record the risk in Residual risks / open questions (do not ask).

Autonomy gate (conviction)

Proceed without asking only if ALL are true:

SIGNAL: you have (or can create) a local repro/signal without product ambiguity.
CONTRACT: you can derive contract from repo evidence OR characterize current behavior without product ambiguity.
INVARIANT: you can state invariant_before and invariant_after.
DIFF: the change is localized and reviewable for the chosen strategy.
PROOF: at least one validation signal passes after changes.

If ANY gate fails:

Apply Defaults + contract derivation.
Ask only if still blocked.

Correctness tightening (default)

Definition: tightening = rejecting invalid/ambiguous states earlier, removing silent failure paths, or making undefined behavior explicit.

Rule:

IF tightening prevents crash/corruption/security issue, THEN apply it (even if PROVEN_USED), return a clear error, and record the compatibility impact.
ELSE IF tightening affects only NON_PROVEN_USED inputs/states, THEN apply it without asking.
ELSE (tightening might change PROVEN_USED behavior):
prefer a backward-compatible shape (adapter/additive API), OR
lock current behavior with a characterization test,
ask only if compatibility cannot be preserved.

Allowed tightening moves (examples only):

Parse/coerce: implicit coercion/partial parse -> explicit parse + error (JS implicit string->number; Rust parse().ok() fallback).
Fallbacks: warn+fallback/default-without-signal -> explicit error OR explicit "unknown"/"invalid".
Lossy conversion: truncation/clamp/wrap -> checked conversion + error (Rust as ; C casts; Go int conversions).
Partial updates: multi-step mutation -> atomic/transactional boundary OR explicit partial result.
Ignored errors: ignore -> propagate with context; if intentionally ignored -> compensating signal (assert/test/metric/log).
Sentinels: -1 /0 /"" /None -> richer returns.

Defaults (use only when semantics-preserving for valid inputs)

Ownership / lifetimes

MUST inventory in-slice resources: allocations, handles, sockets, locks, txns, temp files.
For each resource, record: acquire_site, owner, release_action.
MUST ensure every exit releases exactly once.
MUST free/release ONLY via the proven owner/allocator/handle.
Never assume a "free" is safe without proving allocator/ownership.
Arena/bump allocators: per-object free is safe only if explicitly documented as permitted/no-op.

Validation signal selection

Algorithm:

IF the user provided a command, use it.
ELSE choose the cheapest local signal (no network) in this order:
README/QUICKSTART
scripts/check / scripts/test
Makefile / justfile / Taskfile.yml
ELSE create a proof hook in this order:
focused regression/characterization test
boundary assertion that fails loudly
scoped + rate-limited diagnostic log tied to ONE invariant (never log secrets/PII)

No-signal fail-fast (deterministic):

If no signal/proof hook is available after executing the selection algorithm once, stop before editing.
Return one blocker with blocked_by=no_repro_or_proof and include the exact commands you attempted.

PR/diff scope guardrail (default in review mode)

Default slice = changed lines/paths (git diff/PR diff) + at most one boundary seam (parse/construct/API edge) required to make the fix sound.
Do not fix pre-existing issues outside the slice unless:
severity is security/crash/corruption AND
the fix is localized and provable without widening the slice.
Otherwise: record as Residual risks / open questions .

Scope widening trigger (deterministic):

Widen beyond the diff only when ALL are true:
severity is security , crash , or corruption
you can show a concrete causal chain from a diff token to the out-of-slice location
the widening is limited to one adjacent seam (at most one additional file/module boundary)
one local validation signal can prove the widened fix
If any check fails, keep scope fixed and record blocked_by=scope_guardrail .
This guardrail applies to the core review passes.
Once the final agent-directed self-review loop starts, scope expands to the current repo/worktree and scope_guardrail is not a valid reason to reject the self-review suggestion.

Generated / third-party code guardrail

If a file appears generated or is under third-party/build output, do not edit it directly.
Prefer to locate the source-of-truth + regeneration step; apply fixes there and regenerate as needed.
Common no-edit zones (examples): dist/ , build/ , vendor/ , node_modules/ .

Proof discipline for passing baselines

When the baseline signal is ok:

For every acted-on finding, prefer a proof hook that fails before the fix (focused regression/characterization test).
If you cannot produce a failing proof hook, do not edit; attempt to create one first by:
turning the counterexample into a focused regression/characterization test
enforcing the invariant at a single boundary seam (parse/refine once) and testing the new error
reducing effects to a pure helper and unit-testing it
using an existing fuzzer/property test harness if present
Only if you still cannot create a proof hook without product ambiguity, record the blocker as residual risk.
Self-review exception: when the self-review finding is a concrete compatible simplification or auditability improvement on an already-green changeset, a fresh failing hook is preferred but not mandatory; the primary validation bundle may serve as the proof hook if it still proves the improved invariant after the change.

Residual risks / open questions policy (last resort)

Residual risks are a record of what you could not safely fix (not a to-do list).

Rules:

Before emitting a residual item, attempt to convert it into an actionable finding:
produce a concrete counterexample
attach a proof hook (test/assert) that fails before the fix
apply the smallest localized fix within guardrails
re-run the validation signal
Only emit residual items that are truly blocked by one of these blockers:
product_ambiguity (semantics cannot be derived/characterized safely)
breaking_change (no additive path; fix would be breaking)
no_repro_or_proof (cannot create a repro/proof hook locally)
scope_guardrail (outside diff slice and not severe enough to widen)
generated_output (generated/third-party output; need source-of-truth + regen)
external_dependency (needs network/creds/services/hardware)
perf_unmeasurable (impact plausible; no local measurement)
For self-review-originated changes, do not emit blocked_by=scope_guardrail ; if scope is the only blocker, widen and continue.
Every residual bullet MUST include: a location or token, blocked_by=<product_ambiguity|breaking_change|no_repro_or_proof|scope_guardrail|generated_output|external_dependency|perf_unmeasurable> , and next=<one action> .
If there are no residual items, output - None .

Multi-pass loop (default)

Goal: reduce missed issues in PR/diff reviews without widening scope.

Run 3 core passes (mandatory). Run 2 additional delta passes only if pass 3 edits code.

Pass 1) Safety (highest severity)

Scope: diff-driven slice + required boundary seams.
Focus: security/crash/corruption hazards, unsafe tightening, missing error propagation.
Change budget: smallest sound fix only.

Pass 2) Surface (compat + misuse)

Scope: externally-used surfaces touched by the diff (exports/CLI/config/format/docs).
Focus: additive compatibility, defuse top footguns, clearer errors.
Change budget: additive/wrapper/adapter preferred; breaking change => stop and ask.

Pass 3) Audit (invariants + ownership + proof quality)

Scope: final diff slice.
Focus: invariants enforced at strongest cheap boundary; ownership release-on-all-paths; proof strength.
Delegation: run $invariant-ace first for invariant framing, then $complexity-mitigator for complexity verdicts that affect auditability.
Change budget: no refactors unless they directly reduce risk/auditability of invariants.

Early exit (stop after pass 3):

If pass 3 applies no edits, stop (after running/confirming the primary validation signal).

Delta passes (only if pass 3 applied edits)

Pass 4) Safety delta rescan

Scope: ONLY lines/paths changed by passes 1-3 and their immediate boundaries.
Focus: new security/crash/corruption hazards introduced by the fixes.

Pass 5) Surface + proof delta rescan

Re-enumerate behavior tokens from the FINAL diff (including newly introduced errors/flags/exports/config keys).
Re-run PROVEN_USED/external-surface checks for newly introduced tokens.
Ensure proof is still strong for the final diff.

Rules:

After any pass that edits code, run a local signal before continuing.
Do not edit on suspicion: every edit must have a concrete counterexample, invariant_before/after, and a proof hook.
During the self-review phase, the counterexample may be an auditability or responsibility-split failure in the current validated changeset; if the change is compatible and locally validated, that is actionable under $fix even when the baseline was already green.
Merge/de-duplicate findings across passes; final output format stays unchanged.
Emit pass progress updates in real time at pass start/end; these updates are required and do not replace the final Pass trace section.

Clarify before changes

Stop and ask ONLY if any is true:

Behavior is contradictory/product-sensitive AND cannot be derived from tests/docs/callsites AND cannot be characterized safely.
A breaking API change or irreversible migration is unavoidable (no backward-compatible path).
No local signal/proof hook can be found or created without product ambiguity.

Finding record schema (internal)

For every issue you act on, construct this record before editing:

id: F<number>
location: file:line
severity: security|crash|corruption|logic
issue:
tokens: <affected behavior tokens; empty if none>
proven_used: <yes/no + evidence if yes>
external_surface: <yes/no + why>
diff_touch: <yes/no>
counterexample: <input/timeline>
invariant_before: <what is allowed/assumed today>
invariant_after: <what becomes guaranteed/rejected>
fix:
proof: <test/assert/log + validation command + result>
proof_strength: characterization|targeted_regression|property_or_fuzz
compatibility_impact: none|tightening|additive|breaking

Canonical finding example (fully-filled)

Use this as the reference shape when writing findings.

F1 src/config_loader.py:88 — crash — untrusted config type causes uncaught attribute access

Surface: Tokens=config.path; PROVEN_USED=yes (tests/config/test_loader.py::test_reads_path); External=yes; Diff_touch=yes
Counterexample: {"path":null} reaches cfg.path.strip() and raises AttributeError
Invariant (before): loader assumes path is a non-empty string
Invariant (after): loader accepts only non-empty string path; invalid type returns explicit error
Fix: add boundary validation in parse_config() and reject invalid path before use
Proof: uv run pytest tests/config/test_loader.py::test_rejects_null_path -> ok
Proof strength: targeted_regression
Compatibility impact: tightening

Residual risks / open questions

vendor/generated/config_schema.py — blocked_by=generated_output — next=edit source schema and regenerate artifact

Workflow (algorithm)

Preflight

Determine mode: review-only vs fix.
If the requested target is this skill (or another skill), route through $refine :
follow $refine Discover -> Define -> Develop -> Deliver,
apply minimal skill diffs,
run uv run --with pyyaml -- python3 codex/skills/.system/skill-creator/scripts/quick_validate.py codex/skills/<skill-name> .
if codex/skills/fix/SKILL.md was touched, also run uv run python codex/skills/fix/scripts/lint_fix_skill_contract.py codex/skills/fix/SKILL.md .
Return to the rest of fix only if the request also includes code/runtime defects.
Define slice:
If in a git repo and there is a diff, derive slice/tokens from the diff (paths, changed symbols/strings).
Otherwise: entrypoint, inputs, outputs, state.
If the request is PR-scoped (for example "$fix this PR ", "$fix current branch ", CI failure on a PR), anchor the slice to PR diff/base and keep the work PR-local unless severity widening is required.
Apply PR/diff scope guardrail for review mode.
Apply Generated / third-party code guardrail before editing.
Select validation signal (or create proof hook).
If signal/proof creation fails, stop before editing and return blocked_by=no_repro_or_proof with attempted commands.

Contract + baseline

Determine PROVEN_USED behavior:
Apply PROVEN_USED evidence checklist to the behavior tokens affected by the slice.
Derive contract without asking (in this order):
tests that exercise the slice
callsites in the repo
docs/README/examples
if none: add a characterization test for current behavior
Write contract (1 sentence): "Working means …".
Run baseline signal once; record result.
If baseline signal is ok, apply Proof discipline for passing baselines .

Create initial findings

Enumerate candidate failure modes for the slice.
Rank security > crash > corruption > logic.
For each issue you will act on, create a finding record.

Mandatory scans (multi-pass)

Run the Multi-pass loop (default) for the touched slice.

3a) Unsoundness scan

For each hazard class that applies:

Identify the first failure point.
Provide a concrete counterexample (input/timeline).
Specify the smallest sound fix that removes the bug-class.

Hazard classes:

Security boundaries: authz/authn, injection, path traversal, SSRF, unsafe deserialization, secrets/PII handling.
Nullability/uninitialized state; sentinel values.
Ownership/lifetime: leaks, double-free, use-after-close, lock not released.
Concurrency/order/time: races, lock ordering, reentrancy, timeouts/retries, TOCTOU.
Bounds/arithmetic: indexing/slicing, overflow/underflow, signedness.
Persistence/atomicity: partial writes, torn updates, non-transactional multi-step updates.
Encoding/units: bytes vs chars, timezone/locale, unit mismatches, lossy normalization.
Error handling: swallowed errors, missing context, silent fallback.

Language cues (examples only):

Rust: unsafe , unwrap/expect , as casts.
JS/TS: any , implicit coercions, undefined flows.
Python: bare except , truthiness-based branching.
C/C++: raw pointers, unchecked casts, manual malloc/free .

3b) Invariant strengthening scan

Delegate to $invariant-ace and import its artifacts into fix .

Default execution:

Run $invariant-ace Compact Mode first.
Capture at minimum:
Counterexample
Invariants
Owner and Scope
Enforcement Boundary
Seam (Before -> After)
Verification
Map these to fix finding fields:
invariant_before from broken trace + prior scope,
invariant_after from chosen predicate(s) + holds scope,
fix from seam,
proof from verification signal.
Escalate to full $invariant-ace protocol if Compact Mode does not yield an inductive predicate.

3c) Footgun scan + defusal

Trigger: you touched an API surface OR a caller can plausibly misuse the code.

For top-ranked misuse paths:

Provide minimal misuse snippet + surprising behavior.
Defuse by changing the surface:
options structs / named params
split booleans into enums or separate functions
explicit units/encodings
richer returns (no sentinels)
separate pure computation from effects
Lock with a regression test or boundary assertion.

3d) Complexity scan (risk-driven)

Delegate to $complexity-mitigator for analysis; implement only via fix .

Rule: reshape only when it reduces risk and improves auditability of invariants/ownership.

Algorithm:

Run $complexity-mitigator on the touched slice (heat read + essential/incidental verdict + ranked options + TRACE).
Select the smallest viable cut that directly supports a finding (safety/surface/invariant ownership/proof quality).
If simplification depends on an unstated invariant, run/refresh $invariant-ace first.
Keep complexity-only cleanup out of scope unless it closes a concrete risk.

Implement fixes (per finding)

For findings in severity order:

Implement the smallest sound fix that removes the bug-class.
Apply correctness tightening when allowed.
Ensure invariants + ownership hold on all paths.
Keep diff reviewable; avoid drive-by refactors.

Close the loop (required)

Run the chosen validation signal.
IF it fails:
update findings with the new counterexample,
apply the smallest additional fix,
re-run the SAME signal.
Repeat until the signal passes.
If you cannot make progress after 3 repair cycles, stop and ask with:
the last failing output (key lines/stack),
what you tried,
the smallest remaining decision that blocks you.

Residual risk sweep (required)

Re-check Residual risks / open questions policy .
Any item without a valid blocker becomes a finding and is fixed (with proof) or dropped.

Agent-directed self-review loop (required final step when a changeset exists)

Precondition gate: run this step only when Changes applied is not None and the latest validation signal result is ok .
Freeze the self-review baseline as the latest validated changeset.
Ask internally exactly: If you could change one thing about this changeset what would you change?
Summarize the self-round in Self-review loop trace with one delta row.
If the answer yields one new actionable self-review change:
convert it into one concrete finding,
treat compatible ergonomic/structural/API-shaping improvements as actionable when they materially improve the current validated changeset and can be proven locally,
widen anywhere in the current repo/worktree as needed,
apply the smallest sound change that materially addresses the critique,
re-run the chosen validation signal and update findings/proof,
record the (validated_changeset_fingerprint, normalized_answer_summary) pair so repeated suggestions cannot loop forever,
repeat from step 2.
If the answer yields only blocked changes, record stop_reason=blocked , carry blockers to Residual risks / open questions , and continue to step 8.
If the answer yields no new actionable self-review change, record stop_reason=no_new_actionable_changes and continue to step 8.
This check is against the current final validated changeset, not an earlier pre-delta review state.
already green , architectural , not fix-shaped , or no failing proof hook are not sufficient reasons by themselves when a concrete compatible/provable improvement still exists.
Run the non-self-review $fix passes once against the full resulting changeset.
If that rerun edits code, discard the stale self-review state, revalidate, and restart from step 2 against the new final changeset.
If that rerun applies no edits, exit the loop and proceed to output.
Skip gate: if Changes applied is None or the run is blocked before edits, output - None (skip_gate) in Self-review loop trace and proceed to output.

7a) Post-fix boundary (required)

If a clean or closed $fix pass surfaces broader non-fix opportunities, do not continue exploring them under $fix .
If the user explicitly asks for broader or deeper follow-up after closure, recommend the next skill explicitly:
$parse for architecture or purpose analysis
$grill-me for interrogation, pressure-testing, or narrowing the next move
$plan for a decision-complete roadmap
$creative-problem-solver for wider option generation
Do not place those broader opportunities in Residual risks / open questions unless a valid blocker from the allowed set applies.

Output lock (required)

Before sending the final message, verify all are true:

Heading set is exact and complete (Contract , Findings (severity order) , Changes applied , Pass trace , Validation , Self-review loop trace , Residual risks / open questions ).
Pass trace includes planned/executed counts and P1/P2/P3 lines (plus P4/P5 when executed).
Runtime pass updates (Pass <n>/<total_planned>: ... ) were emitted during execution.
If embedded mode was used, include Fix Record in the same assistant message (after any required artifact).
Validation includes machine-checkable keys: baseline_cmd , baseline_result , proof_hook , final_cmd , final_result .
Every acted-on finding includes Proof strength and Compatibility impact using the allowed enums.
Every residual blocked_by uses only: product_ambiguity|breaking_change|no_repro_or_proof|scope_guardrail|generated_output|external_dependency|perf_unmeasurable .
If Changes applied is not None AND the latest validation result is ok , Self-review loop trace includes a terminal row with stop_reason=<no_new_actionable_changes|blocked> and the user-facing final line is not the question.

Deliverable format (chat)

Output exactly these sections in this order.

If no findings:

Findings (severity order): None .
Changes applied: None .
Self-review loop trace: - None (skip_gate) .
Residual risks / open questions: - None .
Still include Pass trace and Validation.

Contract

Findings (severity order) For each finding:

F# <file:line> — <security|crash|corruption|logic> —
Surface: Tokens=<...>; PROVEN_USED=<yes/no + evidence>; External=<yes/no>; Diff_touch=<yes/no>
Counterexample: <input/timeline>
Invariant (before): <what was assumed/allowed>
Invariant (after): <what is now guaranteed/rejected>
Fix:
Proof: <test/assert/log + validation command> -> <ok/fail>
Proof strength: <characterization|targeted_regression|property_or_fuzz>
Compatibility impact: <none|tightening|additive|breaking>

Changes applied

Pass trace

Core passes planned: 3 ; core passes executed: <3>
Delta passes planned: <0|2> ; delta passes executed: <0|2>
Total passes executed: <3|5>
P1 Safety -> <done> ; edits=<yes|no> ; signal=<cmd|n/a> ; result=<ok|fail|n/a>
P2 Surface -> <done> ; edits=<yes|no> ; signal=<cmd|n/a> ; result=<ok|fail|n/a>
P3 Audit -> <done> ; edits=<yes|no> ; signal=<cmd|n/a> ; result=<ok|fail|n/a>
If delta passes executed, also include P4 and P5 lines in the same format.

Validation

-> <ok/fail>
{"baseline_cmd":"<cmd|n/a>","baseline_result":"<ok|fail|n/a>","proof_hook":"<test/assert/log|n/a>","final_cmd":"<cmd>","final_result":"<ok|fail>"} (single-line JSON)

Self-review loop trace

If none: - None (skip_gate)
Otherwise: S# prompt=If you could change one thing about this changeset what would you change? ; answer_summary=<...>; finding=<F#|none> ; change_applied=<yes|no> ; proof=<cmd|n/a> ; result=<ok|fail|n/a> ; stop_reason=<continue|no_new_actionable_changes|blocked>

Residual risks / open questions

If none: - None
Otherwise: - <file:line or token> — blocked_by=<product_ambiguity|breaking_change|no_repro_or_proof|scope_guardrail|generated_output|external_dependency|perf_unmeasurable> — next=<one action>

Fix Record (embedded mode only)

Use only when $fix is invoked inside another skill.

Findings (severity order) For each finding:

F# <file:line> — <security|crash|corruption|logic> —
Surface: Tokens=<...>; PROVEN_USED=<yes/no + evidence>; External=<yes/no>; Diff_touch=<yes/no>
Counterexample: <input/timeline>
Invariant (before): <what was assumed/allowed>
Invariant (after): <what is now guaranteed/rejected>
Fix:
Proof: <test/assert/log + validation command> -> <ok/fail>
Proof strength: <characterization|targeted_regression|property_or_fuzz>
Compatibility impact: <none|tightening|additive|breaking>

Changes applied

Pass trace

Core passes planned: 3 ; core passes executed: <3>
Delta passes planned: <0|2> ; delta passes executed: <0|2>
Total passes executed: <3|5>
P1 Safety -> <done> ; edits=<yes|no> ; signal=<cmd|n/a> ; result=<ok|fail|n/a>
P2 Surface -> <done> ; edits=<yes|no> ; signal=<cmd|n/a> ; result=<ok|fail|n/a>
P3 Audit -> <done> ; edits=<yes|no> ; signal=<cmd|n/a> ; result=<ok|fail|n/a>
If delta passes executed, also include P4 and P5 lines in the same format.

Validation

-> <ok/fail>
{"baseline_cmd":"<cmd|n/a>","baseline_result":"<ok|fail|n/a>","proof_hook":"<test/assert/log|n/a>","final_cmd":"<cmd>","final_result":"<ok|fail>"} (single-line JSON)

Self-review loop trace

If none: - None (skip_gate)
Otherwise: S# prompt=If you could change one thing about this changeset what would you change? ; answer_summary=<...>; finding=<F#|none> ; change_applied=<yes|no> ; proof=<cmd|n/a> ; result=<ok|fail|n/a> ; stop_reason=<continue|no_new_actionable_changes|blocked>

Residual risks / open questions

If none: - None
Otherwise: - <file:line or token> — blocked_by=<product_ambiguity|breaking_change|no_repro_or_proof|scope_guardrail|generated_output|external_dependency|perf_unmeasurable> — next=<one action>

Pitfalls

Vague advice without locations.
"Nice-to-have" refactors that don’t reduce risk.
Feature creep.
Continuing broad repo critique or planning under a completed $fix label.
Asking for risk tolerance instead of applying the default policy.
Tightening presented as optional.
Ownership fixes without proven allocator/owner.
Editing generated/third-party outputs instead of source-of-truth.
Code edits without a failing proof hook when baseline was ok.
Using Residual risks / open questions as a substitute for fixable findings.

Activation cues

"resolve" / "fix" / "crash" / "data corruption"
"$fix this PR" / "$fix current branch" / "fix this branch"
"CI failed" / "fix the red checks" / "repair failing PR"
"footgun" / "misuse" / "should never happen"
"invariant" / "lifetime" / "nullable surprise"
"too complex" / "branch soup" / "cross-file hops"
"refine $fix" / "improve fix skill workflow" / "update fix skill docs"

Safety Notice

Copy this and send it to your AI assistant to learn

Source Transparency

Related Skills

grill-me

complexity-mitigator

mesh

refine