Self-Reflection

Mine memories and remediations for behavior patterns, surface findings to user, remediate docs with pressure-tested improvements.

Core loop: Search -> Report -> User prioritizes -> Brainstorm -> Pressure test -> Apply

When to Use

Periodic review of agent behavior patterns
After series of failures or poor outcomes
Before major project milestones
When CLAUDE.md feels stale or incomplete
To check ReasoningBank health

When NOT to Use

Immediate error diagnosis -> use contextd-workflow skill
Recording a single learning -> use /remember
Checkpoint management -> use contextd-workflow skill

Behavioral Taxonomy

Focus on agent behaviors, not technical failures:

Behavior Type Description Examples

rationalized-skip Justified skipping required step "too simple to test", "user implied consent"

overclaimed Absolute language inappropriately "ensures", "guarantees", "production ready"

ignored-instruction Didn't follow CLAUDE.md/skill Skipped contextd search, ignored TDD

assumed-context Assumed without verification Assumed permission, requirements, state

undocumented-decision Significant choice without rationale Changed architecture without comparison

Severity Overlay

Severity Combination

CRITICAL rationalized-skip + destructive/security operation

HIGH rationalized-skip + validation skip, ignored-instruction

MEDIUM overclaimed, assumed-context

LOW undocumented-decision, style issues

The Report

For each finding, surface:

Behavior Type - Which taxonomy category
Severity - CRITICAL/HIGH/MEDIUM/LOW
Evidence - Memory/remediation IDs with excerpts
Violated Instruction - The skill/command/CLAUDE.md section ignored
Suggested Fix - Target doc and proposed change
Pressure Scenario - Test case from real failure

Remediation Flow

Behavioral Search Queries

Rationalized skips

memory_search("skip OR skipped OR bypass OR ignored") memory_search("too simple OR trivial OR obvious")

User feedback indicating ignored instructions

memory_search("why did you OR should have OR forgot to")

Assumptions without verification

memory_search("assumed OR without checking")

Overclaiming

memory_search("ensures OR guarantees OR production ready")

Filter out technical bugs: Exclude memories with error:* tags or stack traces.

ReasoningBank Health

--health flag analyzes:

Memory quality: feedback rate, confidence distribution
Tag hygiene: inconsistent tags needing consolidation
Stale content: old memories without feedback
Remediation completeness: missing fields

Quick Reference

Action Command

Full report /reflect

Health only /reflect --health

Apply fixes /reflect --apply

Recent only /reflect --since=7d

Filter by behavior /reflect --behavior=rationalized-skip

Filter by severity /reflect --severity=HIGH

Anti-Patterns

Mistake Why It Fails

Skipping pressure tests "Fixed" docs don't actually prevent behavior

Modifying plugin source Breaks on update; use includes

Auto-applying security fixes High-stakes changes need review

Ignoring frequency 10 TDD skips is systemic, not minor

Absolute claims in fixes "This prevents X" -> "This helps reduce X"

Causal Chain Analysis

Root Cause Tracing

Go beyond symptoms to find root causes:

{ "finding_id": "ref_001", "behavior": "rationalized-skip", "symptom": "Skipped tests before claiming fix complete", "causal_chain": [ { "level": 1, "cause": "Agent claimed fix without running tests", "evidence": ["mem_123", "mem_124"] }, { "level": 2, "cause": "CLAUDE.md test instruction buried in long section", "evidence": ["claude_md_line_245"] }, { "level": 3, "cause": "No PreToolUse hook enforcing test requirement", "evidence": ["hooks.json missing enforcement"] } ], "root_cause": "Missing automated enforcement of test-before-fix policy", "fix_target": "hooks.json + CLAUDE.md restructure" }

Chain Depth Levels

Level Description Fix Location

1 Immediate behavior Agent prompt/skill

2 Missing guidance CLAUDE.md/documentation

3 Missing enforcement Hooks/automation

4 Systemic gap Plugin/skill redesign

Multi-Incident Correlation

Find patterns across incidents:

causal_correlate(findings: [ref_001, ref_002, ref_003])

Returns: shared_root_causes: [ { cause: "Missing hook enforcement", incidents: [ref_001, ref_002] }, { cause: "Ambiguous CLAUDE.md section", incidents: [ref_002, ref_003] } ] recommended_fixes: [ { target: "hooks.json", impact: "high", fixes_incidents: 2 } ]

Comparative Benchmarks

Behavior Metrics Over Time

Track improvement (or regression):

{ "benchmark_period": "2026-01-01 to 2026-01-28", "metrics": { "rationalized_skip": { "count": 5, "previous_period": 12, "trend": "improving", "change_pct": -58 }, "ignored_instruction": { "count": 8, "previous_period": 6, "trend": "regressing", "change_pct": +33 }, "assumed_context": { "count": 3, "previous_period": 3, "trend": "stable", "change_pct": 0 } } }

Benchmark Categories

Metric Target Good Warning Critical

rationalized_skip/week 0 < 2 2-5

5

ignored_instruction/week 0 < 3 3-7

7

overclaimed/week 0 < 5 5-10

10

test_coverage_skip 0% < 5% 5-15%

15%

Comparative Reports

/reflect --benchmark --compare-periods "2026-01" "2025-12"

Output:

Behavior	Dec 2025	Jan 2026	Change
rationalized-skip	12	5	-58%
ignored-instruction	6	8	+33%

Top Improvement: Hook enforcement reduced skips Top Regression: New skills lack CLAUDE.md entries

Behavioral Prediction

Pattern-Based Prediction

Predict likely future failures based on patterns:

{ "prediction": { "behavior": "rationalized-skip", "likelihood": 0.75, "conditions": [ "Complex task with > 5 sub-steps", "Time pressure mentioned in prompt", "No explicit test requirement in task" ], "historical_basis": ["mem_101", "mem_102", "mem_103"], "prevention": "Add explicit test checkpoint to complex task prompts" } }

Risk Factors

Factor Risk Increase Mitigation

Task complexity > 5 steps +40% skip risk Explicit checkpoints

"Quick fix" language +60% skip risk Reject quick-fix framing

No acceptance criteria +50% assumption risk Require criteria

Security-adjacent code +30% overclaim risk Require review

Predictive Alerts

{ "alert": "high_risk_task_detected", "task_description": "Quick fix for authentication bug", "risk_factors": ["quick_fix_language", "security_adjacent"], "predicted_behaviors": ["rationalized-skip", "assumed-context"], "recommended_guardrails": [ "Require explicit test plan before starting", "Trigger consensus-review before merge" ] }

Intervention Hooks

Auto-intervene when risk detected:

{ "hook_type": "PreToolUse", "tool_name": "Edit", "condition": "file_path.contains('auth') AND prediction.skip_risk > 0.5", "prompt": "High skip risk detected for security code. Before editing, confirm: 1) Tests exist 2) Review planned 3) No assumptions about user state" }

Unified Memory Type References

Tag reflection findings with standard types:

Finding Type Tag Purpose

Behavior pattern type:pattern , category:behavior

Track patterns

Root cause type:decision , category:analysis

Document cause

Fix proposal type:learning , category:improvement

Capture fix

Regression type:failure , category:regression

Track setbacks

Policy update type:policy , category:enforcement

New rules

Hierarchical Namespace Guidance

Reflection Namespaces

<org>/<project>/reflections/<reflection_id>

Examples: fyrsmithlabs/contextd/reflections/2026-01-weekly fyrsmithlabs/marketplace/reflections/v1.6-pre-release

Finding Namespaces

<reflection_namespace>/findings/<finding_id>

Example: fyrsmithlabs/contextd/reflections/2026-01-weekly/findings/ref_001

Audit Fields

All reflection records include:

Field Description Auto-set

created_by

Reflection agent/session Yes

created_at

Analysis timestamp Yes

period_start

Analysis period start Yes

period_end

Analysis period end Yes

memory_count

Memories analyzed Yes

finding_count

Findings generated Yes

remediation_count

Fixes applied Yes

Claude Code 2.1 Patterns

Background Analysis

Run reflection analysis without blocking:

Task( subagent_type: "general-purpose", prompt: "Analyze memories for behavior patterns over past 7 days", run_in_background: true, description: "Background reflection analysis" )

// Continue other work... // Collect results later: TaskOutput(task_id, block: true)

Task Dependencies for Reflection Flow

Chain reflection phases:

search_task = Task(prompt: "Search memories for behavior patterns") analyze_task = Task(prompt: "Analyze patterns, build causal chains", addBlockedBy: [search_task.id]) benchmark_task = Task(prompt: "Compare to previous period", addBlockedBy: [analyze_task.id]) predict_task = Task(prompt: "Generate predictions", addBlockedBy: [analyze_task.id]) report_task = Task(prompt: "Synthesize report", addBlockedBy: [benchmark_task.id, predict_task.id])

PreToolUse Hook for High-Risk Detection

Auto-alert on predicted risky operations:

{ "hook_type": "PreToolUse", "tool_name": "Edit|Bash", "condition": "prediction_model.risk_score > 0.7", "prompt": "High-risk operation predicted. Review risk factors and confirm guardrails are in place before proceeding." }

PostToolUse Hook for Pattern Recording

Auto-record behavior patterns:

{ "hook_type": "PostToolUse", "tool_name": "Task", "condition": "task_description.contains('reflection')", "prompt": "Reflection complete. Record findings to memory with type:pattern tags. Update benchmarks." }

Event-Driven State Sharing

Self-reflection emits events for other skills:

{ "event": "reflection_complete", "payload": { "reflection_id": "2026-01-weekly", "findings_count": 12, "critical_count": 1, "high_count": 3, "trend": "improving", "top_behavior": "rationalized-skip" }, "notify": ["setup", "workflow", "orchestration"] }

Subscribe to reflection events:

reflection_started
Analysis began
reflection_complete
Analysis finished
critical_finding
CRITICAL behavior detected
regression_detected
Metrics worsening
benchmark_updated
New baseline recorded
prediction_generated
Risk prediction available
intervention_triggered
Auto-guardrail activated

self-reflection

Safety Notice

Copy this and send it to your AI assistant to learn

Rationalized skips

User feedback indicating ignored instructions

Assumptions without verification

Overclaiming

Source Transparency

Related Skills

product-owner

check

init

RentaUnHumano MCP