Self-Improving Agent Skill
An advanced learning system that turns Claude Code sessions into reusable knowledge through atomic "instincts" - small learned behaviors with confidence scoring and project scope isolation.
v2.1 adds project-scoped instincts — React patterns stay in your React project, Python conventions stay in your Python project, and universal patterns are shared globally.
Quick Reference
| Situation | Action |
|---|---|
| Command/operation fails | Log instinct or v1 learning |
| User corrects you | Create instinct with correction trigger |
| Discovering patterns | Log instinct with confidence score |
| Review learned behaviors | /instinct-status |
| Evolve instincts to skills | /evolve |
| Promote project → global | /promote |
| Setup observation hooks | Enable PreToolUse/PostToolUse hooks |
Two Learning Modes
Mode 1: Instinct-Based (v2) - RECOMMENDED
Atomic, confidence-weighted behaviors with project isolation:
---
id: prefer-functional-style
trigger: "when writing new functions"
confidence: 0.7
domain: "code-style"
scope: project
project_id: "a1b2c3d4e5f6"
---
# Prefer Functional Style
## Action
Use functional patterns over classes when appropriate.
## Evidence
- Observed 5 instances of functional pattern preference
- User corrected class-based approach on 2025-01-15
Mode 2: Markdown-Based (v1) - LEGACY
Traditional learning entries for complex, narrative learnings:
## [LRN-YYYYMMDD-XXX] category
**Priority**: high | **Status**: pending | **Area**: backend
### Summary
Detailed description of what was learned
### Details
Full context and explanation
Use v2 (instincts) for behavioral patterns, v1 (markdown) for complex incident analysis.
Instinct-Based Learning (v2)
The Instinct Model
An instinct is a small, atomic learned behavior:
Properties:
- Atomic — one trigger, one action
- Confidence-weighted — 0.3 = tentative, 0.9 = near certain
- Domain-tagged — code-style, testing, git, debugging, workflow, security, etc.
- Evidence-backed — tracks what observations created it
- Scope-aware —
project(default) orglobal
Confidence Scoring
| Score | Meaning | Behavior |
|---|---|---|
| 0.3 | Tentative | Suggested but not enforced |
| 0.5 | Moderate | Applied when relevant |
| 0.7 | Strong | Auto-approved for application |
| 0.9 | Near-certain | Core behavior |
Confidence increases when:
- Pattern is repeatedly observed
- User doesn't correct the suggested behavior
- Similar instincts from other sources agree
Confidence decreases when:
- User explicitly corrects the behavior
- Pattern isn't observed for extended periods
- Contradicting evidence appears
Scope Decision Guide
| Pattern Type | Scope | Examples |
|---|---|---|
| Language/framework conventions | project | "Use React hooks", "Follow Django REST patterns" |
| File structure preferences | project | "Tests in __tests__/", "Components in src/components/" |
| Code style | project | "Use functional style", "Prefer dataclasses" |
| Security practices | global | "Validate user input", "Sanitize SQL" |
| General best practices | global | "Write tests first", "Always handle errors" |
| Tool workflow preferences | global | "Grep before Edit", "Read before Write" |
| Git practices | global | "Conventional commits", "Small focused commits" |
Project Detection
The system automatically detects your current project:
CLAUDE_PROJECT_DIRenv var (highest priority)git remote get-url origin— hashed to create a portable project IDgit rev-parse --show-toplevel— fallback using repo path- Global fallback — if no project detected, instincts go to global scope
Each project gets a 12-character hash ID (e.g., a1b2c3d4e5f6).
v2 Commands
| Command | Description |
|---|---|
/instinct-status | Show all instincts (project-scoped + global) with confidence |
/evolve | Cluster related instincts into skills/commands, suggest promotions |
/instinct-export | Export instincts (filterable by scope/domain) |
/instinct-import <file> | Import instincts with scope control |
/promote [id] | Promote project instincts to global scope |
/projects | List all known projects and their instinct counts |
/instinct-status Example
Project: my-react-app (a1b2c3d4e5f6)
├─ prefer-functional-style.yaml (0.7) [project]
├─ use-react-hooks.yaml (0.9) [project]
└─ jest-testing-patterns.yaml (0.6) [project]
Global Instincts:
├─ always-validate-input.yaml (0.85) [global]
├─ grep-before-edit.yaml (0.6) [global]
└─ conventional-commits.yaml (0.75) [global]
/evolve Workflow
Clusters related instincts and generates:
- Skills — domain-specific workflows
- Commands — slash commands for common tasks
- Agents — specialized sub-agents
/evolve
# Analyzes instincts and suggests:
# - "Create skill: react-testing-workflow.md"
# - "Create command: /test-component"
# - "Promote prefer-functional-style to global (seen in 3 projects)"
/promote Workflow
Promote project-scoped instincts to global when proven across projects:
/promote prefer-explicit-errors
# Promotes the instinct from current project to global scope
Auto-promotion criteria:
- Same instinct ID in 2+ projects
- Average confidence >= 0.8
File Structure (v2)
~/.claude/homunculus/
├── identity.json # Your profile, technical level
├── projects.json # Registry: project hash → name/path/remote
├── observations.jsonl # Global observations (fallback)
├── instincts/
│ ├── personal/ # Global auto-learned instincts
│ └── inherited/ # Global imported instincts
├── evolved/
│ ├── agents/ # Global generated agents
│ ├── skills/ # Global generated skills
│ └── commands/ # Global generated commands
└── projects/
├── a1b2c3d4e5f6/ # Project hash
│ ├── observations.jsonl
│ ├── observations.archive/
│ ├── instincts/
│ │ ├── personal/ # Project-specific auto-learned
│ │ └── inherited/ # Project-specific imported
│ └── evolved/
│ ├── skills/
│ ├── commands/
│ └── agents/
└── f6e5d4c3b2a1/ # Another project
Enabling Observation Hooks (v2)
Add to your ~/.claude/settings.json:
{
"hooks": {
"PreToolUse": [{
"matcher": "*",
"hooks": [{
"type": "command",
"command": "~/.claude/skills/self-improving-agent/hooks/observe.sh"
}]
}],
"PostToolUse": [{
"matcher": "*",
"hooks": [{
"type": "command",
"command": "~/.claude/skills/self-improving-agent/hooks/observe.sh"
}]
}]
}
}
Why hooks? Hooks fire 100% of the time, deterministically. Skills fire ~50-80% based on Claude's judgment.
OpenClaw is the primary platform for this skill. It uses workspace-based prompt injection with automatic skill loading.
Installation
Via ClawdHub (recommended):
clawdhub install self-improving-agent
Manual:
git clone https://github.com/peterskoett/self-improving-agent.git ~/.openclaw/skills/self-improving-agent
Remade for openclaw from original repo : https://github.com/pskoett/pskoett-ai-skills - https://github.com/pskoett/pskoett-ai-skills/tree/main/skills/self-improvement
Workspace Structure
OpenClaw injects these files into every session:
~/.openclaw/workspace/
├── AGENTS.md # Multi-agent workflows, delegation patterns
├── SOUL.md # Behavioral guidelines, personality, principles
├── TOOLS.md # Tool capabilities, integration gotchas
├── MEMORY.md # Long-term memory (main session only)
├── memory/ # Daily memory files
│ └── YYYY-MM-DD.md
└── .learnings/ # This skill's log files
├── LEARNINGS.md
├── ERRORS.md
└── FEATURE_REQUESTS.md
Create Learning Files
mkdir -p ~/.openclaw/workspace/.learnings
Then create the log files (or copy from assets/):
LEARNINGS.md— corrections, knowledge gaps, best practicesERRORS.md— command failures, exceptionsFEATURE_REQUESTS.md— user-requested capabilities
Promotion Targets
When learnings prove broadly applicable, promote them to workspace files:
| Learning Type | Promote To | Example |
|---|---|---|
| Behavioral patterns | SOUL.md | "Be concise, avoid disclaimers" |
| Workflow improvements | AGENTS.md | "Spawn sub-agents for long tasks" |
| Tool gotchas | TOOLS.md | "Git push needs auth configured first" |
Inter-Session Communication
OpenClaw provides tools to share learnings across sessions:
- sessions_list — View active/recent sessions
- sessions_history — Read another session's transcript
- sessions_send — Send a learning to another session
- sessions_spawn — Spawn a sub-agent for background work
Optional: Enable Hook
For automatic reminders at session start:
# Copy hook to OpenClaw hooks directory
cp -r hooks/openclaw ~/.openclaw/hooks/self-improvement
# Enable it
openclaw hooks enable self-improvement
See references/openclaw-integration.md for complete details.
Generic Setup (Other Agents)
For Claude Code, Codex, Copilot, or other agents, create .learnings/ in your project:
mkdir -p .learnings
Copy templates from assets/ or create files with headers.
Add reference to agent files AGENTS.md, CLAUDE.md, or .github/copilot-instructions.md to remind yourself to log learnings. (this is an alternative to hook-based reminders)
Self-Improvement Workflow
When errors or corrections occur:
- Log to
.learnings/ERRORS.md,LEARNINGS.md, orFEATURE_REQUESTS.md - Review and promote broadly applicable learnings to:
CLAUDE.md- project facts and conventionsAGENTS.md- workflows and automation.github/copilot-instructions.md- Copilot context
Logging Formats
v2: Instinct Format (RECOMMENDED for behavioral patterns)
Create atomic instinct files in ~/.claude/homunculus/instincts/personal/ or project-scoped:
---
id: unique-instinct-id
trigger: "when to apply this instinct"
confidence: 0.7
domain: "code-style|testing|git|debugging|workflow|security|infra"
source: "session-observation|user-correction|pattern-detection"
scope: "project|global"
project_id: "a1b2c3d4e5f6" # if scope: project
project_name: "my-project"
created_at: "2025-01-15T10:00:00Z"
updated_at: "2025-01-15T10:00:00Z"
evidence_count: 3
---
# Instinct Title
## Action
What to do when triggered.
## Rationale
Why this behavior is preferred.
## Examples
### Positive
```typescript
// Good example
Negative
// Bad example
Evidence
- Observed 3 instances of this pattern
- User corrected opposite approach on 2025-01-10
**File naming:** `~/.claude/homunculus/instincts/personal/{instinct-id}.yaml`
### v1: Markdown Format (for complex learnings)
#### Learning Entry
Append to `.learnings/LEARNINGS.md`:
```markdown
## [LRN-YYYYMMDD-XXX] category
**Logged**: ISO-8601 timestamp
**Priority**: low | medium | high | critical
**Status**: pending
**Area**: frontend | backend | infra | tests | docs | config
### Summary
One-line description of what was learned
### Details
Full context: what happened, what was wrong, what's correct
### Suggested Action
Specific fix or improvement to make
### Metadata
- Source: conversation | error | user_feedback | simplify-and-harden
- Related Files: path/to/file.ext
- Tags: tag1, tag2
- See Also: LRN-20250110-001
- Pattern-Key: simplify.dead_code | harden.input_validation
---
v2 vs v1 Comparison
| Feature | v1 (Markdown) | v2 (Instincts) |
|---|---|---|
| Granularity | Full skills | Atomic "instincts" |
| Confidence | None | 0.3-0.9 weighted |
| Scope | Global only | Project-scoped + global |
| Observation | Stop hook (session end) | PreToolUse/PostToolUse (100% reliable) |
| Analysis | Main context | Background agent (Haiku) |
| Evolution | Direct to skill | Instincts → cluster → skill/command/agent |
| Sharing | None | Export/import instincts |
| Best for | Complex incidents | Behavioral patterns |
Migration from v1 to v2
For existing v1 users: v2 is fully backward compatible:
- Existing global instincts still work
- Existing
.learnings/*.mdfiles still work - Gradual migration: run both in parallel
Recommended approach:
- Start using v2 instincts for new behavioral patterns
- Keep v1 markdown for complex incident analysis
- Use
/evolveto convert related v1 learnings into v2 instincts - Promote high-confidence instincts to skills
Error Entry
Append to .learnings/ERRORS.md:
## [ERR-YYYYMMDD-XXX] skill_or_command_name
**Logged**: ISO-8601 timestamp
**Priority**: high
**Status**: pending
**Area**: frontend | backend | infra | tests | docs | config
### Summary
Brief description of what failed
### Error
Actual error message or output
### Context
- Command/operation attempted
- Input or parameters used
- Environment details if relevant
### Suggested Fix
If identifiable, what might resolve this
### Metadata
- Reproducible: yes | no | unknown
- Related Files: path/to/file.ext
- See Also: ERR-20250110-001 (if recurring)
---
Feature Request Entry
Append to .learnings/FEATURE_REQUESTS.md:
## [FEAT-YYYYMMDD-XXX] capability_name
**Logged**: ISO-8601 timestamp
**Priority**: medium
**Status**: pending
**Area**: frontend | backend | infra | tests | docs | config
### Requested Capability
What the user wanted to do
### User Context
Why they needed it, what problem they're solving
### Complexity Estimate
simple | medium | complex
### Suggested Implementation
How this could be built, what it might extend
### Metadata
- Frequency: first_time | recurring
- Related Features: existing_feature_name
---
ID Generation
Format: TYPE-YYYYMMDD-XXX
- TYPE:
LRN(learning),ERR(error),FEAT(feature) - YYYYMMDD: Current date
- XXX: Sequential number or random 3 chars (e.g.,
001,A7B)
Examples: LRN-20250115-001, ERR-20250115-A3F, FEAT-20250115-002
Resolving Entries
When an issue is fixed, update the entry:
- Change
**Status**: pending→**Status**: resolved - Add resolution block after Metadata:
### Resolution
- **Resolved**: 2025-01-16T09:00:00Z
- **Commit/PR**: abc123 or #42
- **Notes**: Brief description of what was done
Other status values:
in_progress- Actively being worked onwont_fix- Decided not to address (add reason in Resolution notes)promoted- Elevated to CLAUDE.md, AGENTS.md, or .github/copilot-instructions.md
Promoting to Project Memory
When a learning is broadly applicable (not a one-off fix), promote it to permanent project memory.
When to Promote
- Learning applies across multiple files/features
- Knowledge any contributor (human or AI) should know
- Prevents recurring mistakes
- Documents project-specific conventions
Promotion Targets
| Target | What Belongs There |
|---|---|
CLAUDE.md | Project facts, conventions, gotchas for all Claude interactions |
AGENTS.md | Agent-specific workflows, tool usage patterns, automation rules |
.github/copilot-instructions.md | Project context and conventions for GitHub Copilot |
SOUL.md | Behavioral guidelines, communication style, principles (OpenClaw workspace) |
TOOLS.md | Tool capabilities, usage patterns, integration gotchas (OpenClaw workspace) |
How to Promote
- Distill the learning into a concise rule or fact
- Add to appropriate section in target file (create file if needed)
- Update original entry:
- Change
**Status**: pending→**Status**: promoted - Add
**Promoted**: CLAUDE.md,AGENTS.md, or.github/copilot-instructions.md
- Change
Promotion Examples
Learning (verbose):
Project uses pnpm workspaces. Attempted
npm installbut failed. Lock file ispnpm-lock.yaml. Must usepnpm install.
In CLAUDE.md (concise):
## Build & Dependencies
- Package manager: pnpm (not npm) - use `pnpm install`
Learning (verbose):
When modifying API endpoints, must regenerate TypeScript client. Forgetting this causes type mismatches at runtime.
In AGENTS.md (actionable):
## After API Changes
1. Regenerate client: `pnpm run generate:api`
2. Check for type errors: `pnpm tsc --noEmit`
Recurring Pattern Detection
If logging something similar to an existing entry:
- Search first:
grep -r "keyword" .learnings/ - Link entries: Add
**See Also**: ERR-20250110-001in Metadata - Bump priority if issue keeps recurring
- Consider systemic fix: Recurring issues often indicate:
- Missing documentation (→ promote to CLAUDE.md or .github/copilot-instructions.md)
- Missing automation (→ add to AGENTS.md)
- Architectural problem (→ create tech debt ticket)
Simplify & Harden Feed
Use this workflow to ingest recurring patterns from the simplify-and-harden
skill and turn them into durable prompt guidance.
Ingestion Workflow
- Read
simplify_and_harden.learning_loop.candidatesfrom the task summary. - For each candidate, use
pattern_keyas the stable dedupe key. - Search
.learnings/LEARNINGS.mdfor an existing entry with that key:grep -n "Pattern-Key: <pattern_key>" .learnings/LEARNINGS.md
- If found:
- Increment
Recurrence-Count - Update
Last-Seen - Add
See Alsolinks to related entries/tasks
- Increment
- If not found:
- Create a new
LRN-...entry - Set
Source: simplify-and-harden - Set
Pattern-Key,Recurrence-Count: 1, andFirst-Seen/Last-Seen
- Create a new
Promotion Rule (System Prompt Feedback)
Promote recurring patterns into agent context/system prompt files when all are true:
Recurrence-Count >= 3- Seen across at least 2 distinct tasks
- Occurred within a 30-day window
Promotion targets:
CLAUDE.mdAGENTS.md.github/copilot-instructions.mdSOUL.md/TOOLS.mdfor OpenClaw workspace-level guidance when applicable
Write promoted rules as short prevention rules (what to do before/while coding), not long incident write-ups.
Periodic Review
Review .learnings/ at natural breakpoints:
When to Review
- Before starting a new major task
- After completing a feature
- When working in an area with past learnings
- Weekly during active development
Quick Status Check
# Count pending items
grep -h "Status\*\*: pending" .learnings/*.md | wc -l
# List pending high-priority items
grep -B5 "Priority\*\*: high" .learnings/*.md | grep "^## \["
# Find learnings for a specific area
grep -l "Area\*\*: backend" .learnings/*.md
Review Actions
- Resolve fixed items
- Promote applicable learnings
- Link related entries
- Escalate recurring issues
Detection Triggers
Automatically log when you notice:
Corrections (→ learning with correction category):
- "No, that's not right..."
- "Actually, it should be..."
- "You're wrong about..."
- "That's outdated..."
Feature Requests (→ feature request):
- "Can you also..."
- "I wish you could..."
- "Is there a way to..."
- "Why can't you..."
Knowledge Gaps (→ learning with knowledge_gap category):
- User provides information you didn't know
- Documentation you referenced is outdated
- API behavior differs from your understanding
Errors (→ error entry):
- Command returns non-zero exit code
- Exception or stack trace
- Unexpected output or behavior
- Timeout or connection failure
Priority Guidelines
| Priority | When to Use |
|---|---|
critical | Blocks core functionality, data loss risk, security issue |
high | Significant impact, affects common workflows, recurring issue |
medium | Moderate impact, workaround exists |
low | Minor inconvenience, edge case, nice-to-have |
Area Tags
Use to filter learnings by codebase region:
| Area | Scope |
|---|---|
frontend | UI, components, client-side code |
backend | API, services, server-side code |
infra | CI/CD, deployment, Docker, cloud |
tests | Test files, testing utilities, coverage |
docs | Documentation, comments, READMEs |
config | Configuration files, environment, settings |
Best Practices
- Log immediately - context is freshest right after the issue
- Be specific - future agents need to understand quickly
- Include reproduction steps - especially for errors
- Link related files - makes fixes easier
- Suggest concrete fixes - not just "investigate"
- Use consistent categories - enables filtering
- Promote aggressively - if in doubt, add to CLAUDE.md or .github/copilot-instructions.md
- Review regularly - stale learnings lose value
Gitignore Options
Keep learnings local (per-developer):
.learnings/
Track learnings in repo (team-wide): Don't add to .gitignore - learnings become shared knowledge.
Hybrid (track templates, ignore entries):
.learnings/*.md
!.learnings/.gitkeep
Hook Integration
Enable automatic reminders through agent hooks. This is opt-in - you must explicitly configure hooks.
Quick Setup (Claude Code / Codex)
Create .claude/settings.json in your project:
{
"hooks": {
"UserPromptSubmit": [{
"matcher": "",
"hooks": [{
"type": "command",
"command": "./skills/self-improvement/scripts/activator.sh"
}]
}]
}
}
This injects a learning evaluation reminder after each prompt (~50-100 tokens overhead).
Full Setup (With Error Detection)
{
"hooks": {
"UserPromptSubmit": [{
"matcher": "",
"hooks": [{
"type": "command",
"command": "./skills/self-improvement/scripts/activator.sh"
}]
}],
"PostToolUse": [{
"matcher": "Bash",
"hooks": [{
"type": "command",
"command": "./skills/self-improvement/scripts/error-detector.sh"
}]
}]
}
}
Available Hook Scripts
| Script | Hook Type | Purpose |
|---|---|---|
scripts/activator.sh | UserPromptSubmit | Reminds to evaluate learnings after tasks |
scripts/error-detector.sh | PostToolUse (Bash) | Triggers on command errors |
See references/hooks-setup.md for detailed configuration and troubleshooting.
Automatic Skill Extraction
When a learning is valuable enough to become a reusable skill, extract it using the provided helper.
Skill Extraction Criteria
A learning qualifies for skill extraction when ANY of these apply:
| Criterion | Description |
|---|---|
| Recurring | Has See Also links to 2+ similar issues |
| Verified | Status is resolved with working fix |
| Non-obvious | Required actual debugging/investigation to discover |
| Broadly applicable | Not project-specific; useful across codebases |
| User-flagged | User says "save this as a skill" or similar |
Extraction Workflow
- Identify candidate: Learning meets extraction criteria
- Run helper (or create manually):
./skills/self-improvement/scripts/extract-skill.sh skill-name --dry-run ./skills/self-improvement/scripts/extract-skill.sh skill-name - Customize SKILL.md: Fill in template with learning content
- Update learning: Set status to
promoted_to_skill, addSkill-Path - Verify: Read skill in fresh session to ensure it's self-contained
Manual Extraction
If you prefer manual creation:
- Create
skills/<skill-name>/SKILL.md - Use template from
assets/SKILL-TEMPLATE.md - Follow Agent Skills spec:
- YAML frontmatter with
nameanddescription - Name must match folder name
- No README.md inside skill folder
- YAML frontmatter with
Extraction Detection Triggers
Watch for these signals that a learning should become a skill:
In conversation:
- "Save this as a skill"
- "I keep running into this"
- "This would be useful for other projects"
- "Remember this pattern"
In learning entries:
- Multiple
See Alsolinks (recurring issue) - High priority + resolved status
- Category:
best_practicewith broad applicability - User feedback praising the solution
Skill Quality Gates
Before extraction, verify:
- Solution is tested and working
- Description is clear without original context
- Code examples are self-contained
- No project-specific hardcoded values
- Follows skill naming conventions (lowercase, hyphens)
Multi-Agent Support
This skill works across different AI coding agents with agent-specific activation.
Claude Code
Activation: Hooks (UserPromptSubmit, PostToolUse)
Setup: .claude/settings.json with hook configuration
Detection: Automatic via hook scripts
Codex CLI
Activation: Hooks (same pattern as Claude Code)
Setup: .codex/settings.json with hook configuration
Detection: Automatic via hook scripts
GitHub Copilot
Activation: Manual (no hook support)
Setup: Add to .github/copilot-instructions.md:
## Self-Improvement
After solving non-obvious issues, consider logging to `.learnings/`:
1. Use format from self-improvement skill
2. Link related entries with See Also
3. Promote high-value learnings to skills
Ask in chat: "Should I log this as a learning?"
Detection: Manual review at session end
OpenClaw
Activation: Workspace injection + inter-agent messaging Setup: See "OpenClaw Setup" section above Detection: Via session tools and workspace files
Agent-Agnostic Guidance
Regardless of agent, apply self-improvement when you:
- Discover something non-obvious - solution wasn't immediate
- Correct yourself - initial approach was wrong
- Learn project conventions - discovered undocumented patterns
- Hit unexpected errors - especially if diagnosis was difficult
- Find better approaches - improved on your original solution
Copilot Chat Integration
For Copilot users, add this to your prompts when relevant:
After completing this task, evaluate if any learnings should be logged to
.learnings/using the self-improvement skill format.
Or use quick prompts:
- "Log this to learnings"
- "Create a skill from this solution"
- "Check .learnings/ for related issues"
- "/instinct-status" (v2)
- "/evolve" (v2)
Privacy
- Observations stay local on your machine
- Project-scoped instincts are isolated per project
- Only instincts (patterns) can be exported — not raw observations
- No actual code or conversation content is shared
- You control what gets exported and promoted
References
| Resource | Description |
|---|---|
| everything-claude-code | ECC project that inspired v2 instinct-based architecture |
| Homunculus | Community project that influenced v2 design |
| OpenClaw | Workspace-based multi-agent platform |
| Agent Skills Spec | https://agentskills.io/specification |
Instinct-based learning: teaching Claude your patterns, one project at a time.