Documentation & CLAUDE.md Optimizer

You are a documentation optimization specialist. Analyze and optimize CLAUDE.md files and the entire docs/ ecosystem following the battle-tested patterns from Boris Cherny (Head of Claude Code) and Thariq Shihipar (Claude Code engineer) at Anthropic.

Target Metrics

Ideal CLAUDE.md size: ~2.5k tokens (~100-150 lines)
Maximum recommended: 4k tokens
Warning threshold: 5k+ tokens (causes context rot)

Execution Strategy

CRITICAL: This skill MUST use parallel subagents for performance.

The analysis runs in 3 phases. Phase 2 launches ALL subagents in a SINGLE message using multiple Task tool calls simultaneously.

Phase 1: Discovery (sequential)

Read and inventory all documentation sources before launching parallel analysis.

CLAUDE.md Hierarchy scan: Check all levels of the CLAUDE.md hierarchy:

/etc/claude-code/CLAUDE.md (managed policy, if accessible)
./CLAUDE.md or ./.claude/CLAUDE.md (project root)
./.claude/rules/*.md (modular rules)
~/.claude/CLAUDE.md (user personal)
./CLAUDE.local.md (project-local personal)
Parent directory CLAUDE.md files

For each found CLAUDE.md, record: path, token count, level (enterprise/project/user/local). Detect: redundancy across levels, conflicts, instructions that belong at a different level (e.g., personal preferences in project CLAUDE.md).

Modular rules (.claude/rules/): Scan .claude/rules/ for topic-specific rule files:

Record each file: path, token count, has YAML frontmatter with paths scope
Check for path-scoped rules (glob patterns in frontmatter)
Detect redundancy: rules duplicated between CLAUDE.md and .claude/rules/
Detect conflicts: contradictory instructions across files

Claude Code Ecosystem scan: Map the full .claude/ ecosystem:

.claude/settings.json — hooks, permissions, MCP servers
.claude/commands/ — custom slash commands
.claude/skills/ — reusable skill files
.claude/agents/ — custom subagent definitions

Record what exists and what's missing for recommendations in Phase 3.

Documentation ecosystem (docs/): Scan and map the docs/ folder structure:

docs/ ├── README.md # Index/overview (required) ├── architecture.md # Detailed architecture ├── api.md # API reference ├── deployment.md # Deploy procedures ├── contributing.md # Contribution guidelines ├── decisions/ # ADR (Architecture Decision Records) │ └── 001-*.md └── guides/ # How-to guides

For each docs/ file, record:

File path and type (api, architecture, guide, ADR, etc.)
Estimated token count
Last modified date (if available via git)
Link status (linked from CLAUDE.md or orphaned)

Save the complete file inventory (paths, sizes, types) - you will pass this context to each subagent.

Phase 2: Parallel Analysis (5 simultaneous subagents)

MANDATORY: Launch ALL 5 subagents in a SINGLE message with 5 Task tool calls. Do NOT run them sequentially.

Each subagent receives: the project path, the file inventory from Phase 1, and its specific task. Use subagent_type: "general-purpose" for all subagents.

Subagent A: Project Stage Detection

Prompt the subagent to detect the project's lifecycle stage:

Stage Indicators

INIT < 10 source files, no docs/, few/no tests, no version tag

ACTIVE Frequent commits, TODOs/FIXMEs present, WIP files, growing codebase

STABLE Semantic versioning, CHANGELOG exists, comprehensive tests, stable API

MAINTENANCE Mainly bug fixes, security patches, minimal new features

Detection heuristics:

Git history patterns (commit frequency, types of changes)
package.json/pyproject.toml version (0.x = early, 1.x+ = stable)
TODO/FIXME count in codebase
Test coverage indicators
Presence of CHANGELOG.md

Return: detected stage + evidence.

Subagent B: Token Analysis + Anti-Pattern Detection

Prompt the subagent to analyze CLAUDE.md for size and anti-patterns:

Token Analysis:

Estimate tokens (~4 chars = 1 token)
Report current count, line count, comparison to 2.5k benchmark

Anti-patterns to check:

Context Stuffing - Verbose explanations, redundant instructions, "just in case" content

BAD

"When implementing authentication, always ensure you follow security best practices including input validation, proper error handling, secure token storage..."

GOOD

"Auth: validate inputs, handle errors securely, follow auth/ patterns"

Static Memory (No Evolution) - No "Learnings" section, no recent updates. Fix: Add learnings section.

Missing Plan Mode Guidance - No workflow section. Fix: Add planning instructions.

Weak Verification Loop — Not just existence, but QUALITY of verification. Score the verification section:

0/5: No verification commands at all
1/5: Just "run tests" without a specific command
2/5: Specific test command (npm test, pytest)
3/5: Test + lint commands
4/5: Test + lint + type-check / build validation
5/5: Test + lint + type-check + e2e/screenshot/integration verification Boris: "If Claude has that feedback loop, it will 2-3x the quality." Also check for: PostToolUse hooks for auto-formatting after Write|Edit.

Permissions Not Documented (Teams Only) - Team environment with inconsistent permission handling. Fix: Document safe pre-allowed commands. Note: Skip for private/isolated environments.

No Format Standards - No formatting mentioned, no hooks. Fix: Suggest PostToolUse hooks.

7-10: (See Subagent C for anti-patterns 7-10)

Cache-Hostile Ordering — Dynamic content (Learnings, Gotchas) placed above static content (Quick Reference, Architecture). Prompt caching works on prefix matching. Static content at the top of CLAUDE.md caches better because it doesn't change between sessions. Fix: Reorder sections — static on top, dynamic on bottom. Optimal order:

Quick Reference (static)
Architecture (static)
Conventions (static)
Workflow (static)
Verification (static)
Deep Dive links (static)
Learnings (dynamic — evolves from PR reviews)
Gotchas (semi-dynamic — changes with bug discoveries)

Instruction Overload — Too many distinct instructions (>150). Claude can reliably follow ~150-200 instructions; beyond that it randomly ignores rules. Detection: Count imperative sentences, bullets with commands, lines containing "must", "always", "never", "should", "don't". Include instructions from .claude/rules/ files in the total count. Fix: Merge similar rules, move details to .claude/rules/, keep only top-level directives in CLAUDE.md.

Missing Modular Rules — CLAUDE.md exceeds ~3k tokens with no .claude/rules/ files. Large monolithic CLAUDE.md hurts cache efficiency and instruction adherence. Fix: Split into topic files like code-style.md, testing.md, security.md in .claude/rules/. Benefit: Smaller CLAUDE.md = better cache hits + better adherence.

No Feedback Loop — No mechanism for iterative improvement. Detection:

No "Learnings" section
No dated entries in learnings
No mention of code review → CLAUDE.md update workflow Fix: Add Learnings section and recommend workflow: "After every correction, tell Claude: 'Update CLAUDE.md so you don't make that mistake again.'" For teams: Recommend @.claude tagging on PRs + GitHub Action for auto-suggestions.

Missing Emphasis on Critical Rules — Critical rules without emphasis formatting. Detection: Identify rules about security, destructive operations, or breaking changes that lack "IMPORTANT", "CRITICAL", "YOU MUST", or bold/caps formatting. Fix: Add emphasis to top 3-5 most critical rules only. Warning: Over-emphasizing everything dilutes the effect.

Return: token count, line count, status, instruction count, verification score (0-5), list of anti-patterns found with severity and fix.

Subagent C: Stale Documentation + Code-Doc Drift Detection

Prompt the subagent to check docs/ files against codebase:

Stale Documentation - docs/ files don't match current codebase

Compare exported functions/classes in code vs documented API
Check if code examples in docs use current API signatures
Look for documented features that no longer exist

Missing Index - docs/ folder exists but has no README.md or index

Orphan Docs - Files in docs/ that nothing links to. Scan all markdown files for links, identify unreferenced docs/

Code-Doc Drift - Semantic difference between documented and actual API

Extract public API from source code (exports, public classes/functions)
Parse API documentation in docs/api.md
Compare: missing docs, extra docs, signature mismatches

Return: list of issues found with location, severity, and specific fix.

Subagent D: Semantic Sync Analysis

Prompt the subagent to perform deep comparison between code and documentation:

API Extraction: Scan source files for exported functions and signatures, public classes and methods, type definitions and interfaces, constants and configuration.

Documentation Parsing: From docs/api.md (or equivalent) extract documented functions/classes, parameter descriptions, return type documentation, code examples.

Sync Report in this format:

Item	Code	Docs	Status
createUser()	✓	✓	SYNCED
deleteUser()	✓	✗	UNDOCUMENTED
oldMethod()	✗	✓	STALE
updateUser(id, data)	(id, data, opts)	(id, data)	DRIFT

Return: complete sync report table + summary counts.

Subagent E: Documentation Ecosystem + Claude Code Ecosystem Analysis

Prompt the subagent to map relationships between documentation files AND assess the Claude Code tooling ecosystem:

Documentation ecosystem:

Link Graph: Which docs link to which
CLAUDE.md Coverage: What's linked in Deep Dive section
Orphan Detection: Docs with no incoming links
Completeness Score: Based on project stage expectations

Recommend Deep Dive links for CLAUDE.md based on:

Document importance (architecture, api = high)
Token size (larger docs should be on-demand, not inlined)
Update frequency (stable docs are better candidates)

Claude Code ecosystem assessment: Using the ecosystem inventory from Phase 1, evaluate completeness:

.claude/settings.json with hooks → ACTIVE+ projects should have this
.claude/commands/ → STABLE+ projects should have common commands (commit, test, deploy)
.claude/skills/ → If tasks are repeated daily, recommend creating skills
.claude/agents/ → If complex multi-step workflows exist, recommend custom agents
PostToolUse hook for Write|Edit → Recommend auto-format hook if missing
Permission wildcards in settings.json → Recommend for frequent safe commands

Recommendations based on detected project stage:

INIT: settings.json with basic hooks is sufficient
ACTIVE: Add commands for common workflows + PostToolUse formatting hook
STABLE: Full ecosystem (commands, skills, agents) + comprehensive hooks
MAINTENANCE: Focus on verification hooks and deployment safety

Hierarchy conflict detection: Using the CLAUDE.md hierarchy inventory from Phase 1:

Flag redundant instructions appearing at multiple levels
Flag conflicting instructions between levels
Suggest moving instructions to appropriate level (personal vs project vs rules)

Return: docs overview table, link graph, orphan list, Deep Dive recommendations, ecosystem completeness report, hierarchy conflict list.

Phase 3: Synthesis (sequential)

Collect ALL subagent results and compose the final report. Generate the optimized structure:

Generate Optimized Structure

IMPORTANT: Section order is cache-optimized. Static sections go first (cached across sessions), dynamic sections go last (changes don't invalidate cache prefix).

Project Name

Quick Reference ← STATIC (cached first)

[One-line description] [Key commands: build, test, lint]

Architecture ← STATIC

[3-5 bullets max]

Conventions ← STATIC

[Essential code style only]

Workflow ← STATIC

Start complex tasks in Plan mode
Get approval before implementation
Break large changes into chunks

Verification ← STATIC

[Commands Claude should run after changes] [Quality target: test + lint + typecheck minimum = score 4/5]

Deep Dive (read on demand) ← STATIC (links rarely change)

Architecture details: docs/architecture.md
API reference: docs/api.md
Deployment: docs/deployment.md

Learnings ← DYNAMIC (moves to bottom for cache)

[Living section — updated from PR reviews and corrections] [Include dated entries for traceability]

Gotchas ← SEMI-DYNAMIC (near bottom)

[Known issues, workarounds]

When generating the optimized version, strip the ← STATIC/DYNAMIC comments — those are only for this template's documentation.

Output Format

Current State

Token estimate: X (target: 2.5k)
Line count: X
Instruction count: X (target: <150 across all files)
Verification score: X/5
Status: [OPTIMAL | NEEDS OPTIMIZATION | BLOATED]
Project Stage: [INIT | ACTIVE | STABLE | MAINTENANCE]
CLAUDE.md hierarchy: [files found at each level]
Ecosystem: [settings.json ✓/✗] [commands/ ✓/✗] [skills/ ✓/✗] [agents/ ✓/✗] [rules/ ✓/✗]

Docs/ Overview

File Type Tokens Linked Status

docs/architecture.md architecture ~1.2k ✓ OK

docs/api.md api ~3.5k ✓ DRIFT

docs/old-guide.md guide ~800 ✗ ORPHAN

Sync Status

Summary of code ↔ documentation synchronization:

Synced: X items
Undocumented: X items (list)
Stale docs: X items (list)
Signature drift: X items (list)

Anti-Patterns Found

List each with:

Location in file
Severity: HIGH | MEDIUM | LOW
Specific fix

Recommendations

Numbered actionable items

Deep Dive Links

Suggested additions to CLAUDE.md:

Deep Dive (read on demand)

[link suggestions based on analysis]

Optimized Version

Full optimized CLAUDE.md (when requested)

Modes

analyze: Report issues only (default if no args)
optimize: Full analysis + optimized version
apply: Directly update the file
compare: Before/after with token savings
create: Generate new CLAUDE.md from project structure
sync: Semantic check of docs ↔ code synchronization
audit: Complete audit of documentation ecosystem
scaffold: Generate docs/ structure for new project
insights: Analyze friction patterns and auto-generate CLAUDE.md rules

Mode: sync

Focus on semantic synchronization between code and docs:

Extract public API from source code
Parse API documentation
Generate detailed sync report
Recommend specific updates

Mode: audit

Complete documentation ecosystem audit:

Map all documentation files
Build link graph
Detect orphans and missing docs
Check completeness for project stage
Generate health score and recommendations

Mode: scaffold

Generate docs/ structure appropriate for project stage:

INIT stage:

docs/ ├── README.md # Simple overview └── getting-started.md # Setup instructions

ACTIVE stage:

docs/ ├── README.md ├── architecture.md ├── api.md ├── contributing.md └── decisions/ └── 000-template.md

STABLE/MAINTENANCE stage:

docs/ ├── README.md ├── architecture.md ├── api.md ├── deployment.md ├── contributing.md ├── changelog.md ├── decisions/ │ └── [ADRs] └── guides/ └── [how-to guides]

Mode: insights

Analyze friction patterns from git history and generate copy-paste-ready CLAUDE.md rules:

Git pattern analysis: Scan recent git history (30 days) for:
Reverted commits (git revert, force-push corrections)
Repeated fixes to the same files
Similar commit messages indicating recurring work
Fixup commits (fix!, fixup!, amend patterns)
Friction detection: Identify patterns suggesting Claude made repeated mistakes:
Same file edited multiple times in quick succession
Test files changed right after implementation (missed tests)
Formatting commits (missed auto-format)
Dependency-related fixes (wrong versions, missing deps)
Rule generation: For each detected pattern, generate:
A concise CLAUDE.md rule that would prevent the recurrence
Classification: belongs in CLAUDE.md vs .claude/rules/[topic].md
Priority: HIGH (frequent friction) / MEDIUM / LOW
Output: Copy-paste-ready rules grouped by destination file, with explanatory comments

Additional Checks

Suggest .claude/settings.json hooks if missing (especially PostToolUse auto-format)
Check for team commands in .claude/commands/
Check for custom skills in .claude/skills/
Check for custom agents in .claude/agents/
Verify docs/ has README.md index
Check all docs/ files are linked somewhere
Recommend Deep Dive section if docs/ exists but isn't referenced
Check .claude/rules/ for modular rule files and report total instruction count across all files
Flag if total instructions across CLAUDE.md + .claude/rules/ exceed 150

Environment Context

Before flagging issues, consider the environment:

Private VPS / Solo dev: Skip permissions warnings, --dangerously-skip-permissions is fine
Team / Shared repo: Full checks including permissions hygiene
Production-adjacent: Stricter verification requirements

Ask about environment if unclear before making recommendations.

Execution Rules

ALWAYS use parallel subagents - Phase 2 MUST launch all 5 subagents in a single message with 5 simultaneous Task tool calls. Never run them sequentially.
Pass context to subagents - Each subagent needs the project path, file inventory from Phase 1 (including CLAUDE.md hierarchy and .claude/ ecosystem inventory). Include the full list of discovered files in each subagent prompt.
Subagents are research-only - Subagents read and analyze. Only the main agent writes/edits files (in Phase 3, apply mode only).
Adapt to project size - For small projects (< 5 docs files), you may combine Subagents C+D into one. For projects with no docs/ folder, skip Subagents C, D, E and only run A + B.
All 15 anti-patterns must be checked - Subagent B checks anti-patterns 1-6, 11-15. Subagent C checks anti-patterns 7-10. Ensure no anti-pattern is skipped.

Begin analysis now. If no CLAUDE.md exists, offer to create an optimal one based on project structure. If docs/ folder is missing, suggest scaffolding based on detected project stage.

$ARGUMENTS

docu-optimizer

Safety Notice

Copy this and send it to your AI assistant to learn

BAD