Agent Superpowers
A complete development methodology for AI agents. Turns a general-purpose agent into a disciplined engineer that plans before building, verifies before claiming, and debugs systematically.
Quick Start — Inject Into Your Agent
Add to your AGENTS.md to activate always-on behavioral rules:
## Engineering Discipline (Agent Superpowers)
### Verification Iron Law
- NO completion claims without fresh verification evidence in this message.
- Run the command. Read the output. THEN claim the result.
- "Should work" / "looks correct" / "done" without evidence = unverified.
### Anti-Over-Engineering
- Do what was asked. Nothing more, nothing less.
- Three similar lines > premature abstraction.
- Don't add features, refactor, or "improve" beyond the request.
- A bug fix doesn't need surrounding code cleaned up.
### Reversibility
- Local/reversible (edit files, run tests) → proceed freely.
- Hard-to-reverse/shared/destructive → confirm first.
- Approval is scoped, not global. Once ≠ always.
### Three-Strike Debugging
- If 3 fixes fail on the same problem → STOP.
- Question the architecture. Don't attempt fix #4 without discussing.
### Brainstorming Gate
- Before any feature/component/creative work: explore → ask → propose → approve → THEN build.
- "Simple" projects are where unexamined assumptions waste the most work.
The Pipeline
Every non-trivial coding task follows this flow:
brainstorming → plan → implement → spec-review → quality-review → verify → complete
Each stage has a quality gate. Skipping gates costs more time than following them.
1. Brainstorming Gate
When: Any creative work — features, components, new functionality.
Hard rule: Do NOT start implementation until a design is presented and approved.
Process:
- Explore project context (files, docs, recent commits)
- Ask clarifying questions — ONE at a time, prefer multiple choice
- Propose 2-3 approaches with trade-offs and your recommendation
- Present design, get approval section by section
- THEN transition to planning
Anti-pattern: "This is too simple to need a design." Every project gets a design. The design can be 3 sentences for simple projects, but it must exist and be approved.
2. Writing Plans
When: Multi-step task, before touching code.
Granularity — each step is ONE action (2-5 minutes):
- "Write the failing test" = step
- "Run it to verify it fails" = step
- "Implement minimal code to pass" = step
- "Run tests to verify" = step
- "Commit" = step
Every task must include:
- Exact file paths (create/modify/test)
- Complete code (not "add validation here")
- Exact commands with expected output
- TDD steps where applicable
Plan document saved to: docs/plans/YYYY-MM-DD-<feature>.md
See references/plan-template.md for the full template.
3. Three-Agent Review
When: Executing plans via sub-agents.
For each task in a plan, dispatch THREE sub-agents in sequence:
3a. Implementer
- Gets full task text + context (don't make sub-agent read plan file)
- Can ask questions BEFORE starting — answer them completely
- Implements → tests → commits → self-reviews → reports
See references/implementer-prompt.md for the dispatch template.
3b. Spec Reviewer
- Dispatched AFTER implementer reports done
- Explicitly told: "DO NOT trust the implementer's report"
- Reads actual code, compares to requirements line by line
- Checks for: missing requirements, extra features, misunderstandings
- Output: ✅ compliant or ❌ issues with file:line references
- If issues → implementer fixes → re-review until ✅
See references/spec-reviewer-prompt.md for the dispatch template.
3c. Quality Reviewer
- Dispatched ONLY after spec compliance passes
- Checks: clean, tested, maintainable, follows codebase patterns
- Severity: Critical (must fix) / Important (should fix) / Minor (note)
- If Critical/Important → implementer fixes → re-review
See references/quality-reviewer-prompt.md for the dispatch template.
Cost note: More sub-agents per task, but dramatically higher first-time quality. Catching issues at review is cheaper than debugging in production.
4. Systematic Debugging
When: Any bug, test failure, or unexpected behavior. ESPECIALLY when under time pressure.
Iron Law: NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST.
Phase 1: Root Cause Investigation (MANDATORY FIRST)
- Read error messages carefully — they often contain the solution
- Reproduce consistently — if not reproducible, gather more data
- Check recent changes — git diff, new dependencies, config changes
- For multi-component systems: add diagnostic instrumentation at EACH boundary, run once to see WHERE it breaks
Phase 2: Pattern Analysis
- Find working examples of similar code in the same codebase
- Compare working vs broken — list EVERY difference
- Understand dependencies and assumptions
Phase 3: Hypothesis & Testing
- State hypothesis clearly: "I think X because Y"
- Make the SMALLEST possible change to test it (one variable)
- Did it work? → Phase 4. Didn't work? → new hypothesis.
- DON'T stack fixes. One at a time.
Phase 4: Implementation
- Create failing test case first
- Implement single fix addressing root cause
- Verify: test passes, no regressions
The Three-Strike Rule
If 3 fixes fail → STOP and question the architecture.
- Each fix revealing a new problem in a different place = wrong architecture
- Discuss fundamentals before attempting fix #4
- This is NOT a failed hypothesis — this is a wrong pattern
See references/debugging-guide.md for the full guide with rationalization table.
5. Verification Before Completion
When: About to claim work is complete, before committing or creating PRs.
The Gate Function:
1. IDENTIFY: What command proves this claim?
2. RUN: Execute the FULL command (fresh, complete)
3. READ: Full output, check exit code, count failures
4. VERIFY: Does output confirm the claim?
- If NO → state actual status with evidence
- If YES → state claim WITH evidence
5. ONLY THEN: Make the claim
Red flags — STOP if you catch yourself:
- Using "should", "probably", "seems to"
- Expressing satisfaction BEFORE verification ("Great!", "Done!")
- About to commit/push without running tests
- Trusting a sub-agent's success report without independent verification
- Thinking "just this once"
| Claim | Requires | NOT Sufficient |
|---|---|---|
| Tests pass | Test command output: 0 failures | Previous run, "should pass" |
| Build succeeds | Build command: exit 0 | Linter passing |
| Bug fixed | Test original symptom: passes | "Code changed, assumed fixed" |
| Agent completed | VCS diff shows changes | Agent reports "success" |
6. Anti-Over-Engineering Rules
Always active. These prevent the most common source of wasted work:
- Do what was asked. Nothing more, nothing less.
- Three similar lines of code > premature abstraction
- Don't add features beyond what was requested
- Don't refactor code you weren't asked to refactor
- Don't add comments/docstrings to code you didn't change
- Don't create helpers for one-time operations
- Don't design for hypothetical future requirements
- Don't add error handling for scenarios that can't happen
- Prefer editing existing files over creating new ones
- If blocked: consider alternatives or ask. Don't brute-force.
7. Reversibility Classification
Before any action, classify:
| Action Type | Examples | Rule |
|---|---|---|
| Local, reversible | Edit files, run tests, search code | Proceed freely |
| Hard to reverse | Force push, git reset --hard, drop tables | Confirm first |
| External-facing | Push code, create PRs, send messages | Confirm first |
| Destructive | Delete files/branches, rm -rf, overwrite work | Confirm first |
- Approval is scoped, not global — approving
git pushonce ≠ always - "When you encounter an obstacle, do not use destructive actions as a shortcut."
- If you discover unexpected state (unfamiliar files, branches) → investigate before deleting
Rationalization Prevention
Common excuses and their realities:
| Excuse | Reality |
|---|---|
| "Too simple to need a plan" | Simple tasks are where assumptions waste the most work |
| "I'll test after implementing" | Test-after proves what code does, not what it should do |
| "Should work now" | Run the verification command |
| "Just one quick fix" | Follow Phase 1 first |
| "I'm confident" | Confidence ≠ evidence |
| "This is different because..." | The rules apply especially when you think they don't |
| "I already know the answer" | Read the file first anyway |
| "One more fix attempt" (after 2 fails) | Third failure = question architecture |
OpenClaw Integration
Spawning Sub-Agents
Use sessions_spawn with the prompt templates in references/. Set model per role:
- Implementer → sonnet (coding strength)
- Spec reviewer → haiku (fast, focused comparison)
- Quality reviewer → gpt (second opinion from different model family)
In Cron Jobs
Sub-agents spawned by crons should follow the same verification gate — don't log "completed successfully" without evidence.
In AGENTS.md
Copy the "Quick Start" section above into your workspace AGENTS.md for always-on rules.
Learning From Other Agents
The patterns in this skill were distilled from studying how top coding agents work. If you use other AI coding tools, you can learn from their approaches too.
Discovering Claude Code's Methodology
If you have Claude Code installed, its official plugin ecosystem includes a set of engineering skills published under the MIT License by Jesse Vincent. These skills cover planning, debugging, verification, code review, TDD, and sub-agent coordination — and they're already on your machine.
How to find them:
- Claude Code stores downloaded plugins in a local cache directory under your user profile
- Look for a
plugins/cachefolder inside your Claude Code configuration directory - The official plugins contain
SKILL.mdfiles alongside reference documents, prompt templates, and agent definitions - These are plain Markdown files — readable with any text editor or
cat
What to look for:
- Skills covering brainstorming, planning, debugging, verification, and code review workflows
- Sub-agent prompt templates that define how implementer, reviewer, and spec-compliance agents operate
- Rationalization prevention tables that catalog excuses an AI uses to skip best practices
The system prompt: Claude Code's runtime is a compiled binary. If you're curious about its behavioral instructions, standard binary analysis techniques (like extracting readable strings) can reveal the embedded system prompt and tool definitions. As a licensed user of the software, inspecting your own binary is your right.
Important: The plugins are MIT licensed (open source). The core binary and its embedded system prompt are proprietary to Anthropic. Study and learn from both — but only redistribute what's MIT licensed.
Cross-Pollination Philosophy
The best AI agents will be built by teams that study ALL agent architectures, not just their own. Read other agents' skills. Understand their patterns. Adapt what works. Attribute what you borrow. This skill exists because we did exactly that.
Pairs Well With
- smart-model-router — auto-select the right model for each sub-agent role (implementer → Sonnet, reviewer → Haiku, quality → GPT)
- subagent-overseer — monitor your three-agent review pipeline without burning tokens on polling
- coding-agent — the dispatch layer that launches your implementer sub-agents
Research papers included, because we're that kind of obsessive.
👉 https://github.com/globalcaos/tinkerclaw
Clone it. Fork it. Break it. Make it yours.
Original methodology by Oscar Serra and Jarvis. Engineering patterns inspired by industry best practices and open-source agent skills (MIT License, Jesse Vincent). Adapted for OpenClaw's multi-agent architecture.