Agent Audit
Scan your entire OpenClaw setup and get actionable cost/performance recommendations.
What This Skill Does
- Scans config — reads OpenClaw config to map models to agents/tasks
- Analyzes cron history — checks every cron job's model, token usage, runtime, success rate
- Classifies tasks — determines complexity level of each task
- Calculates costs — per agent, per cron, per task type using provider pricing
- Recommends changes — with confidence levels and risk warnings
- Generates report — markdown report with specific savings estimates
Running the Audit
python3 {baseDir}/scripts/audit.py
Options:
python3 {baseDir}/scripts/audit.py --format markdown # Full report (default)
python3 {baseDir}/scripts/audit.py --format summary # Quick summary only
python3 {baseDir}/scripts/audit.py --dry-run # Show what would be analyzed
python3 {baseDir}/scripts/audit.py --output /path/to/report.md # Save to file
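The flags above could be parsed roughly as follows. This is a minimal sketch using Python's standard `argparse`, not the actual contents of `audit.py`; the function name `build_parser` and the help strings are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags."""
    parser = argparse.ArgumentParser(description="Audit OpenClaw agents and cron jobs")
    # --format: full markdown report (default) or a quick summary
    parser.add_argument("--format", choices=["markdown", "summary"], default="markdown")
    # --dry-run: only show what would be analyzed
    parser.add_argument("--dry-run", action="store_true")
    # --output: optional file path for the report
    parser.add_argument("--output", default=None)
    return parser


args = build_parser().parse_args(["--format", "summary", "--dry-run"])
```

With no arguments, this sketch falls back to the markdown format and leaves `output` unset.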
How It Works
Phase 1: Discovery
- Read OpenClaw config (~/.openclaw/openclaw.json or similar)
- List all cron jobs and their configurations
- List all agents and their default models
- Detect provider (Anthropic, OpenAI, Google, xAI) from model names
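Provider detection from model names might look like the sketch below. The hint table and the `detect_provider` name are illustrative assumptions; the real script may use a different lookup.

```python
# Hypothetical substring hints per provider (assumption, not from the skill).
PROVIDER_HINTS = {
    "anthropic": ("claude", "opus", "sonnet", "haiku"),
    "openai": ("gpt",),
    "google": ("gemini",),
    "xai": ("grok",),
}


def detect_provider(model: str) -> str:
    """Guess the provider from a model string such as 'anthropic/claude-opus-4'."""
    name = model.lower()
    # An explicit 'provider/model' prefix wins when it names a known provider.
    if "/" in name:
        prefix = name.split("/", 1)[0]
        if prefix in PROVIDER_HINTS:
            return prefix
    # Otherwise fall back to substring hints.
    for provider, hints in PROVIDER_HINTS.items():
        if any(hint in name for hint in hints):
            return provider
    return "unknown"
```

Returning `"unknown"` instead of guessing keeps unrecognized models out of the cost calculation until their pricing is added.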
Phase 2: History Analysis
- Pull cron job run history (last 7 days by default)
- Calculate per-job: avg tokens, avg runtime, success rate, model used
- Pull session history where available
- Calculate total token spend by model tier
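The per-job statistics can be sketched as a simple group-and-average pass. The `CronRun` record and its field names are assumptions about what the run history contains.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class CronRun:
    """Hypothetical shape of one cron run record."""
    job: str
    model: str
    tokens: int
    runtime_s: float
    ok: bool


def summarize(runs):
    """Group runs by job and compute averages and success rate."""
    by_job = {}
    for run in runs:
        by_job.setdefault(run.job, []).append(run)
    return {
        job: {
            "model": job_runs[-1].model,  # model used on the most recent run
            "avg_tokens": mean(r.tokens for r in job_runs),
            "avg_runtime_s": mean(r.runtime_s for r in job_runs),
            "success_rate": sum(r.ok for r in job_runs) / len(job_runs),
            "runs": len(job_runs),
        }
        for job, job_runs in by_job.items()
    }
```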
Phase 3: Task Classification
Classify each task into complexity tiers:
| Tier | Examples | Recommended Models |
|---|---|---|
| Simple | Health checks, status reports, reminders, notifications | Cheapest tier (Haiku, GPT-4o-mini, Flash, Grok-mini) |
| Medium | Content drafts, research, summarization, data analysis | Mid tier (Sonnet, GPT-4o, Pro, Grok) |
| Complex | Coding, architecture, security review, nuanced writing | Top tier (Opus, GPT-4.5, Ultra, Grok-2) |
Classification signals:
- Simple: Short output (<500 tokens), little reasoning required, repetitive pattern, status/health tasks
- Medium: Medium output, some reasoning needed, creative but templated, research tasks
- Complex: Long output, multi-step reasoning, code generation, security-critical, tasks that previously failed on weaker models
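As a rough illustration, the signals above can be turned into a small rule-based classifier. Only the 500-token cutoff comes from the list; the keyword tuples and the 2000-token "long output" threshold are assumptions made for this sketch, and the detailed heuristics live in references/task-classification.md.

```python
SIMPLE_KEYWORDS = ("health", "status", "reminder", "notification")
COMPLEX_KEYWORDS = ("code", "coding", "architecture", "security")
SHORT_OUTPUT_TOKENS = 500   # from the classification signals
LONG_OUTPUT_TOKENS = 2000   # assumed; the signals only say "long output"


def classify(name: str, avg_output_tokens: float, failed_on_weaker_model: bool = False) -> str:
    """Return 'simple', 'medium', or 'complex' for a task."""
    text = name.lower()
    # Complex signals are checked first so they can never be overridden.
    if failed_on_weaker_model or any(k in text for k in COMPLEX_KEYWORDS):
        return "complex"
    if avg_output_tokens >= LONG_OUTPUT_TOKENS:
        return "complex"
    if avg_output_tokens < SHORT_OUTPUT_TOKENS and any(k in text for k in SIMPLE_KEYWORDS):
        return "simple"
    # Anything ambiguous lands in the middle tier.
    return "medium"
```

Checking the complex signals first means a task is never labeled simple when any higher-risk signal is present.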
Phase 4: Recommendations
For each task where the model tier doesn't match complexity:
⚠️ RECOMMENDATION: Downgrade "Knox Bot Health Check" from opus to haiku
Current: anthropic/claude-opus-4 ($15/M input, $75/M output)
Suggested: anthropic/claude-haiku ($0.25/M input, $1.25/M output)
Reason: Simple status check averaging 300 output tokens
Estimated savings: $X.XX/month
Risk: LOW — task is simple pattern matching
Confidence: HIGH
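The estimated savings figure is plain arithmetic over per-million-token prices. The sketch below uses the prices from the example recommendation; the input token count and the run frequency are illustrative assumptions, and the function names are hypothetical.

```python
# USD per million tokens, taken from the example recommendation.
PRICING = {
    "anthropic/claude-opus-4": {"input": 15.00, "output": 75.00},
    "anthropic/claude-haiku": {"input": 0.25, "output": 1.25},
}


def monthly_cost(model, input_tokens, output_tokens, runs_per_month):
    """Cost in USD for one job over a month, given average tokens per run."""
    price = PRICING[model]
    per_run = (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000
    return per_run * runs_per_month


def estimated_savings(current, suggested, input_tokens, output_tokens, runs_per_month):
    """Monthly savings from switching models at unchanged token usage."""
    return (
        monthly_cost(current, input_tokens, output_tokens, runs_per_month)
        - monthly_cost(suggested, input_tokens, output_tokens, runs_per_month)
    )


# Assumed workload: hourly job (720 runs/month), 2000 input and 300 output tokens per run.
savings = estimated_savings(
    "anthropic/claude-opus-4", "anthropic/claude-haiku", 2000, 300, 720
)
```

Under these assumed numbers the job costs about $37.80/month on the current model and $0.63/month on the suggested one, so the estimate comes to roughly $37.17/month.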
Safety Rules — NEVER Recommend Downgrading:
- Coding/development tasks
- Security reviews or audits
- Tasks that have previously failed on weaker models
- Tasks where the user explicitly chose a higher model
- Complex multi-step reasoning tasks
- Anything the user flagged as critical
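One way to enforce these rules is a guard that runs before any downgrade is suggested. The `Task` fields and category names below are hypothetical; the point of the sketch is that a non-empty list of blockers suppresses the recommendation.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical task record used by the guard."""
    name: str
    category: str            # e.g. "coding", "security", "status"
    complexity: str          # "simple", "medium", or "complex"
    failed_on_weaker_model: bool = False
    user_pinned_model: bool = False
    flagged_critical: bool = False


PROTECTED_CATEGORIES = {"coding", "development", "security"}


def downgrade_blockers(task: Task) -> list:
    """Return the safety rules that forbid downgrading this task."""
    reasons = []
    if task.category in PROTECTED_CATEGORIES:
        reasons.append(f"protected category: {task.category}")
    if task.failed_on_weaker_model:
        reasons.append("previously failed on a weaker model")
    if task.user_pinned_model:
        reasons.append("user explicitly chose a higher model")
    if task.complexity == "complex":
        reasons.append("complex multi-step reasoning")
    if task.flagged_critical:
        reasons.append("flagged as critical")
    return reasons
```

Returning the reasons, not just a boolean, lets the report explain why a task was left on its current model.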
Phase 5: Report Generation
Output a clean markdown report with:
- Overview — total agents, crons, monthly spend estimate
- Per-agent breakdown — model, usage, cost
- Per-cron breakdown — model, frequency, avg tokens, cost
- Recommendations — sorted by savings potential
- Total potential savings — monthly estimate
- One-liner config changes — exact model strings to swap
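A stripped-down version of the report step might look like the following. It covers only the overview, the sorted recommendations, and the savings total; the function name, the dictionary keys, and the report headings are assumptions for illustration.

```python
def render_report(agent_count, cron_count, monthly_spend, recommendations):
    """Build a markdown report.

    Each recommendation is a dict with hypothetical keys:
    'task', 'current', 'suggested', and 'savings' (USD per month).
    """
    # Largest savings first, as described above.
    recs = sorted(recommendations, key=lambda r: r["savings"], reverse=True)
    total = sum(r["savings"] for r in recs)
    lines = [
        "# Agent Audit Report",
        "",
        "## Overview",
        f"- Agents: {agent_count}",
        f"- Cron jobs: {cron_count}",
        f"- Estimated monthly spend: ${monthly_spend:.2f}",
        "",
        "## Recommendations",
    ]
    for r in recs:
        lines.append(
            f"- {r['task']}: {r['current']} -> {r['suggested']} "
            f"(saves ${r['savings']:.2f}/month)"
        )
    lines += ["", f"Total potential savings: ${total:.2f}/month"]
    return "\n".join(lines)
```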
Model Pricing Reference
See references/model-pricing.md for current pricing across all providers. Update this file when prices change.
Task Classification Details
See references/task-classification.md for detailed heuristics on how tasks are classified into complexity tiers.
Important Notes
- This skill is read-only — it never changes your config automatically
- All recommendations include risk and confidence levels
- When unsure about a task's complexity, it defaults to keeping the current model
- Re-run the audit periodically (for example, monthly) as usage patterns change
- Token counts are estimates based on cron history — actual costs depend on your provider's billing