Context Budget Optimizer
Framework: The Token Efficiency Matrix. Worth $200/hr of consultant time. Yours for $19.
What This Skill Does
Audits your agent's token usage across every context layer, identifies where you're burning budget on bloat, and produces a 3-week cost reduction roadmap with concrete implementation steps.
Problem it solves: Power users hitting $200-500/month in AI costs often have 60-70% waste baked into their context. Most of it is invisible: stale files in system prompts, redundant skill loading, oversized memory files, wrong model choices. The Token Efficiency Matrix makes the waste visible and rankable.
The Token Efficiency Matrix
A 4-quadrant audit tool that scores every context element by cost (token weight) and ROI (value delivered per token). High cost + low ROI = cut first.
The Matrix
| | LOW COST | HIGH COST |
|---|---|---|
| HIGH ROI | KEEP (High ROI, Low Cost) | OPTIMIZE (High ROI, High Cost) |
| LOW ROI | AUDIT (Low ROI, Low Cost) | CUT (Low ROI, High Cost) |
Action by quadrant:
- KEEP: Don't touch. It's working efficiently.
- OPTIMIZE: Compress or lazy-load. Value is there, just expensive.
- AUDIT: Review quarterly. Low cost so not urgent, but ROI should be questioned.
- CUT: Kill immediately. You're paying for nothing.
Phase 1: Context Inventory
Before scoring, map everything that's in your agent's context.
Context Layers to Audit
Layer A: System Prompt / SOUL.md / Identity files
Layer B: Active skills (loaded per session)
Layer C: Memory files (MEMORY.md, daily notes)
Layer D: Project files injected at startup
Layer E: Tool outputs / MCP responses in context
Layer F: Chat history (conversation turns kept in context)
Layer G: Code or data files read into context
Inventory Template
For each item in your context, fill this in:
| Item | Layer | Est. Tokens | Sessions/Day | Daily Cost* | Value (1-5) |
|---|---|---|---|---|---|
| SOUL.md | A | ___ | ___ | ___ | ___ |
| MEMORY.md | C | ___ | ___ | ___ | ___ |
| [Skill 1].md | B | ___ | ___ | ___ | ___ |
| [Skill 2].md | B | ___ | ___ | ___ | ___ |
| Daily notes | C | ___ | ___ | ___ | ___ |
| [Project file] | D | ___ | ___ | ___ | ___ |
*Daily Cost = (Est. Tokens / 1M) × model_rate × sessions_per_day. This counts input tokens only; output tokens are billed separately, at a higher rate.
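The Daily Cost formula can be applied to every inventory row at once, which makes the most expensive items easy to spot. Below is a minimal Python sketch. It assumes you keep the inventory as a list of records. The `ContextItem` type, the item rows, and their token and session counts are hypothetical placeholders. The rate is the approximate Claude Sonnet 4 input price from the model rate table.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ContextItem:
    name: str
    layer: str              # A-G, matching the context layers listed above
    est_tokens: int         # estimated tokens injected per session
    sessions_per_day: int
    value: int              # 1-5 value rating


def daily_cost(item: ContextItem, rate_per_million: float) -> float:
    """Daily Cost = (Est. Tokens / 1M) x model_rate x sessions_per_day (input tokens only)."""
    return item.est_tokens / 1_000_000 * rate_per_million * item.sessions_per_day


# Hypothetical inventory rows; replace them with your own estimates.
inventory = [
    ContextItem("SOUL.md", "A", 1_200, 20, 5),
    ContextItem("MEMORY.md", "C", 6_000, 20, 2),
    ContextItem("daily-notes", "C", 4_000, 20, 1),
]

SONNET_4_INPUT_RATE = 3.00  # approximate $ per 1M input tokens

# Print the most expensive items first.
for row in sorted(inventory, key=lambda r: daily_cost(r, SONNET_4_INPUT_RATE), reverse=True):
    print(f"{row.name:<12} ${daily_cost(row, SONNET_4_INPUT_RATE):.3f}/day")
```

With these placeholder numbers, the bloated MEMORY.md costs five times as much per day as SOUL.md, even though SOUL.md has the higher value rating.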
Token estimation cheatsheet:
- 1 page of text ≈ 500-700 tokens
- 1 SKILL.md file ≈ 800-2,000 tokens
- 1 code file (100 lines) ≈ 1,200-1,800 tokens
- 1 MEMORY.md (well-maintained) ≈ 500-1,500 tokens
- 1 MEMORY.md (neglected/bloated) ≈ 3,000-8,000 tokens
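If you would rather measure than guess, a character-count heuristic gets you into the right cost band. This sketch assumes the common rule of thumb of roughly 4 characters per token for English prose. Code and non-English text often tokenize differently, so use your provider's tokenizer or token-counting endpoint when you need exact figures. The helper names are illustrative.

```python
from __future__ import annotations

from pathlib import Path

CHARS_PER_TOKEN = 4  # rough rule of thumb for English prose, not an exact tokenizer


def estimate_tokens(text: str) -> int:
    """Rough token estimate: character count divided by 4, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def estimate_file_tokens(path: Path) -> int:
    return estimate_tokens(path.read_text(encoding="utf-8"))


def inventory_dir(root: Path, pattern: str = "*.md") -> dict[str, int]:
    """Estimate tokens for every matching file under root, keyed by relative path."""
    return {str(p.relative_to(root)): estimate_file_tokens(p) for p in sorted(root.rglob(pattern))}
```

For example, `inventory_dir(Path("memory"))` (a hypothetical directory) returns per-file estimates you can paste into the Est. Tokens column.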
Model rates (as of Q1 2026, approximate):
| Model | Input Cost per 1M tokens |
|---|---|
| Claude 3.5 Haiku | ~$0.80 |
| Claude Sonnet 4 | ~$3.00 |
| Claude Opus 4 | ~$15.00 |
| GPT-4o mini | ~$0.15 |
| GPT-4o | ~$2.50 |
Phase 2: Scoring (Token Efficiency Matrix)
Score each context item:
Cost Score (1-5):
| Score | Token Range | Description |
|---|---|---|
| 1 | < 200 tokens | Tiny — negligible |
| 2 | 200-500 tokens | Light |
| 3 | 500-1,500 tokens | Medium |
| 4 | 1,500-4,000 tokens | Heavy |
| 5 | > 4,000 tokens | Very heavy |
ROI Score (1-5):
| Score | Description |
|---|---|
| 1 | Rarely used, generic, stale |
| 2 | Occasionally useful |
| 3 | Moderately useful most sessions |
| 4 | Consistently referenced, shapes output |
| 5 | Critical — session breaks without it |
Matrix placement:
- Cost 1-2, ROI 4-5 → KEEP
- Cost 4-5, ROI 4-5 → OPTIMIZE
- Cost 1-2, ROI 1-2 → AUDIT
- Cost 4-5, ROI 1-2 → CUT
- Cost 3, ROI 3 → AUDIT (marginal — evaluate quarterly)
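The scoring tables and placement rules translate directly into two small functions. This is a sketch, and it makes two assumptions. First, the overlapping band edges (500, 1,500) are assigned to the higher band. Second, the placement rules do not cover every pairing that includes a 3 (for example Cost 3 with ROI 5), so those pairings fall back to AUDIT, extending the "marginal" rule.

```python
def cost_score(tokens: int) -> int:
    """Map a token count to the 1-5 Cost Score bands."""
    if tokens < 200:
        return 1
    if tokens < 500:
        return 2
    if tokens < 1_500:
        return 3
    if tokens <= 4_000:
        return 4
    return 5


def quadrant(cost: int, roi: int) -> str:
    """Place an item on the Token Efficiency Matrix.

    Assumption: pairings the rules leave unassigned (a 3 on either axis,
    besides 3/3) are treated as AUDIT.
    """
    low_cost, high_cost = cost <= 2, cost >= 4
    low_roi, high_roi = roi <= 2, roi >= 4
    if low_cost and high_roi:
        return "KEEP"
    if high_cost and high_roi:
        return "OPTIMIZE"
    if high_cost and low_roi:
        return "CUT"
    return "AUDIT"  # low cost + low ROI, 3/3, and unlisted middle cases
```

A bloated 6,000-token MEMORY.md that you rate ROI 2 gives `quadrant(cost_score(6000), 2)`, which lands in CUT.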
Phase 3: Reduction Playbook
CUT (implement immediately)
Items to eliminate first:
□ Old memory entries > 90 days with no references
□ Skills loaded globally that are only used occasionally
□ Duplicate information in multiple files
□ Verbose templates inside system prompts
□ Commented-out code in injected files
□ Debug logs included in context
□ Full file contents when only summaries are needed
Cut target: 30-40% token reduction with zero quality loss.
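To hunt for duplicate information, a quick first pass is to find lines that appear, almost verbatim, in more than one context file. This sketch catches only near-identical lines after lowercasing and collapsing whitespace. Paraphrased duplicates still need a manual read. The function names and the 20-character minimum are illustrative choices.

```python
from __future__ import annotations

from collections import defaultdict
from pathlib import Path


def normalize(line: str) -> str:
    # Lowercase and collapse whitespace so trivial formatting differences still match.
    return " ".join(line.lower().split())


def find_duplicate_lines(texts: dict[str, str], min_len: int = 20) -> dict[str, list[str]]:
    """Return normalized lines that appear in more than one file, mapped to those files."""
    seen: dict[str, set[str]] = defaultdict(set)
    for name, text in texts.items():
        for line in text.splitlines():
            key = normalize(line)
            if len(key) >= min_len:  # skip short lines such as headings or bullets
                seen[key].add(name)
    return {line: sorted(files) for line, files in seen.items() if len(files) > 1}


def load_texts(paths: list[Path]) -> dict[str, str]:
    return {str(p): p.read_text(encoding="utf-8") for p in paths}
```

Each hit names the files that carry the same fact. Keep the fact in the file that belongs to the hottest tier and delete it elsewhere.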
OPTIMIZE (implement week 1-2)
Tactic 1: Lazy Loading
Instead of loading all skills at startup, load only when triggered.
Before (eager load):
System prompt includes all 10 skill files → 15,000 tokens every session
After (lazy load):
System prompt includes skill index only → 500 tokens
Individual skills loaded on demand → 1,000 tokens when needed
Net: 13,500 fewer tokens per session when one skill is loaded (90% savings for skills), or 14,500 fewer (~97%) when no skill is needed
Lazy load implementation:
# SKILL-INDEX.md (500 tokens instead of full skills)
Available skills — load when needed:
- mcp-server-setup-kit: MCP connection setup
- agentic-loop-designer: Build autonomous loops
- context-budget-optimizer: Token cost reduction
- [etc]
To use a skill: "Use the [skill-name] skill"
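If your agent harness lets you control what gets injected, the index-then-load pattern can be scripted. This is a minimal sketch under stated assumptions: skills live at a hypothetical `skills/<skill-name>/SKILL.md` path next to `SKILL-INDEX.md`, and the trigger is the literal phrase "Use the [skill-name] skill". `LazySkillLoader` is an illustrative name, not part of any agent framework.

```python
from __future__ import annotations

import re
from pathlib import Path

# Matches the trigger phrase "Use the <skill-name> skill" (case-insensitive).
TRIGGER = re.compile(r"use the ([a-z0-9-]+) skill", re.IGNORECASE)


class LazySkillLoader:
    """Inject only the skill index at startup; read a full skill file on first request."""

    def __init__(self, root: Path):
        self.root = root
        self.loaded: dict[str, str] = {}  # skills already injected this session

    def startup_context(self) -> str:
        # Only the small index goes into every session.
        return (self.root / "SKILL-INDEX.md").read_text(encoding="utf-8")

    def skills_for(self, message: str) -> list[str]:
        """Return the text of requested skills that are not already in context."""
        new = []
        for name in TRIGGER.findall(message):
            name = name.lower()
            path = self.root / "skills" / name / "SKILL.md"
            if name in self.loaded or not path.exists():
                continue  # already injected earlier, or no such skill
            self.loaded[name] = path.read_text(encoding="utf-8")
            new.append(self.loaded[name])
        return new
```

Returning only skills that are not yet loaded also avoids the re-injection problem described under context window management.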
Tactic 2: Memory Tiering
Not all memory is equally important. Tier it.
Tier 1 (Hot): Always in context — current focus, active projects, today's priorities
Target: < 500 tokens
File: FOCUS.md
Tier 2 (Warm): Loaded on demand — historical decisions, completed projects
Target: < 2,000 tokens
File: MEMORY.md (summarized)
Tier 3 (Cold): Never auto-loaded — old daily notes, archived projects
Storage: Flat files, searchable on request
File: memory/archive/
Memory tiering implementation:
- Create FOCUS.md (Tier 1): just this week's priorities
- Archive daily notes older than 14 days to memory/archive/
- Summarize MEMORY.md quarterly (remove resolved items)
- Set system prompt to only inject FOCUS.md + recent 7 days of memory
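The 14-day archive step is easy to automate. This sketch assumes daily notes are named `YYYY-MM-DD.md` and sit directly in a `memory/` folder. Other files, such as MEMORY.md and FOCUS.md, are skipped because their names do not parse as dates. `archive_old_notes` is a hypothetical helper name.

```python
from __future__ import annotations

import shutil
from datetime import date, timedelta
from pathlib import Path


def archive_old_notes(memory_dir: Path, keep_days: int = 14, today: date | None = None) -> list[str]:
    """Move daily notes older than keep_days into memory_dir/archive/; return moved file names."""
    today = today or date.today()
    cutoff = today - timedelta(days=keep_days)
    archive = memory_dir / "archive"
    moved = []
    for note in sorted(memory_dir.glob("*.md")):
        try:
            note_date = date.fromisoformat(note.stem)
        except ValueError:
            continue  # not a dated daily note (e.g. MEMORY.md, FOCUS.md)
        if note_date < cutoff:
            archive.mkdir(exist_ok=True)
            shutil.move(str(note), str(archive / note.name))
            moved.append(note.name)
    return moved
```

Run it on a schedule (for example, once a day) so the warm tier never accumulates more than two weeks of notes.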
Tactic 3: Compression Templates
Replace verbose content with compressed references.
Before (bloated system prompt section):
David Flynn is a founder based in Austin, Texas. He runs a company
called TechCorp which builds B2B SaaS products for mid-market companies
in the logistics space. He has been doing this for 8 years and previously
worked at McKinsey. He prefers direct communication without fluff. He
cares about metrics and ROI above all else. His team has 6 people...
[300 tokens]
After (compressed):
Owner: David Flynn | Austin TX | TechCorp (B2B SaaS, logistics, mid-market)
Background: 8yr founder, ex-McKinsey | Team: 6
Style: Direct, metric-first, no fluff
[40 tokens — 87% reduction]
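One way to keep the compressed form from drifting back into prose is to store the facts as structured fields and render the compact lines from them. This is an illustrative sketch, not a required workflow. `compress_profile` is a hypothetical name, and the fields reproduce the example above.

```python
from __future__ import annotations


def compress_profile(fields: dict[str, list[str]]) -> str:
    """Render one 'Label: a | b | c' line per field, dropping connective prose."""
    return "\n".join(f"{label}: {' | '.join(parts)}" for label, parts in fields.items())


profile = {
    "Owner": ["David Flynn", "Austin TX", "TechCorp (B2B SaaS, logistics, mid-market)"],
    "Background": ["8yr founder, ex-McKinsey", "Team: 6"],
    "Style": ["Direct, metric-first, no fluff"],
}
print(compress_profile(profile))
```

The printed output matches the compressed block above line for line. To update a fact, edit the data and re-render, rather than hand-editing prose.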
Tactic 4: Model Downgrade Opportunities
Most context-heavy sessions don't need the flagship model.
Downgrade decision tree:
Does this task require multi-step reasoning?
├── No → Use Haiku (~73-95% cheaper than Sonnet or Opus)
└── Yes → Is it a novel problem?
├── No (familiar pattern) → Use Sonnet
└── Yes (genuinely complex) → Use Opus
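Writing the tree down as code makes the routing rules explicit, so they can be documented and applied to sub-agents. This is a sketch. The task catalog and its yes/no answers are hypothetical examples; fill them in for your own workload.

```python
from enum import Enum


class Model(str, Enum):
    HAIKU = "haiku"
    SONNET = "sonnet"
    OPUS = "opus"


def pick_model(multi_step_reasoning: bool, novel_problem: bool) -> Model:
    """Apply the downgrade decision tree."""
    if not multi_step_reasoning:
        return Model.HAIKU
    return Model.OPUS if novel_problem else Model.SONNET


# Hypothetical task catalog: (needs multi-step reasoning?, novel problem?)
TASKS = {
    "status check": (False, False),
    "formatting": (False, False),
    "weekly report draft": (True, False),
    "new architecture design": (True, True),
}
ROUTING = {task: pick_model(*flags) for task, flags in TASKS.items()}
```

The resulting `ROUTING` table is the kind of task-to-model mapping you can copy into your agent's instructions.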
Model savings calculator:
| Switch | Token Cost Reduction | When Safe |
|---|---|---|
| Opus → Sonnet | 80% | Most writing, analysis, ops |
| Sonnet → Haiku | ~73% | Simple reads, status checks, formatting |
| Opus → Haiku | 95% | Very simple tasks only |
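The reduction percentages follow from the ratio of per-token rates: 1 − (new rate / old rate). This short check uses the approximate input rates from the model rate table. For these three models, the output rates are in the same proportions, so the percentages hold for output tokens too.

```python
# Approximate input rates ($ per 1M tokens) from the model rate table.
RATES = {"opus": 15.00, "sonnet": 3.00, "haiku": 0.80}


def reduction(src: str, dst: str) -> float:
    """Percent cost reduction for the same token volume when switching src -> dst."""
    return (1 - RATES[dst] / RATES[src]) * 100


for src, dst in [("opus", "sonnet"), ("sonnet", "haiku"), ("opus", "haiku")]:
    print(f"{src} -> {dst}: {reduction(src, dst):.1f}%")
```

This prints 80.0%, 73.3%, and 94.7%, the exact ratios behind the savings table.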
Tactic 5: Context Window Management
Stop re-injecting the same content in long sessions.
Long session patterns that bloat cost:
✗ Re-reading the same files multiple times in one session
✗ Asking agent to "remember" things it already read
✗ Injecting full file contents when you need 5 lines
✗ Running searches and keeping all results in context
Fixes:
✓ Use targeted reads (read lines 45-52, not full file)
✓ Reference by location ("check FOCUS.md line 3") not by content
✓ Summarize search results immediately, discard raw results
✓ Archive completed session context before starting new topics
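When you control the tooling, the first and third fixes can be sketched as two small helpers. The first reads only a line range instead of the whole file. The second collapses raw search hits into `file:line` references so the raw output can be dropped. Both names are hypothetical, and the hit format `(path, line, text)` is an assumption.

```python
from __future__ import annotations

from itertools import islice
from pathlib import Path


def read_lines(path: Path, start: int, end: int) -> str:
    """Return lines start..end (1-indexed, inclusive), stopping once the range is read."""
    with path.open(encoding="utf-8") as f:
        return "".join(islice(f, start - 1, end))


def summarize_hits(hits: list[tuple[str, int, str]], limit: int = 5) -> str:
    """Collapse raw search hits to 'file:line' references so the raw output can be discarded."""
    shown = [f"{path}:{line}" for path, line, _text in hits[:limit]]
    more = len(hits) - len(shown)
    return ", ".join(shown) + (f" (+{more} more)" if more > 0 else "")
```

For example, `read_lines(Path("FOCUS.md"), 45, 52)` returns only lines 45-52, matching the targeted read above.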
3-Week Cost Reduction Roadmap
Week 1: Cut & Quick Wins
Target: 30-40% cost reduction
Day 1-2:
□ Complete Phase 1 Context Inventory
□ Complete Phase 2 Matrix Scoring
□ Identify all CUT items
□ Delete / archive CUT items
Day 3-4:
□ Create FOCUS.md (Tier 1 memory)
□ Archive daily notes older than 14 days
□ Compress system prompt (compression templates)
Day 5-7:
□ Measure token reduction (compare sessions before/after)
□ Recalculate daily cost estimate
□ Log baseline vs. current in tracking file
Week 2: Optimize Structure
Target: Additional 20-30% reduction, measured against your pre-Week 1 baseline
Day 8-10:
□ Implement skill lazy-loading
□ Create SKILL-INDEX.md
□ Remove individual skill files from startup context
□ Test: skills still work when called by name
Day 11-13:
□ Apply model routing matrix (stop defaulting to Opus)
□ Document which tasks go to which model
□ Implement sub-agent model selection rules
Day 14:
□ Mid-point measurement
□ Are you on track for 50%+ total reduction?
Week 3: Lock In & Monitor
Target: Establish monitoring + reach 50%+ total reduction
Day 15-17:
□ Set up cost tracking (even a simple spreadsheet)
□ Log: daily sessions × avg tokens per session × (model rate / 1M) = daily cost
□ Set weekly budget alert threshold
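The simple spreadsheet can also be a CSV file that a few lines of Python total up and check against your weekly threshold. In this sketch, the column names, sample rows, and $10 budget are hypothetical. It also simplifies by using a single input rate per day and ignoring separately priced output tokens.

```python
import csv
import io

# Hypothetical log: one row per day.
LOG = """date,sessions,avg_tokens,rate_per_million
2026-03-09,20,25000,3.00
2026-03-10,18,22000,3.00
2026-03-11,25,30000,3.00
"""

WEEKLY_BUDGET = 10.00  # hypothetical alert threshold in dollars


def day_cost(sessions: int, avg_tokens: int, rate_per_million: float) -> float:
    """Daily sessions x avg tokens per session x (model rate / 1M)."""
    return sessions * avg_tokens / 1_000_000 * rate_per_million


def weekly_total(log_text: str) -> float:
    rows = csv.DictReader(io.StringIO(log_text))
    return sum(
        day_cost(int(r["sessions"]), int(r["avg_tokens"]), float(r["rate_per_million"]))
        for r in rows
    )


total = weekly_total(LOG)
alert = "  ALERT: over budget" if total > WEEKLY_BUDGET else ""
print(f"Week to date: ${total:.2f}{alert}")
```

With the sample rows, this prints `Week to date: $4.94` with no alert.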
Day 18-20:
□ Summarize MEMORY.md (remove stale/resolved entries)
□ Review skill catalog — retire unused skills
□ Final context audit: re-run Matrix Scoring
Day 21:
□ Document final savings: before vs. after
□ Set quarterly review reminder
□ Share results (post on X? 🧵)
Token Efficiency Scoring Rubric
After completing the 3-week roadmap, score your setup:
| Metric | 0 | 1 | 2 |
|---|---|---|---|
| Average session tokens | > 50K | 20-50K | < 20K |
| Skills lazy-loaded | None | Some | All |
| Memory tiered correctly | No | Partially | Yes |
| Model routing applied | No | Ad hoc | Systematic |
| Context reviewed quarterly | No | Annually | Quarterly |
- Score 8-10: Token-efficient operator. You're in the top 5% of AI users by cost.
- Score 5-7: Good progress. Keep tightening.
- Score 0-4: High burn rate. Revisit Week 1 of the roadmap.
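The rubric has five rows worth 0-2 points each, for a maximum of 10. If you re-score at every quarterly review, a small scorer keeps the results consistent. This is a sketch. The answer keys are illustrative labels taken from the rubric columns.

```python
from __future__ import annotations

# Points per answer for each rubric row (0, 1, or 2).
RUBRIC = {
    "skills_lazy_loaded": {"none": 0, "some": 1, "all": 2},
    "memory_tiered": {"no": 0, "partially": 1, "yes": 2},
    "model_routing": {"no": 0, "ad hoc": 1, "systematic": 2},
    "context_review": {"no": 0, "annually": 1, "quarterly": 2},
}


def session_points(avg_tokens: int) -> int:
    """> 50K tokens: 0, 20-50K: 1, < 20K: 2."""
    if avg_tokens > 50_000:
        return 0
    return 1 if avg_tokens >= 20_000 else 2


def score(avg_tokens: int, answers: dict[str, str]) -> tuple[int, str]:
    total = session_points(avg_tokens) + sum(RUBRIC[k][v] for k, v in answers.items())
    if total >= 8:
        band = "Token-efficient operator"
    elif total >= 5:
        band = "Good progress"
    else:
        band = "High burn rate"
    return total, band
```

For example, 15K-token sessions, all skills lazy-loaded, partial memory tiering, systematic routing, and quarterly reviews score 9 ("Token-efficient operator").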
Quick Reference: The 10 Highest-ROI Cuts
If you do nothing else, do these 10 things:
- Archive MEMORY.md entries older than 90 days that nothing references
- Switch routine tasks from Opus/Sonnet to Haiku
- Lazy-load skills instead of always-on
- Compress system prompt (verbose → structured)
- Stop re-reading files in the same session
- Archive daily notes older than 14 days
- Create FOCUS.md and limit startup context to it
- Remove code files from context when not actively editing
- Summarize MCP tool outputs instead of keeping raw results
- Set model routing rules in AGENTS.md
Combined impact: 50-70% cost reduction for most users.
Example Session
User prompt:
"My Claude usage is $400/month and I don't know why. Help me cut it."
Agent response using this skill:
- Runs Phase 1 Context Inventory (asks user to share what's in their setup)
- Estimates tokens per item using the cheatsheet
- Populates the Token Efficiency Matrix
- Identifies top 3 CUT items (likely: bloated MEMORY.md, eager skill loading, Opus overuse)
- Delivers Week 1 roadmap customized to their setup
- Projects: "Based on this, you should reach $150-200/month in 3 weeks"
Bundle Note
This skill is part of the AI Setup & Productivity Pack ($79 bundle):
- MCP Server Setup Kit ($19)
- Agentic Loop Designer ($29)
- AI OS Blueprint ($39)
- Context Budget Optimizer ($19) — you are here
- Non-Technical Agent Quickstart ($9)
Save $36 with the full bundle. Built by @Remy_Claw.