Context Management in SpecWeave
Overview
SpecWeave achieves efficient context usage through two native Claude Code mechanisms:
- Progressive Disclosure (Skills) - Claude's built-in skill loading system
- Sub-Agent Parallelization - Isolated context windows for parallel work
Important: SpecWeave does NOT use custom context manifests or caching systems. It leverages Claude's native capabilities.
1. Progressive Disclosure (Skills)
How It Works
Claude Code uses a two-level progressive disclosure system for skills:
Level 1: Metadata Only (Always Loaded)
name: nextjs
description: NextJS 14+ implementation specialist. Creates App Router projects...
What Claude sees initially:
- Only the YAML frontmatter (name + description)
- ~50-100 tokens per skill
- All skills' metadata is visible
- Claude can decide which skills are relevant
Level 2: Full Skill Content (Loaded On-Demand)
NextJS Skill
[Full documentation, examples, best practices...]
[Could be 5,000+ tokens]
What Claude loads:
- Full SKILL.md content only if skill is relevant to current task
- Prevents loading 35+ skills (175,000+ tokens) when you only need 2-3
- This is the actual mechanism that saves tokens
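The two levels above can be sketched in a few lines of Python. This is an illustration of the idea only, not Claude Code's actual implementation: the names `Skill`, `parse_frontmatter`, `index_skills`, and `demo` are hypothetical, and the sketch assumes each skill lives at `<root>/<skill>/SKILL.md` with a frontmatter block delimited by `---` lines. The sample `devops` description is made up for the demo.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, List, Optional


@dataclass
class Skill:
    """Metadata is always held in memory; the body is read lazily."""
    name: str
    description: str
    path: Path
    body: Optional[str] = field(default=None, repr=False)

    def load_body(self) -> str:
        # Level 2: read the full SKILL.md only when the skill is needed.
        if self.body is None:
            self.body = self.path.read_text(encoding="utf-8")
        return self.body


def parse_frontmatter(text: str) -> Dict[str, str]:
    """Parse a minimal 'key: value' block between two '---' lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def index_skills(root: Path) -> List[Skill]:
    """Level 1: keep only name + description for every skill under root."""
    skills = []
    for skill_file in sorted(root.glob("*/SKILL.md")):
        # The file is read to extract the frontmatter, but the body is not retained.
        meta = parse_frontmatter(skill_file.read_text(encoding="utf-8"))
        skills.append(Skill(meta.get("name", skill_file.parent.name),
                            meta.get("description", ""), skill_file))
    return skills


def demo() -> List[Skill]:
    """Index two sample skills, then load only the one judged relevant."""
    samples = [("nextjs", "NextJS 14+ App Router specialist."),
               ("devops", "Deployment and infrastructure.")]  # sample data
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name, desc in samples:
            (root / name).mkdir()
            (root / name / "SKILL.md").write_text(
                f"---\nname: {name}\ndescription: {desc}\n---\n"
                f"# {name} skill\nFull documentation...\n", encoding="utf-8")
        skills = index_skills(root)
        next(s for s in skills if s.name == "nextjs").load_body()
        return skills
```

After `demo()` runs, the `devops` entry still has `body` set to `None`, while `nextjs` holds its full content. That asymmetry is the whole point of the metadata-first approach.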
Example Workflow
User: "Create a Next.js authentication page"
  ↓
Claude reviews skill metadata (35 skills × 75 tokens = 2,625 tokens)
  ↓
Claude determines relevant skills:
  - nextjs (matches "Next.js")
  - frontend (matches "page")
  - (NOT loading: python-backend, devops, hetzner-provisioner, etc.)
  ↓
Claude loads ONLY relevant skills:
  - nextjs: 5,234 tokens
  - frontend: 3,891 tokens
  ↓
Total loaded: 9,125 tokens (vs 175,000+ if loading all skills)
Token reduction: ~95% (about 93% once the 2,625 metadata tokens are counted)
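The arithmetic in this workflow can be checked directly. The sketch below uses only the figures quoted above; the 5,000-token average per skill is the value implied by 175,000 tokens across 35 skills, and the variable names are illustrative.

```python
SKILL_COUNT = 35
METADATA_TOKENS_PER_SKILL = 75   # midpoint of the ~50-100 range
AVG_FULL_SKILL_TOKENS = 5_000    # implied by 175,000 tokens / 35 skills

# Skills actually loaded in the example workflow.
loaded_skills = {"nextjs": 5_234, "frontend": 3_891}

metadata_tokens = SKILL_COUNT * METADATA_TOKENS_PER_SKILL
loaded_tokens = sum(loaded_skills.values())
all_skills_tokens = SKILL_COUNT * AVG_FULL_SKILL_TOKENS

# Reduction counting only the loaded skill bodies.
reduction_bodies = 100 * (1 - loaded_tokens / all_skills_tokens)
# Reduction when the always-loaded metadata is included as well.
reduction_total = 100 * (1 - (metadata_tokens + loaded_tokens) / all_skills_tokens)

print(f"metadata: {metadata_tokens}, loaded: {loaded_tokens}")
print(f"reduction (bodies only): {reduction_bodies:.1f}%")
print(f"reduction (with metadata): {reduction_total:.1f}%")
```

The two percentages differ only in whether the 2,625 metadata tokens are counted, which explains why this document quotes both ~95% and ~93% for the same scenario.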
References
- What are Skills?
- Agent Skills Engineering
"Skills work through progressive disclosure—Claude determines which Skills are relevant and loads the information it needs to complete that task, helping to prevent context window overload."
2. Sub-Agent Parallelization
How It Works
Sub-agents in Claude Code have isolated context windows:
Main conversation (100K tokens used)
  ↓
Launches 3 sub-agents in parallel
  ↓
├─ Sub-agent 1: Fresh context (0K tokens used)
├─ Sub-agent 2: Fresh context (0K tokens used)
└─ Sub-agent 3: Fresh context (0K tokens used)
Benefits:
Context Isolation
- Each sub-agent starts with empty context
- Doesn't inherit main conversation's 100K tokens
- Can load its own relevant skills
Parallelization
- Multiple agents work simultaneously
- Each with own context budget
- Results merged back to main conversation
Token Multiplication
- Main: 200K token limit
- Sub-agent 1: 200K token limit
- Sub-agent 2: 200K token limit
- Effective capacity: 600K+ tokens across parallel work
Example Workflow
User: "Build a full-stack Next.js app with auth, payments, and admin"
  ↓
Main conversation launches 3 sub-agents in parallel:
  ↓
├─ Sub-agent 1 (Frontend)
│   - Loads: nextjs, frontend skills
│   - Context: 12K tokens
│   - Implements: Auth UI, payment forms
│
├─ Sub-agent 2 (Backend)
│   - Loads: nodejs-backend, security skills
│   - Context: 15K tokens
│   - Implements: API routes, auth logic
│
└─ Sub-agent 3 (DevOps)
    - Loads: devops, hetzner-provisioner skills
    - Context: 8K tokens
    - Implements: Deployment configs
  ↓
All 3 work in parallel with isolated contexts
  ↓
Results merged back to main conversation
  ↓
Total effective context: 35K tokens across 3 agents
(vs 175K+ if all skills were loaded in the main conversation)
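The fan-out and merge pattern in this workflow can be modelled with `asyncio`. This is a toy simulation, not how Claude Code launches sub-agents: `run_sub_agent`, `orchestrate`, `merge`, and `SubAgentResult` are hypothetical names, the roles and token counts are copied from the workflow above, and the assumption that only a short summary returns to the main conversation is mine.

```python
import asyncio
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SubAgentResult:
    role: str
    loaded_skills: List[str]
    context_tokens: int
    summary: str


async def run_sub_agent(role: str, skills: List[str],
                        context_tokens: int) -> SubAgentResult:
    # Fresh context: a new local list, nothing copied from the main conversation.
    context: List[str] = [f"skill:{name}" for name in skills]
    await asyncio.sleep(0)  # placeholder for the agent's real work
    summary = f"{role}: done with {len(context)} skills loaded"
    return SubAgentResult(role, skills, context_tokens, summary)


# Roles, skills, and context sizes taken from the example workflow.
PLANS: List[Tuple[str, List[str], int]] = [
    ("Frontend", ["nextjs", "frontend"], 12_000),
    ("Backend", ["nodejs-backend", "security"], 15_000),
    ("DevOps", ["devops", "hetzner-provisioner"], 8_000),
]


async def orchestrate() -> List[SubAgentResult]:
    # gather() runs the coroutines concurrently and keeps input order.
    return list(await asyncio.gather(*(run_sub_agent(*p) for p in PLANS)))


def merge(results: List[SubAgentResult]) -> Tuple[List[str], int]:
    # Assumed: only summaries flow back, not each agent's full context.
    summaries = [r.summary for r in results]
    total_tokens = sum(r.context_tokens for r in results)
    return summaries, total_tokens


if __name__ == "__main__":
    summaries, total = merge(asyncio.run(orchestrate()))
    print(summaries)
    print(f"total across sub-agents: {total} tokens")
```

Each call to `run_sub_agent` builds its own `context` list, which mirrors the isolation property: no agent sees another agent's loaded skills or the main conversation's history.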
References
- Sub-Agents Documentation
Actual Token Savings
Progressive Disclosure Savings
Scenario: User asks about Next.js
Without progressive disclosure:
Load all 35 skills: ~175,000 tokens
Context bloat: Massive
With progressive disclosure:
Metadata (all skills): ~2,625 tokens
Load relevant (2 skills): ~9,000 tokens
Total: ~11,625 tokens
Reduction: ~93%
Sub-Agent Savings
Scenario: Complex multi-domain task
Single agent approach:
Load all relevant skills: ~50,000 tokens
Main conversation history: ~80,000 tokens
Total context used: ~130,000 tokens
Risk: Approaching context limit
Sub-agent approach:
Main conversation: ~5,000 tokens (coordination only)
Sub-agent 1: ~15,000 tokens (isolated)
Sub-agent 2: ~18,000 tokens (isolated)
Sub-agent 3: ~12,000 tokens (isolated)
Total: ~50,000 tokens across 4 contexts
Reduction: ~62% (130K → 50K)
Note: Exact percentages vary by task complexity. These figures are approximations based on typical usage patterns.
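The sub-agent comparison can be reproduced from the numbers above. The sketch below is plain arithmetic on the quoted estimates; the variable names are illustrative. It also reports the largest single context window, since that, rather than the sum, is what is measured against a per-context limit.

```python
# Single-agent approach: everything lands in one context window.
single_agent = {"relevant skills": 50_000, "conversation history": 80_000}
single_total = sum(single_agent.values())

# Sub-agent approach: four separate context windows.
contexts = {
    "main (coordination only)": 5_000,
    "sub-agent 1": 15_000,
    "sub-agent 2": 18_000,
    "sub-agent 3": 12_000,
}
sub_total = sum(contexts.values())
largest_window = max(contexts.values())

reduction = 100 * (1 - sub_total / single_total)

print(f"single agent: {single_total} tokens in one window")
print(f"sub-agents:   {sub_total} tokens across {len(contexts)} windows")
print(f"largest single window: {largest_window} tokens")
print(f"reduction: {reduction:.1f}%")
```

With these estimates the largest window holds 18,000 tokens, well below the 130,000 tokens the single-agent approach places in one context.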
How SpecWeave Leverages These Mechanisms
1. Skill Organization (Progressive Disclosure)
SpecWeave organizes 35+ skills with clear, focused descriptions:
Good: Focused description
name: nextjs
description: NextJS 14+ App Router specialist. Server Components, SSR, routing.
Bad: Vague description
name: frontend
description: Does frontend stuff
Why this matters:
- Clear descriptions help Claude identify relevance quickly
- Prevents loading irrelevant skills
- Maximizes progressive disclosure benefits
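A toy model makes the difference between the two descriptions concrete. Claude's relevance detection is model-based, not keyword matching, so the sketch below is only an analogy: it counts shared words between a request and a description. The function names and the sample request are hypothetical.

```python
import re
from typing import Set


def words(text: str) -> Set[str]:
    """Lowercase alphanumeric words in text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(request: str, description: str) -> int:
    """Number of distinct words the request and description share."""
    return len(words(request) & words(description))


request = "Add routing with Server Components to my NextJS app"
focused = "NextJS 14+ App Router specialist. Server Components, SSR, routing."
vague = "Does frontend stuff"

print(overlap_score(request, focused))
print(overlap_score(request, vague))
```

The focused description shares several terms with the request (`nextjs`, `app`, `server`, `components`, `routing`), while the vague one shares none. A description that names its technologies gives any matcher, keyword-based or not, more to work with.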
2. Agent Coordination (Sub-Agent Parallelization)
SpecWeave's role-orchestrator skill automatically:
- Detects multi-domain tasks
- Launches specialized sub-agents (PM, Architect, DevOps, etc.)
- Has each sub-agent load only its relevant skills
- Merges results back into the main conversation
Example:
User: "/sw:inc 'Full-stack SaaS with Stripe payments'"
  ↓
role-orchestrator activates
  ↓
Launches sub-agents in parallel:
├─ PM agent (requirements)
├─ Architect agent (system design)
├─ Security agent (threat model)
└─ DevOps agent (deployment)
  ↓
Each loads only relevant skills in isolated context
  ↓
Results merged into increment spec
Common Misconceptions
❌ Myth 1: "SpecWeave has custom context manifests"
Reality: No. SpecWeave uses Claude's native progressive disclosure. Skills load based on Claude's relevance detection, not custom YAML manifests.
❌ Myth 2: "SpecWeave caches loaded context"
Reality: No custom caching. Claude Code handles caching internally (if applicable). SpecWeave doesn't implement additional caching layers.
❌ Myth 3: "70-90% token reduction"
Reality: Token savings vary by task:
- Simple tasks: 90%+ (load 1-2 skills vs all 35)
- Complex tasks: 50-70% (load 5-10 skills + use sub-agents)
- Exact percentages depend on task complexity
✅ Truth: "It just works"
Reality: Progressive disclosure and sub-agents are automatic; you don't configure them. Claude handles skill loading, and sub-agent context isolation happens automatically when agents are launched.
Best Practices
For Skill Descriptions
Do:
- Be specific about what the skill does
- Include trigger keywords users might say
- List technologies/frameworks explicitly
Don't:
- Write vague descriptions ("helps with coding")
- Omit key activation triggers
- Mix multiple unrelated domains in one skill
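Some of these guidelines can be checked mechanically before a skill is committed. The linter below is a hypothetical helper, not part of SpecWeave or Claude Code: the six-word minimum and the vague-phrase list (taken from the examples in this document) are arbitrary assumptions meant to be tuned.

```python
from typing import List

# Phrases this document uses as examples of vague wording.
VAGUE_PHRASES = ("stuff", "helps with coding")
MIN_WORDS = 6  # assumed threshold, adjust to taste


def lint_description(description: str) -> List[str]:
    """Return a list of problems found in a skill description."""
    problems: List[str] = []
    if len(description.split()) < MIN_WORDS:
        problems.append("too short to carry trigger keywords")
    lowered = description.lower()
    for phrase in VAGUE_PHRASES:
        if phrase in lowered:
            problems.append(f"vague phrase: {phrase!r}")
    return problems


good = "NextJS 14+ App Router specialist. Server Components, SSR, routing."
bad = "Does frontend stuff"

print(lint_description(good))
print(lint_description(bad))
```

A check like this catches only the crudest problems. Whether a description mixes unrelated domains or omits an important trigger still needs human review.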
For Sub-Agent Usage
When to use sub-agents:
- Multi-domain tasks (frontend + backend + devops)
- Parallel work (multiple features simultaneously)
- Large codebase exploration (different modules)
When NOT to use sub-agents:
- Simple single-domain tasks
- Sequential work requiring shared context
- When the main conversation has used little of its context, so isolation saves little
Debugging Context Usage
Check Active Skills
When Claude mentions using a skill:
User: "Create a Next.js page"
Claude: "🎨 Using nextjs skill..."
This means:
- Progressive disclosure worked
- Only nextjs skill loaded (not all 35)
- Context efficient
Check Sub-Agent Usage
When Claude mentions launching agents:
Claude: "🤖 Launching 3 specialized agents in parallel..."
This means:
- Sub-agent parallelization active
- Each agent has isolated context
- Efficient multi-domain processing
Summary
SpecWeave achieves context efficiency through:
Progressive Disclosure (Native Claude)
- Skills load only when relevant
- Metadata-first approach
- 90%+ savings on simple tasks
Sub-Agent Parallelization (Native Claude Code)
- Isolated context windows
- Parallel processing
- 50-70% savings on complex tasks
No custom manifests. No custom caching. Just smart use of Claude's native capabilities.
References
- Claude Skills Documentation
- Agent Skills Engineering Blog
- Sub-Agents Documentation