
Context Optimization Techniques

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install this skill by copying the following command and sending it to your AI assistant:

npx skills add eyadsibai/ltk/eyadsibai-ltk-context-optimization


Extend effective context capacity through compression, masking, caching, and partitioning. Applied well, these techniques can double or triple effective context capacity without a larger model.

Optimization Strategies

| Strategy | Token Reduction | Use Case |
|---|---|---|
| Compaction | 50-70% | Message history dominates |
| Observation Masking | 60-80% | Tool outputs dominate |
| KV-Cache Optimization | 70%+ cache hits | Stable workloads |
| Context Partitioning | Variable | Complex multi-task |

Compaction

Summarize context when approaching limits:

```python
if context_tokens / context_limit > 0.8:
    context = compact_context(context)
```

Priority for compression:

  • Tool outputs → replace with summaries

  • Old turns → summarize early conversation

  • Retrieved docs → summarize if recent versions exist

  • Never compress system prompt

Summary generation by type:

  • Tool outputs: Preserve findings, metrics, conclusions

  • Conversational: Preserve decisions, commitments, context shifts

  • Documents: Preserve key facts, remove supporting evidence
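The compaction pass above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the message format, `estimate_tokens`, and `summarize` are assumptions, with `summarize` standing in for an LLM summarization call.

```python
def estimate_tokens(messages):
    """Rough token estimate: one token per whitespace-delimited word."""
    return sum(len(m["content"].split()) for m in messages)

def summarize(text, max_words=20):
    """Placeholder summarizer: keep the first few words. In practice this
    would be an LLM call that preserves findings, decisions, and metrics."""
    words = text.split()
    if len(words) <= max_words:
        return text
    return " ".join(words[:max_words]) + " ..."

def compact_context(messages, limit, threshold=0.8):
    """Apply the priority order above: tool outputs first, then old turns.
    The system prompt is never compressed; recent turns are kept intact."""
    if estimate_tokens(messages) / limit <= threshold:
        return messages
    compacted = []
    for i, m in enumerate(messages):
        if m["role"] == "system":
            compacted.append(m)  # never compress the system prompt
        elif m["role"] == "tool":
            compacted.append({**m, "content": summarize(m["content"])})
        elif i < len(messages) - 2:
            compacted.append({**m, "content": summarize(m["content"])})
        else:
            compacted.append(m)  # keep the most recent turns intact
    return compacted
```

A real implementation would also record what was summarized so the full content can be re-fetched if later needed.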

Observation Masking

Tool outputs can account for 80%+ of context tokens. Replace verbose outputs with references:

```python
if len(observation) > max_length:
    ref_id = store_observation(observation)
    return f"[Obs:{ref_id} elided. Key: {extract_key(observation)}]"
```

Masking rules:

  • Never mask: Current task critical, most recent turn, active reasoning

  • Consider: 3+ turns old, key points extractable, purpose served

  • Always mask: Repeated outputs, boilerplate, already summarized
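A minimal sketch of the store behind those rules, with `extract_key` as a stand-in for whatever key-point extraction the agent uses; all names here are illustrative.

```python
_observation_store = {}

def store_observation(observation):
    """Store the full output and return a reference id."""
    ref_id = f"obs-{len(_observation_store)}"
    _observation_store[ref_id] = observation
    return ref_id

def extract_key(observation, max_words=10):
    """Placeholder: take the first few words as the key point. In practice
    this would extract findings, metrics, or conclusions."""
    return " ".join(observation.split()[:max_words])

def mask_observation(observation, max_length=200):
    """Replace a verbose output with a reference plus its key point;
    short outputs pass through unchanged."""
    if len(observation) <= max_length:
        return observation
    ref_id = store_observation(observation)
    return f"[Obs:{ref_id} elided. Key: {extract_key(observation)}]"

def resolve_observation(ref_id):
    """Retrieve the full output when a later step actually needs it."""
    return _observation_store[ref_id]
```

Keeping the store addressable is what makes masking safe: the full output is elided from context, not lost.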

KV-Cache Optimization

Cache Key/Value tensors for requests with identical prefixes:

```python
# Cache-friendly ordering: stable content first
context = [
    system_prompt,     # cacheable
    tool_definitions,  # cacheable
    reused_templates,  # reusable
    unique_content,    # unique per request
]
```

Design for cache stability:

  • Avoid dynamic content (timestamps)

  • Use consistent formatting

  • Keep structure stable across sessions
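Those rules can be illustrated with a small prompt builder; function and variable names are assumptions, not part of any provider API.

```python
def build_prompt(system_prompt: str, tool_definitions: str, user_message: str) -> str:
    # The stable prefix must be byte-identical across requests for the
    # provider's prefix/KV cache to hit: no timestamps, no request IDs,
    # consistent whitespace and ordering.
    stable_prefix = f"{system_prompt}\n\n{tool_definitions}"
    return f"{stable_prefix}\n\n{user_message}"

# Two requests sharing the stable prefix; only the tail differs, so the
# cached KV tensors for the prefix can be reused on the second request.
p1 = build_prompt("You are a helpful agent.", "tools: search, read", "question A")
p2 = build_prompt("You are a helpful agent.", "tools: search, read", "question B")
```

Injecting a timestamp anywhere in the prefix would make every request a cache miss, which is why dynamic content belongs at the end.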

Context Partitioning

Split work across sub-agents with isolated contexts:

```python
# Each sub-agent has a clean, focused context
results = await gather(
    research_agent.search("topic A"),
    research_agent.search("topic B"),
    research_agent.search("topic C"),
)
```

```python
# Coordinator synthesizes without carrying full context
synthesized = await coordinator.synthesize(results)
```
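The fan-out/fan-in pattern above can be sketched as a runnable example with asyncio; `research` and `synthesize` are hypothetical stand-ins for sub-agent and coordinator calls.

```python
import asyncio

async def research(topic: str) -> str:
    # Each call runs with its own small, focused context.
    return f"findings about {topic}"

async def synthesize(results: list[str]) -> str:
    # The coordinator sees only the sub-agents' results,
    # never their full working contexts.
    return " | ".join(results)

async def main() -> str:
    results = await asyncio.gather(
        research("topic A"),
        research("topic B"),
        research("topic C"),
    )
    return await synthesize(results)

summary = asyncio.run(main())
```

The key property is that each sub-agent's intermediate tokens (tool calls, retrieved documents, reasoning) are discarded at its boundary; only the distilled result crosses into the coordinator's context.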

Budget Management

```python
context_budget = {
    "system_prompt": 2000,
    "tool_definitions": 3000,
    "retrieved_docs": 10000,
    "message_history": 15000,
    "reserved_buffer": 2000,
}
```

Monitor utilization and trigger optimization at 70-80% of the total budget.
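A sketch of utilization monitoring against an allocation like the one above; `usage` holds actual token counts per component, and the function names are illustrative.

```python
context_budget = {
    "system_prompt": 2000,
    "tool_definitions": 3000,
    "retrieved_docs": 10000,
    "message_history": 15000,
    "reserved_buffer": 2000,
}

def utilization(usage: dict, budget: dict) -> float:
    """Fraction of the total budget currently consumed."""
    return sum(usage.values()) / sum(budget.values())

def optimization_action(util: float) -> str:
    """Map utilization to an action, using the 70%/80% thresholds above."""
    if util > 0.8:
        return "compact"
    if util > 0.7:
        return "monitor"
    return "ok"
```

Checking this on every turn keeps compaction proactive rather than a scramble at the hard limit.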

When to Optimize

| Signal | Action |
|---|---|
| Utilization >70% | Start monitoring |
| Utilization >80% | Apply compaction |
| Quality degradation | Investigate cause |
| Tool outputs dominate | Observation masking |
| Docs dominate | Summarization/partitioning |

Performance Targets

  • Compaction: 50-70% reduction, <5% quality loss

  • Masking: 60-80% reduction in masked observations

  • Cache: 70%+ hit rate for stable workloads

Best Practices

  • Measure before optimizing

  • Apply compaction before masking

  • Design for cache stability

  • Partition before context becomes problematic

  • Monitor effectiveness over time

  • Balance token savings vs quality

  • Test at production scale

  • Implement graceful degradation

