context-engine

Context Engine - AI Agent Context Management

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "context-engine" with this command: npx skills add borghei/claude-skills/borghei-claude-skills-context-engine

Tier: POWERFUL

Category: Engineering

Tags: context management, AI agents, memory systems, RAG, token optimization, knowledge graphs

Overview

Context Engine provides production-grade patterns for managing what AI agents know, remember, and retrieve. It covers the full lifecycle: ingestion of project knowledge, optimal packing of context windows, persistent memory across sessions, and retrieval-augmented generation for large codebases. The difference between a useful agent and a hallucinating one is context management.

Core Capabilities

  1. Context Window Architecture

Every AI agent operates within a finite context window. Mismanaging it is a leading cause of degraded agent performance.

Token Budget Allocation Framework

| Segment | Budget % | Purpose | Priority |
| --- | --- | --- | --- |
| System Instructions | 5-10% | Agent identity, rules, constraints | Fixed (always loaded) |
| Task Context | 20-30% | Current task description, requirements | High (per-request) |
| Relevant Code | 25-40% | Source files, dependencies, types | Dynamic (retrieved) |
| Conversation History | 10-20% | Prior turns, decisions made | Sliding window |
| Tool Results | 5-15% | Command output, search results | Ephemeral |
| Reserved Buffer | 5-10% | Output generation headroom | Protected |
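
As a rough illustration of how the table could be turned into concrete numbers, the sketch below splits a window into per-segment token budgets. The function name `allocate_budget` and the specific shares are assumptions: each share is simply one value picked from inside the corresponding range so that the total is 100%.

```python
# Illustrative shares, each picked from inside the ranges in the table
# above so that they sum to 1.0. They are assumptions, not recommendations.
BUDGET_SHARES = {
    "system_instructions": 0.08,
    "task_context": 0.25,
    "relevant_code": 0.32,
    "conversation_history": 0.15,
    "tool_results": 0.10,
    "reserved_buffer": 0.10,
}


def allocate_budget(window_tokens: int,
                    shares: dict[str, float] = BUDGET_SHARES) -> dict[str, int]:
    """Split a context window into whole-token budgets per segment."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1.0")
    budgets = {name: int(window_tokens * share) for name, share in shares.items()}
    # Rounding down can leave a few tokens unassigned; give them to the buffer.
    budgets["reserved_buffer"] += window_tokens - sum(budgets.values())
    return budgets


budgets = allocate_budget(200_000)
```

Handing rounding leftovers to the reserved buffer is one possible choice; it keeps the protected segment from ever shrinking below its nominal share.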

Context Packing Strategies

Greedy Relevance Packing

  1. Score all candidate context by relevance to current task
  2. Sort by score descending
  3. Pack until budget exhausted
  4. Always reserve output buffer
  • Pros: Simple, fast, works well for focused tasks

  • Cons: Misses cross-cutting context, no diversity
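
The four steps above can be sketched as follows. `Candidate` and `greedy_pack` are hypothetical names, and the relevance scores are assumed to come from some upstream scorer that is not shown. This variant skips an item that does not fit and keeps trying smaller ones, which is one common reading of "pack until budget exhausted".

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    tokens: int
    score: float  # relevance to the current task, higher is better


def greedy_pack(candidates: list[Candidate], window_tokens: int,
                output_buffer: int) -> list[Candidate]:
    """Pack the highest-scoring candidates into the window minus the buffer."""
    budget = window_tokens - output_buffer  # step 4: reserve the output buffer
    packed, used = [], 0
    # Steps 1-2: candidates arrive scored; sort by score descending.
    for cand in sorted(candidates, key=lambda c: c.score, reverse=True):
        # Step 3: take the item only if it still fits.
        if used + cand.tokens <= budget:
            packed.append(cand)
            used += cand.tokens
    return packed
```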

Tiered Loading

Tier 0 (always loaded): System prompt, project rules, active file

Tier 1 (task-specific): Related files, type definitions, tests

Tier 2 (on-demand): Documentation, examples, history

Tier 3 (retrieved): Search results, RAG chunks

  • Pros: Predictable, debuggable, respects fixed costs

  • Cons: Requires upfront tier classification
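
A minimal sketch of tiered loading, assuming each tier is a list of hypothetical `(name, token_count)` pairs that were classified ahead of time. Tier 0 is treated as a fixed cost, and loading stops at the first item that does not fit so that a lower tier never jumps ahead of a higher one.

```python
def load_tiers(tiers: dict[int, list[tuple[str, int]]], budget: int) -> list[str]:
    """Load Tier 0 unconditionally, then higher-numbered tiers while budget lasts."""
    fixed_cost = sum(tokens for _, tokens in tiers.get(0, []))
    if fixed_cost > budget:
        raise ValueError("Tier 0 alone exceeds the budget")
    loaded = [name for name, _ in tiers.get(0, [])]
    used = fixed_cost
    for tier in sorted(t for t in tiers if t > 0):
        for name, tokens in tiers[tier]:
            if used + tokens > budget:
                return loaded  # stop here: keeps the load order predictable
            loaded.append(name)
            used += tokens
    return loaded
```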

Adaptive Compression

  1. Load full context for first pass
  2. Identify low-signal sections (boilerplate, repetitive code)
  3. Summarize or truncate low-signal sections
  4. Re-pack with compressed context
  5. Preserve high-signal sections verbatim
  • Pros: Maximizes information density

  • Cons: Risk of losing important details in compression
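
The compression step might look like the sketch below. It assumes every section already carries a signal score between 0 and 1 from some earlier analysis pass (not shown), and it uses truncation as a stand-in for summarization. `compress_sections`, `threshold`, and `keep_lines` are hypothetical names.

```python
def compress_sections(sections: list[tuple[str, float]],
                      threshold: float = 0.5,
                      keep_lines: int = 2) -> list[str]:
    """Keep high-signal sections verbatim; truncate low-signal ones."""
    result = []
    for text, signal in sections:
        lines = text.splitlines()
        if signal >= threshold or len(lines) <= keep_lines:
            result.append(text)  # high signal (or already short): keep as-is
        else:
            omitted = len(lines) - keep_lines
            marker = f"... [{omitted} low-signal lines omitted]"
            result.append("\n".join(lines[:keep_lines] + [marker]))
    return result
```

Leaving an explicit marker in place of the dropped lines lets the agent notice that detail was removed and re-retrieve it if needed.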

  2. Memory Architecture

Three-Layer Memory Model

┌─────────────────────────────────────────────────┐
│ Layer 1: Working Memory (Context Window)        │
│   Scope: Current conversation/task              │
│   Lifetime: Single session                      │
│   Storage: In-context tokens                    │
│   Update: Every turn                            │
├─────────────────────────────────────────────────┤
│ Layer 2: Session Memory (Persistent Store)      │
│   Scope: Project-level learnings                │
│   Lifetime: Across sessions                     │
│   Storage: MEMORY.md, .claude/rules/, CLAUDE.md │
│   Update: End of session or on discovery        │
├─────────────────────────────────────────────────┤
│ Layer 3: Knowledge Base (Indexed Corpus)        │
│   Scope: Full codebase + documentation          │
│   Lifetime: Persistent, versioned               │
│   Storage: Vector store, graph DB, file index   │
│   Update: On commit / scheduled reindex         │
└─────────────────────────────────────────────────┘

Memory Promotion Protocol

Knowledge flows upward through layers based on recurrence and value:

| Signal | Action | Example |
| --- | --- | --- |
| Pattern seen 1x | Working memory only | "This file uses tabs" |
| Pattern seen 2-3x | Candidate for session memory | "Project uses pnpm everywhere" |
| Pattern confirmed across sessions | Promote to CLAUDE.md/rules | "Always use pnpm, never npm" |
| Pattern is domain knowledge | Add to knowledge base | "Auth flow uses JWT + refresh tokens" |
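
The promotion rules in the table can be expressed as a small decision function. This is a sketch under assumptions: the function name, the returned labels, and the reading of "confirmed across sessions" as "seen in at least two sessions" are all hypothetical.

```python
def promotion_target(times_seen: int, sessions_seen: int,
                     is_domain_knowledge: bool = False) -> str:
    """Decide where an observed pattern should live."""
    if is_domain_knowledge:
        return "knowledge_base"
    if sessions_seen >= 2:  # assumption: two sessions count as "confirmed"
        return "claude_md_or_rules"
    if times_seen >= 2:
        return "session_memory_candidate"
    return "working_memory"
```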

Staleness Detection

Context has a shelf life. Stale context causes hallucinations.

Freshness Score = f(last_verified, change_frequency, confidence)

Fresh (< 7 days, file unchanged): Use directly

Aging (7-30 days, file changed): Re-verify before using

Stale (> 30 days): Flag, re-retrieve, or discard

Unknown (never verified): Treat as low-confidence
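
A minimal sketch of the classification, assuming freshness is judged from the last verification time and a flag saying whether the file changed since then. The categories leave some cases open (for example, verified two days ago but the file has changed); this sketch treats anything that is not clearly fresh or stale as aging, which is an assumption. `classify_freshness` is a hypothetical name.

```python
from datetime import datetime, timedelta
from typing import Optional


def classify_freshness(last_verified: Optional[datetime],
                       file_changed: bool,
                       now: datetime) -> str:
    """Map a context item to fresh / aging / stale / unknown."""
    if last_verified is None:
        return "unknown"  # never verified: treat as low-confidence
    age_days = (now - last_verified).days
    if age_days > 30:
        return "stale"  # flag, re-retrieve, or discard
    if age_days < 7 and not file_changed:
        return "fresh"  # use directly
    return "aging"  # re-verify before using


now = datetime(2024, 6, 1)
status = classify_freshness(now - timedelta(days=3), file_changed=False, now=now)
```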

  3. Retrieval Strategies for Code

File-Level Retrieval

Best for: navigating to the right file when the agent knows what it needs.

Query: "authentication middleware"

Strategy:

  1. Filename pattern match: auth, middleware
  2. Import graph: files that import auth modules
  3. Symbol search: exported functions matching auth*
  4. Content search: files containing auth-related patterns
  5. Rank by: recency of edit + import centrality + name match
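
The ranking in step 5 could be sketched as a weighted sum of the three signals. The `FileInfo` fields, the weights, and the recency formula are illustrative assumptions; a real system would tune them against its own codebase.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    path: str
    days_since_edit: int
    imported_by: int  # how many files import this one (import centrality)


def rank_files(files: list[FileInfo], query_terms: list[str],
               weights: tuple[float, float, float] = (0.3, 0.3, 0.4)) -> list[FileInfo]:
    """Rank files by recency of edit + import centrality + name match."""
    max_imports = max((f.imported_by for f in files), default=0) or 1
    w_recency, w_centrality, w_name = weights

    def score(f: FileInfo) -> float:
        recency = 1.0 / (1 + f.days_since_edit)
        centrality = f.imported_by / max_imports
        hits = sum(term in f.path.lower() for term in query_terms)
        name_match = hits / max(len(query_terms), 1)
        return w_recency * recency + w_centrality * centrality + w_name * name_match

    return sorted(files, key=score, reverse=True)
```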

Chunk-Level Retrieval (RAG for Code)

Best for: finding specific implementations within large files.

Chunking Strategy for Source Code:

  • Chunk by function/class boundaries (never mid-function)

  • Include the function signature + docstring + body as one chunk

  • Attach metadata: file path, language, exports, imports

  • Overlap: include 2 lines above/below for context

  • Max chunk size: 200 lines (larger functions get sub-chunked by logical block)
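
For Python sources, function and class boundaries can be found with the standard-library `ast` module. The sketch below covers only top-level definitions and the metadata fields shown; the 2-line overlap and the 200-line sub-chunking from the list above are left out, and `chunk_python_source` is a hypothetical name. Note that `lineno` points at the `def` or `class` line, so decorators are not included in the chunk.

```python
import ast


def chunk_python_source(source: str, path: str) -> list[dict]:
    """Return one chunk per top-level function or class, never split mid-body."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            start, end = node.lineno, node.end_lineno  # 1-based, inclusive
            chunks.append({
                "path": path,
                "language": "python",
                "name": node.name,
                "start_line": start,
                "end_line": end,
                # Signature, docstring, and body stay together in one chunk.
                "text": "\n".join(lines[start - 1:end]),
            })
    return chunks
```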

Embedding Considerations:

  • Code-specific embeddings (CodeBERT, StarCoder embeddings) tend to outperform general text embeddings on code retrieval tasks, by a quoted 15-30%

  • Hybrid search (keyword + semantic) outperforms either alone

  • Index function signatures separately for fast symbol lookup

Dependency-Aware Retrieval

When retrieving a function, also retrieve:

  • Its type definitions (interfaces, types it uses)

  • Its direct dependencies (imported functions it calls)

  • Its tests (to understand expected behavior)

  • Its callers (to understand usage context)

This "context neighborhood" approach prevents the agent from seeing a function in isolation.
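
A minimal sketch of assembling that neighborhood, assuming the call graph, type usage, and test coverage are already available as plain dictionaries. The function name and all three input structures are hypothetical.

```python
def context_neighborhood(fn: str,
                         calls: dict[str, set[str]],
                         types_used: dict[str, set[str]],
                         tests: dict[str, set[str]]) -> dict[str, list[str]]:
    """Collect the types, dependencies, tests, and callers around one function."""
    callers = sorted(caller for caller, callees in calls.items() if fn in callees)
    return {
        "types": sorted(types_used.get(fn, ())),
        "dependencies": sorted(calls.get(fn, ())),  # functions that fn calls
        "tests": sorted(tests.get(fn, ())),
        "callers": callers,  # functions that call fn
    }
```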

  4. Knowledge Graph Construction

Codebase Graph Schema

Nodes:

  • File (path, language, size, last_modified)
  • Function (name, signature, docstring, complexity)
  • Class (name, methods, properties, inheritance)
  • Module (name, exports, dependencies)
  • Test (name, covers, assertions)
  • Config (type, values, affects)

Edges:

  • IMPORTS (File → File)
  • CALLS (Function → Function)
  • IMPLEMENTS (Class → Interface)
  • TESTS (Test → Function)
  • CONFIGURES (Config → Module)
  • DEPENDS_ON (Module → Module)

Graph Queries for Context

| Agent Question | Graph Query | Context Retrieved |
| --- | --- | --- |
| "How does auth work?" | Subgraph around auth module, 2 hops | Auth files + dependencies + tests |
| "What breaks if I change X?" | Reverse dependency traversal from X | All callers + their tests |
| "What's the API surface?" | All exported functions from API modules | Route handlers + types + middleware |
| "How is this tested?" | TESTS edges from target function | Test files + fixtures + mocks |

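The "What breaks if I change X?" query can be sketched as a breadth-first walk over reversed CALLS edges. The graph here is a plain dictionary from caller to callees, which is an assumed representation rather than a particular graph database API, and `impacted_by` is a hypothetical name.

```python
from collections import deque


def impacted_by(target: str, calls: dict[str, set[str]]) -> set[str]:
    """Return every function that directly or transitively calls `target`."""
    # Invert caller -> callees into callee -> callers.
    reverse: dict[str, set[str]] = {}
    for caller, callees in calls.items():
        for callee in callees:
            reverse.setdefault(callee, set()).add(caller)

    seen: set[str] = set()
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for caller in reverse.get(node, ()):
            if caller != target and caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen
```

The `seen` set also guards against cycles in the call graph, so mutually recursive functions do not cause an endless loop.
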
  5. Context Window Optimization Patterns

Pattern: Sliding Window with Anchors

For long conversations, maintain fixed "anchor" messages while sliding recent history.

[System Prompt]      ← Fixed anchor (never evicted)
[Task Definition]    ← Fixed anchor
[Key Decision #1]    ← Pinned (user marked as important)
[Key Decision #2]    ← Pinned
...
[Turn N-4]           ← Sliding window starts here
[Turn N-3]
[Turn N-2]
[Turn N-1]
[Current Turn]
[Output Buffer]      ← Reserved
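
One way to sketch this layout: messages flagged as anchors are always kept, and only the most recent unflagged turns survive. The `anchor` key on each message and the name `build_window` are assumptions made for illustration; the default of five recent turns mirrors the diagram (four prior turns plus the current one).

```python
def build_window(messages: list[dict], recent_turns: int = 5) -> list[dict]:
    """Keep every anchored message plus the last `recent_turns` other messages."""
    anchors = [m for m in messages if m.get("anchor")]
    sliding = [m for m in messages if not m.get("anchor")]
    # Guard the zero case: sliding[-0:] would return the whole list.
    recent = sliding[-recent_turns:] if recent_turns > 0 else []
    return anchors + recent
```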

Pattern: Progressive Summarization

When conversation exceeds budget:

  • Summarize oldest turns into a "conversation summary" block

  • Keep the summary as a single anchor message

  • Update summary every N turns

  • Always keep: first system message, task definition, last 5 turns
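
A minimal sketch of the pattern. It assumes the system message and task definition are held separately as anchors, so only ordinary turns are passed in. `naive_summarize` is a placeholder that truncates each turn; a real system would presumably call a model there. Both function names are hypothetical.

```python
def naive_summarize(previous_summary: str, turns: list[str]) -> str:
    """Placeholder summarizer: keeps the first 40 characters of each turn."""
    parts = ([previous_summary] if previous_summary else []) + [t[:40] for t in turns]
    return " | ".join(parts)


def compact(turns: list[str], summary: str = "", keep_last: int = 5,
            summarize=naive_summarize) -> tuple[str, list[str]]:
    """Fold the oldest turns into the summary and keep the last `keep_last` verbatim."""
    if len(turns) <= keep_last:
        return summary, turns
    old, recent = turns[:-keep_last], turns[-keep_last:]
    return summarize(summary, old), recent
```

Calling `compact` again later with the returned summary extends the same summary block instead of creating a second one.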

Pattern: Selective Tool Result Caching

Tool outputs (file reads, search results, command output) often consume the most tokens.

Strategy:

  • Cache tool results keyed by (tool, args, file_hash)
  • On re-request: serve from cache (0 new tokens)
  • On file change: invalidate cache for that file
  • Always truncate: command output > 200 lines → first 50 + last 50
  • Never cache: error output (always show in full)
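
The strategy above might be sketched as an in-memory cache keyed by `(tool, args, file_hash)`. Because the file hash is part of the key, a changed file simply misses the cache, which acts as implicit invalidation. `ToolResultCache`, `file_hash`, and `truncate_output` are hypothetical names, and `args` is assumed to be a hashable tuple.

```python
import hashlib
from typing import Optional


def file_hash(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()


def truncate_output(output: str, max_lines: int = 200,
                    head: int = 50, tail: int = 50) -> str:
    """Keep the first and last lines of long output, with a marker in between."""
    lines = output.splitlines()
    if len(lines) <= max_lines:
        return output
    omitted = len(lines) - head - tail
    return "\n".join(lines[:head] + [f"... [{omitted} lines omitted] ..."] + lines[-tail:])


class ToolResultCache:
    def __init__(self) -> None:
        self._store: dict[tuple, str] = {}

    def get(self, tool: str, args: tuple, fhash: str) -> Optional[str]:
        return self._store.get((tool, args, fhash))

    def put(self, tool: str, args: tuple, fhash: str,
            output: str, is_error: bool = False) -> None:
        if is_error:
            return  # never cache error output
        self._store[(tool, args, fhash)] = truncate_output(output)
```
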

  6. Multi-Agent Context Sharing

When multiple agents collaborate, context synchronization becomes critical.

Shared Context Bus

┌────────────┐     ┌────────────────────┐     ┌────────────┐
│ Agent A    │────▶│ Shared Context     │◀────│ Agent B    │
│ (Planner)  │     │ - Task state       │     │ (Coder)    │
└────────────┘     │ - Decisions log    │     └────────────┘
                   │ - File changes     │
┌────────────┐     │ - Constraints      │     ┌────────────┐
│ Agent C    │────▶│ - Artifacts        │◀────│ Agent D    │
│ (Reviewer) │     └────────────────────┘     │ (Tester)   │
└────────────┘                                └────────────┘

Context Handoff Protocol

When Agent A passes work to Agent B:

  • State Summary: What was done, decisions made, current state

  • Relevant Artifacts: Files created/modified, with paths

  • Constraints: What must not be changed, invariants

  • Open Questions: Unresolved decisions that need Agent B's input

  • Next Steps: Explicit instructions for what Agent B should do

Anti-pattern: Passing the entire conversation history. Always summarize.
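
The five handoff sections map naturally onto a small record type. In this sketch the `Handoff` class and its `missing_sections` check are hypothetical, and treating the state summary and next steps as the mandatory parts is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Handoff:
    state_summary: str
    artifacts: list[str] = field(default_factory=list)       # file paths
    constraints: list[str] = field(default_factory=list)     # invariants
    open_questions: list[str] = field(default_factory=list)
    next_steps: list[str] = field(default_factory=list)

    def missing_sections(self) -> list[str]:
        """Name the mandatory sections that were left empty."""
        required = {"state_summary": self.state_summary, "next_steps": self.next_steps}
        return [name for name, value in required.items() if not value]


handoff = Handoff(
    state_summary="Planned the auth refactor; token storage decision is final.",
    artifacts=["docs/auth-plan.md"],
    constraints=["Do not change the public login() signature"],
    next_steps=["Implement refresh-token rotation"],
)
```

A check like `missing_sections` gives the sending agent a cheap gate before the handoff is published to the shared context.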

Workflows

Workflow 1: Bootstrap Agent Context for a New Codebase

Step 1: Index the codebase

  • Build file tree with metadata (language, size, last modified)
  • Extract all exports, imports, and dependency edges
  • Identify entry points (main files, route handlers, CLI commands)

Step 2: Construct initial knowledge graph

  • Map module dependencies
  • Identify architectural layers (API, service, data, config)
  • Detect frameworks and conventions (naming, structure, patterns)

Step 3: Generate project summary

  • One paragraph: what this project does
  • Architecture diagram (text-based)
  • Key directories and their roles
  • Critical files (config, entry points, shared types)

Step 4: Configure context tiers

  • Tier 0: Project summary, CLAUDE.md, active file
  • Tier 1: Related files within same module
  • Tier 2: Cross-module dependencies
  • Tier 3: Documentation and examples

Workflow 2: Optimize Context for a Specific Task

Step 1: Parse task requirements

  • Extract entities (files, functions, features mentioned)
  • Identify task type (bug fix, feature, refactor, review)

Step 2: Retrieve relevant context

  • File-level: files matching entities
  • Dependency-level: imports/exports of matched files
  • Test-level: tests covering matched code
  • History-level: recent changes to matched files

Step 3: Budget allocation

  • Calculate total tokens available
  • Allocate per segment (see Token Budget Allocation Framework)
  • Pack context with greedy relevance

Step 4: Verify coverage

  • Check: all mentioned files included?
  • Check: type definitions for used types included?
  • Check: test examples for expected behavior included?
  • If gaps: retrieve missing context from lower tiers
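
The coverage checks in Step 4 reduce to set differences between what the task needs and what was loaded. In this sketch the category labels and the name `coverage_gaps` are hypothetical, and working out which items are required is assumed to happen earlier, in Steps 1 and 2.

```python
def coverage_gaps(required: dict[str, set[str]], loaded: set[str]) -> dict[str, set[str]]:
    """Return, per category, the required items missing from the loaded context."""
    gaps = {}
    for kind, items in required.items():
        missing = items - loaded
        if missing:
            gaps[kind] = missing
    return gaps
```

An empty result means every check passed; otherwise the returned items are the ones to retrieve from lower tiers.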

Workflow 3: Session Memory Management

Step 1: During session - capture learnings

  • New patterns discovered: log to working memory
  • Corrections received: mark as high-confidence learning
  • Errors encountered: log with resolution

Step 2: End of session - evaluate learnings

  • Which learnings are project-specific vs session-specific?
  • Which patterns recurred during this session?
  • Which corrections should become rules?

Step 3: Promote valuable learnings

  • Recurring patterns → CLAUDE.md or .claude/rules/
  • Project conventions → project documentation
  • Error resolutions → knowledge base

Step 4: Prune stale memory

  • Remove learnings about deleted files
  • Update learnings contradicted by new information
  • Archive session-specific context

Anti-Patterns

| Anti-Pattern | Problem | Better Approach |
| --- | --- | --- |
| Dumping entire files into context | Wastes tokens on irrelevant code | Retrieve specific functions/sections |
| No output buffer reservation | Agent output gets truncated | Always reserve output headroom (5-10% under the Token Budget Allocation Framework) |
| Static context loading | Same context regardless of task | Dynamic retrieval based on task type |
| No staleness tracking | Using outdated information | Timestamp and verify before using |
| Full conversation replay | Older turns crowd out relevant code | Sliding window with summarization |
| Ignoring import graph | Missing type definitions, broken understanding | Always include direct dependencies |

Evaluation Metrics

| Metric | Description | Target |
| --- | --- | --- |
| Context Relevance | % of loaded context actually used in response | 70% |
| Retrieval Precision | % of retrieved items that are relevant | 80% |
| Token Utilization | % of context budget used productively | 85% |
| Staleness Rate | % of context items that are outdated | < 5% |
| Cache Hit Rate | % of tool results served from cache | 40% |
| Handoff Completeness | % of required context passed between agents | 100% |
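
Two of these metrics are simple ratios and can be computed as in the sketch below. The function names and the inputs (sets of item identifiers and raw counters) are assumptions about what an agent runtime would log.

```python
def ratio(numerator: int, denominator: int) -> float:
    """Safe division that reports 0.0 when there is nothing to measure."""
    return numerator / denominator if denominator else 0.0


def retrieval_precision(retrieved: set[str], relevant: set[str]) -> float:
    """Share of retrieved items that are actually relevant."""
    return ratio(len(retrieved & relevant), len(retrieved))


def cache_hit_rate(cache_hits: int, tool_calls: int) -> float:
    """Share of tool results served from cache."""
    return ratio(cache_hits, tool_calls)
```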

Integration Points

| Skill | Integration |
| --- | --- |
| rag-architect | Use RAG Architect for vector store design; Context Engine for retrieval strategy |
| agent-designer | Agent Designer defines agent roles; Context Engine manages what each agent knows |
| self-improving-agent | Self-Improving Agent promotes learnings; Context Engine decides when/how to load them |
| observability-designer | Monitor context utilization metrics alongside agent performance |

References

  • references/context-window-strategies.md: Detailed packing algorithms and benchmarks

  • references/code-retrieval-patterns.md: RAG for code: chunking, embedding, and ranking strategies

  • references/memory-architecture-guide.md: Multi-layer memory system design patterns

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
