context-builder

Generate LLM-optimized codebase context from any directory using the context-builder CLI

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "context-builder" with this command: npx skills add igorls/context-builder

Context Builder — Agentic Skill

Generate a single, structured markdown file from any codebase directory. The output is optimized for LLM consumption with relevance-based file ordering, AST-aware code signatures, automatic token budgeting, and smart defaults.

Installation

# Requires Rust toolchain. Builds from source with cryptographic verification via crates.io.
cargo install context-builder --features tree-sitter-all

Pre-built binaries with SHA256 checksums are also available for manual download from GitHub Releases.

Verify: context-builder --version (expected: 0.8.3)

Security & Path Scoping

IMPORTANT: This tool reads file contents from the specified directory. Agents MUST follow these rules:

  • Only target explicit project directories — always pass the exact project root (e.g., /home/user/projects/myapp). Never point at home directories, system paths, or credential stores (~/.ssh, ~/.aws, /etc, ~, /)
  • Use scoped filters — use -f to limit to known source extensions (e.g., -f rs,toml,md), reducing exposure surface
  • Output to project-local paths — write output to the project's docs/ folder or /tmp/, never to shared or public locations
  • Review before sharing — the output may contain API keys, secrets, or credentials embedded in source files; always review or use .gitignore patterns to exclude sensitive files
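Agents can enforce the path-scoping rules above before invoking the CLI. Below is a minimal Python sketch of such a guard. `is_safe_target`, `_inside`, and the deny lists are hypothetical helpers, not part of context-builder, and the lists are deliberately short:

```python
from pathlib import Path
from typing import Optional

# Hypothetical deny lists mirroring the scoping rules above; extend for your system.
SYSTEM_ROOTS = ("/etc", "/usr", "/bin", "/sbin", "/boot", "/proc", "/sys")
CREDENTIAL_DIRS = (".ssh", ".aws", ".gnupg")


def _inside(path: Path, root: Path) -> bool:
    """True if `path` is `root` itself or lies somewhere beneath it."""
    return path == root or root in path.parents


def is_safe_target(target: str, home: Optional[str] = None) -> bool:
    """Hypothetical agent-side guard: True if `target` looks like a scoped project dir."""
    home_dir = Path(home or Path.home()).expanduser().resolve()
    path = Path(target).expanduser().resolve()

    # Reject the home directory and any of its ancestors, including "/".
    if path == home_dir or path in home_dir.parents:
        return False
    # Reject system paths (resolved, since e.g. /etc is a symlink on macOS).
    if any(_inside(path, Path(root).resolve()) for root in SYSTEM_ROOTS):
        return False
    # Reject credential stores and anything inside them.
    if any(_inside(path, home_dir / name) for name in CREDENTIAL_DIRS):
        return False
    return True
```

Call it on the value you are about to pass to -d, and refuse to run when it returns False.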

Built-in protections (always active, no configuration needed):

  • Excludes .git/, node_modules/, and 19 other heavy/sensitive directories at any depth
  • Respects .gitignore rules when a .git directory is present
  • Binary files are auto-detected and skipped via UTF-8 sniffing
  • Output file and cache directory are auto-excluded to prevent self-ingestion
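UTF-8 sniffing is a common heuristic for skipping binary files. The Python sketch below illustrates the general idea. `looks_binary`, the 8 KiB window, and the NUL-byte check are assumptions made for illustration, not context-builder's actual implementation:

```python
import codecs


def looks_binary(data: bytes, sniff_len: int = 8192) -> bool:
    """Hypothetical heuristic binary check on a file's leading bytes."""
    head = data[:sniff_len]
    # NUL bytes are valid UTF-8 but almost never appear in source text.
    if b"\x00" in head:
        return True
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # final=False tolerates a multi-byte character cut off at the window edge;
        # if the whole file fits in the window, any incomplete sequence is an error.
        decoder.decode(head, final=len(data) <= sniff_len)
    except UnicodeDecodeError:
        return True
    return False
```

Read one byte past the window (e.g. `f.read(8193)` on a file opened in binary mode) so the function can tell whether the sample was truncated.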

When to Use

  • Deep code review — Feed an entire codebase to an LLM for architecture analysis or bug hunting
  • Onboarding — Generate a project snapshot for understanding unfamiliar codebases
  • Diff-based updates — After code changes, generate only the diffs to update an LLM's understanding
  • AST signatures — Extract function/class signatures for token-efficient structural understanding
  • Cross-project research — Quickly package a dependency's source for analysis

Core Workflow

1. Quick Context (whole project)

context-builder -d /path/to/project -y -o context.md
  • -y skips confirmation prompts (recommended for agent workflows when path is explicitly scoped)
  • Output includes: header → file tree → files sorted by relevance (config → source → tests → docs)

2. Scoped Context (specific file types)

context-builder -d /path/to/project -f rs,toml -i docs,assets -y -o context.md
  • -f rs,toml includes only Rust and TOML files
  • -i docs,assets excludes directories by name
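One plausible reading of the -f and -i semantics is sketched below in Python: keep files whose extension is listed, and skip any file or directory whose name matches an ignore entry, at any depth. `collect_files` is hypothetical, and the exact matching rules context-builder applies (case sensitivity, glob support) are not specified here:

```python
import os
from pathlib import Path
from typing import Iterable, List


def collect_files(root: str, extensions: Iterable[str], ignore: Iterable[str]) -> List[str]:
    """Hypothetical: list files under `root` matching `extensions`, skipping names in `ignore`."""
    exts = {e.lstrip(".") for e in extensions}
    ignored = set(ignore)
    found: List[str] = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune ignored directories in place so os.walk never descends into them,
        # which makes the ignore rule apply at any depth.
        dirnames[:] = sorted(d for d in dirnames if d not in ignored)
        for name in sorted(filenames):
            if name in ignored or Path(name).suffix.lstrip(".") not in exts:
                continue
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            found.append(Path(rel).as_posix())
    return found
```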

3. AST Signatures Mode (minimal tokens)

context-builder -d /path/to/project --signatures -f rs,ts,py -y -o signatures.md
  • Replaces full file content with extracted function/class signatures (~4K vs ~15K tokens per file)
  • Supports 8 languages: Rust, JavaScript (.js/.jsx), TypeScript (.ts/.tsx), Python, Go, Java, C, C++
  • Requires --features tree-sitter-all at install time

4. Signatures with Structural Summary

context-builder -d /path/to/project --signatures --structure -y -o context.md
  • --structure appends a count summary (e.g., "6 functions, 2 structs, 1 impl block")
  • Combine with --visibility public to show only public API surface
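A structural summary is essentially a count of definition kinds. The Python sketch below counts top-level functions and classes with the standard `ast` module. `structure_summary` is hypothetical, and treating a leading underscore as "not public" is a Python convention standing in for Rust's `pub`, not the tool's documented rule:

```python
import ast
from collections import Counter


def structure_summary(source: str, public_only: bool = False) -> str:
    """Hypothetical: summarize top-level definitions, e.g. '2 functions, 1 class'."""
    counts = Counter()
    for node in ast.parse(source).body:
        # Assumption: a leading underscore marks a private name (Python convention).
        if public_only and getattr(node, "name", "").startswith("_"):
            continue
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            counts["function"] += 1
        elif isinstance(node, ast.ClassDef):
            counts["class"] += 1
    parts = []
    for kind, plural in (("function", "functions"), ("class", "classes")):
        n = counts[kind]
        if n:
            parts.append(f"{n} {kind if n == 1 else plural}")
    return ", ".join(parts) or "no definitions"
```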

5. Budget-Constrained Context

context-builder -d /path/to/project --max-tokens 100000 -y -o context.md
  • Caps output to ~100K tokens (estimated)
  • Files are included in relevance order until budget is exhausted
  • Automatically warns if output exceeds 128K tokens
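The budgeting rule above (include files in relevance order until the budget runs out) can be sketched as a greedy loop. This Python version assumes a rough 4-characters-per-token estimate. context-builder's own estimator is not described here, so `estimate_tokens` and `apply_budget` are hypothetical:

```python
from typing import List, Tuple


def estimate_tokens(text: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token, rounded up.
    return (len(text) + 3) // 4


def apply_budget(files: List[Tuple[str, str]], max_tokens: int) -> List[str]:
    """Given (path, content) pairs already in relevance order, keep files
    until the next one would push the total past `max_tokens`."""
    kept: List[str] = []
    used = 0
    for path, content in files:
        cost = estimate_tokens(content)
        if used + cost > max_tokens:
            break  # budget exhausted: stop here to preserve relevance order
        kept.append(path)
        used += cost
    return kept
```

Stopping at the first file that does not fit keeps the output a strict prefix of the relevance order; skipping it and trying smaller later files would pack the budget more tightly at the cost of that ordering guarantee.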

6. Token Count Preview

context-builder -d /path/to/project --token-count
  • Prints estimated token count without generating output
  • Use this first to decide if filtering or --signatures is needed

7. Incremental Diffs

First, ensure context-builder.toml exists with:

timestamped_output = true
auto_diff = true

Then run twice:

# First run: baseline snapshot
context-builder -d /path/to/project -y

# After code changes: generates diff annotations
context-builder -d /path/to/project -y

For minimal output (diffs only, no full file bodies):

context-builder -d /path/to/project -y --diff-only
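The cache format and diff annotations that context-builder emits are not documented here. As an illustration of the idea only, this Python sketch compares two snapshots, each a hypothetical `{path: content}` mapping, using the standard `difflib` module. It emits unified diffs for changed, added, or removed files:

```python
import difflib
from typing import Dict


def diff_snapshot(old: Dict[str, str], new: Dict[str, str]) -> str:
    """Hypothetical: unified diffs between two {path: content} snapshots."""
    chunks = []
    for path in sorted(old.keys() | new.keys()):
        # A file missing from one side is treated as empty (added or removed).
        before = old.get(path, "").splitlines(keepends=True)
        after = new.get(path, "").splitlines(keepends=True)
        if before == after:
            continue  # unchanged files contribute nothing, as in a diffs-only view
        chunks.extend(difflib.unified_diff(before, after, fromfile=f"a/{path}", tofile=f"b/{path}"))
    return "".join(chunks)
```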

Smart Defaults

These behaviors require no configuration:

| Feature | Behavior |
| --- | --- |
| Auto-ignore | `node_modules`, `dist`, `build`, `__pycache__`, `.venv`, `vendor`, and 12 more heavy dirs are excluded at any depth |
| Self-exclusion | Output file, cache dir, and context-builder.toml are auto-excluded |
| .gitignore | Respected automatically when a .git directory exists |
| Binary detection | Binary files are skipped via UTF-8 sniffing |
| File ordering | Config/docs first → source (entry points before helpers) → tests → build/CI → lockfiles |
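The file-ordering row above can be approximated with a sort key. In this Python sketch the category tables (`LOCKFILES`, `CONFIG_DOCS`, `ENTRY_POINTS`, the CI markers) are hypothetical and much smaller than anything a real tool would need. Only the tier order follows the table:

```python
from pathlib import PurePosixPath
from typing import Iterable, List, Tuple

# Hypothetical category tables approximating the tiers in the table above.
LOCKFILES = {"Cargo.lock", "package-lock.json", "yarn.lock", "poetry.lock"}
CONFIG_DOCS = {".toml", ".json", ".yaml", ".yml", ".md"}
ENTRY_POINTS = {"main.rs", "lib.rs", "main.py", "index.ts"}
CI_DIRS = {".github", "ci"}
BUILD_FILES = {"build.rs", "Makefile", "Dockerfile"}
TEST_DIRS = {"tests", "test", "__tests__"}


def relevance_key(path: str) -> Tuple[int, bool, str]:
    p = PurePosixPath(path)
    dirs = set(p.parts[:-1])
    if p.name in LOCKFILES:
        tier = 4
    elif dirs & CI_DIRS or p.name in BUILD_FILES:
        tier = 3
    elif dirs & TEST_DIRS:
        tier = 2
    elif p.suffix in CONFIG_DOCS:
        tier = 0
    else:
        tier = 1  # everything else is treated as source
    # Entry points sort before helpers within a tier; ties break alphabetically.
    return (tier, p.name not in ENTRY_POINTS, path)


def order_files(paths: Iterable[str]) -> List[str]:
    return sorted(paths, key=relevance_key)
```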

CLI Reference (Agent-Relevant Flags)

| Flag | Purpose | Agent Guidance |
| --- | --- | --- |
| `-d <PATH>` | Input directory | Always use absolute paths for reliability |
| `-o <FILE>` | Output path | Write to project docs/ or /tmp/ |
| `-f <EXT>` | Filter by extension | Comma-separated: -f rs,toml,md |
| `-i <NAME>` | Ignore dirs/files | Comma-separated: -i tests,docs,assets |
| `--max-tokens <N>` | Token budget cap | Use 100000 for most models, 200000 for Gemini |
| `--token-count` | Dry-run token estimate | Run first to check if filtering is needed |
| `-y` | Skip all prompts | Use only with explicit, scoped project paths |
| `--preview` | Show file tree only | Quick exploration without generating output |
| `--diff-only` | Output only diffs | Minimizes tokens for incremental updates |
| `--signatures` | AST signature extraction | Requires tree-sitter-all feature at install |
| `--structure` | Structural summary | Pair with --signatures for compact output |
| `--visibility <V>` | Filter by visibility | all (default), public (public API only) |
| `--truncate <MODE>` | Truncation strategy | smart (AST-aware) or simple |
| `--init` | Create config file | Auto-detects project file types |
| `--clear-cache` | Reset diff cache | Use if diff output seems stale |
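When an agent drives the CLI from code, assembling the argument list from the flags in the table avoids shell-quoting mistakes. `build_command` below is a hypothetical Python helper. It uses only flags documented above and enforces the absolute-path guidance for -d:

```python
import os
from typing import List, Optional, Sequence


def build_command(
    project_dir: str,
    output: Optional[str] = None,
    extensions: Sequence[str] = (),
    ignore: Sequence[str] = (),
    max_tokens: Optional[int] = None,
    signatures: bool = False,
    visibility: Optional[str] = None,
    token_count: bool = False,
) -> List[str]:
    """Hypothetical: assemble a context-builder argv list from documented flags."""
    if not os.path.isabs(project_dir):
        raise ValueError("pass an absolute project path to -d")
    argv = ["context-builder", "-d", project_dir]
    if extensions:
        argv += ["-f", ",".join(extensions)]
    if ignore:
        argv += ["-i", ",".join(ignore)]
    if max_tokens is not None:
        argv += ["--max-tokens", str(max_tokens)]
    if signatures:
        argv.append("--signatures")
    if visibility is not None:
        if visibility not in ("all", "public"):
            raise ValueError("visibility must be 'all' or 'public'")
        argv += ["--visibility", visibility]
    if token_count:
        argv.append("--token-count")  # estimate only; no output file is written
    else:
        argv.append("-y")  # safe here because -d was checked above
        if output:
            argv += ["-o", output]
    return argv
```

Run it with `subprocess.run(build_command(...), check=True)`. Passing a list rather than a shell string means paths containing spaces need no quoting.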

Recipes

Recipe: Deep Think Code Review

Generate a scoped context file, then prompt an LLM for deep analysis:

# Step 1: Generate focused context
context-builder -d /path/to/project -f rs,toml --max-tokens 120000 -y -o docs/deep_think_context.md

# Step 2: Feed to LLM with a review prompt
# Attach docs/deep_think_context.md and ask for:
# - Architecture review
# - Bug hunting
# - Performance analysis

Recipe: API Surface Review (signatures only)

# Extract only public signatures — typically 80-90% fewer tokens than full source
context-builder -d /path/to/project --signatures --visibility public -f rs -y -o docs/api_surface.md

Recipe: Compare Two Versions

# Generate context for both versions
context-builder -d ./v1 -f py -y -o /tmp/v1_context.md
context-builder -d ./v2 -f py -y -o /tmp/v2_context.md

# Feed both to an LLM for comparative analysis

Recipe: Monorepo Slice

# Focus on a specific package within a monorepo
context-builder -d /path/to/monorepo/packages/core -f ts,tsx -i __tests__,__mocks__ -y -o core_context.md

Recipe: Quick Size Check Before Deciding Strategy

# Check if the project fits in context
context-builder -d /path/to/project --token-count

# If > 128K tokens, try signatures mode first:
context-builder -d /path/to/project --signatures --token-count

# Or scope it down:
context-builder -d /path/to/project -f rs,toml --max-tokens 100000 --token-count
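The size-check recipe above is a small decision procedure. Here is a Python sketch of it, assuming a 128K-token window and that signatures mode removes about 80% of tokens (the low end of the estimate in the API surface recipe; actual savings vary by project). `choose_strategy` is hypothetical, and extracting the number from --token-count output depends on a format not documented here:

```python
def choose_strategy(estimated_tokens: int, context_window: int = 128_000) -> str:
    """Hypothetical: pick the next step from a --token-count estimate."""
    if estimated_tokens <= context_window:
        return "full"  # fits as-is: generate full context
    # Assumption: signatures keep roughly 1/5 of the tokens (an 80% reduction).
    if estimated_tokens <= context_window * 5:
        return "signatures"
    return "scope"  # still too large: narrow with -f / -i and add --max-tokens
```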

Configuration File (Optional)

Create context-builder.toml in the project root for persistent settings:

output = "docs/context.md"
output_folder = "docs"
filter = ["rs", "toml"]
ignore = ["target", "benches"]
timestamped_output = true
auto_diff = true
max_tokens = 120000
signatures = true
structure = true
visibility = "public"

Initialize one automatically with context-builder --init.
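A quick sanity check can catch typos in context-builder.toml before a run. The Python sketch below is hypothetical. `KNOWN_KEYS` lists only the keys shown in the example, so the tool may accept others. `load_config` relies on `tomllib`, which ships with Python 3.11 and later:

```python
from typing import Any, Dict, List

# Keys used in the example above; the tool may accept others not listed here.
KNOWN_KEYS = {
    "output", "output_folder", "filter", "ignore", "timestamped_output",
    "auto_diff", "max_tokens", "signatures", "structure", "visibility",
}


def load_config(path: str = "context-builder.toml") -> Dict[str, Any]:
    import tomllib  # standard library since Python 3.11

    with open(path, "rb") as fh:  # tomllib requires a binary file handle
        return tomllib.load(fh)


def check_config(cfg: Dict[str, Any]) -> List[str]:
    """Hypothetical: return warnings for a parsed context-builder.toml mapping."""
    warnings = [f"unlisted key: {k}" for k in sorted(cfg.keys() - KNOWN_KEYS)]
    # The incremental-diff workflow above sets both keys together.
    if cfg.get("auto_diff") and not cfg.get("timestamped_output"):
        warnings.append("auto_diff is set without timestamped_output")
    if cfg.get("visibility", "all") not in ("all", "public"):
        warnings.append("visibility should be 'all' or 'public'")
    if "max_tokens" in cfg and not isinstance(cfg["max_tokens"], int):
        warnings.append("max_tokens should be an integer")
    return warnings
```

Typical use is `check_config(load_config())` from the project root, printing any warnings before invoking the CLI.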

Output Format

The generated markdown follows this structure:

# Directory Structure Report
[metadata: project name, filters, content hash]

## File Tree
[visual tree of included files]

## Files
### File: src/main.rs
[code block with file contents, syntax-highlighted by extension]

### File: src/lib.rs
...

Files appear in relevance order (not alphabetical), prioritizing config and entry points so LLMs build understanding faster.

When --signatures is active, file contents are replaced with extracted signatures:

### File: src/lib.rs
```rust
pub fn run_with_args(args: Args, config: Config, prompter: &dyn Prompter) -> Result<()>
pub fn generate_markdown_with_diff(...) -> Result<String>
```
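Because every file section starts with a `### File: <path>` header, an agent can split a generated report back into per-file chunks. `split_files` is a hypothetical Python helper that assumes the header format shown above. A file whose own content contains such a line (a markdown document, for example) would be split incorrectly:

```python
import re
from typing import List, Tuple

# Assumes the "### File: <path>" header format shown above.
FILE_HEADER = re.compile(r"^### File: (.+)$", re.MULTILINE)


def split_files(report: str) -> List[Tuple[str, str]]:
    """Hypothetical: split a report into (path, body) pairs, preserving file order."""
    matches = list(FILE_HEADER.finditer(report))
    sections = []
    for i, m in enumerate(matches):
        # Each body runs from the end of its header to the start of the next one.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(report)
        sections.append((m.group(1).strip(), report[m.end():end].strip()))
    return sections
```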

Error Handling

  • If context-builder is not installed, install it with cargo install context-builder --features tree-sitter-all
  • If --signatures shows no output for a file, the language may be unsupported, or the binary was installed without the tree-sitter-all feature
  • If output exceeds token limits, add --max-tokens or narrow with -f / -i, or use --signatures
  • If the project has no .git directory, auto-ignores still protect against dependency flooding
  • Use --clear-cache if diff output seems stale or incorrect
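An agent can check for the binary before running anything. This Python sketch locates `context-builder` on PATH and parses `--version`. It assumes the output contains an X.Y.Z version (such as 0.8.3), since the exact format is not documented here. `parse_version` and `installed_version` are hypothetical:

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple


def parse_version(text: str) -> Optional[Tuple[int, ...]]:
    """Extract the first X.Y.Z version number from `--version` output (assumed format)."""
    m = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    return tuple(int(g) for g in m.groups()) if m else None


def installed_version() -> Optional[Tuple[int, ...]]:
    """Return the installed context-builder version, or None if it is not on PATH."""
    exe = shutil.which("context-builder")
    if exe is None:
        return None  # not installed: see the cargo install command above
    out = subprocess.run([exe, "--version"], capture_output=True, text=True, check=True)
    return parse_version(out.stdout)
```

Compare the result against (0, 8, 3) to confirm the version this skill expects.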

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Coding

Miaoji Asin Clinic Pro

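Amazon ASIN Clinic, Pro edition: a 90-day action plan, competitor benchmarking, and a seasonal optimization calendar. Upgrades the five-dimension diagnosis (compliance, advertising, reviews, visuals, content) into an actionable long-term plan. Basic features are available in the free miaoji-asin-clinic edition.

Registry Source · Recently Updated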
Coding

wechat-publish-pro

Pure Python tool to convert Markdown to styled HTML and publish articles to WeChat official account drafts with AI-based content refinement and theme support.

Registry Source · Recently Updated
Coding

Miaoji Asin Clinic

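Starting from an ASIN and product category, quickly diagnoses an Amazon listing's five-dimension health index, ranks fixes by priority, and provides detailed analysis with personalized remediation plans.

Registry Source · Recently Updated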
Coding

Toonany

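A Claude Code skill for creating AI-generated short dramas (漫剧, comic-style dramas) from novels and stories. Use when the user mentions "漫剧创作" (comic drama creation), "小说转剧本" (novel to script), "分镜生成" (storyboard generation), "短剧制作" (short drama production), "故事线生成" (storyline generation), "大纲...

Registry Source · Recently Updated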