code-mode

Add a "code mode" tool to an existing MCP server so LLMs can write small processing scripts that run against large API responses in a sandboxed runtime — only the script's compact output enters the LLM context window. Use this skill whenever someone wants to add code mode, context reduction, script execution, sandbox execution, or LLM-generated-code processing to an MCP server. Also trigger when users mention reducing token usage, shrinking API responses, running user-provided code safely, or adding a code execution tool to their MCP server — in any language (TypeScript, Python, Go, Rust, etc.).


Code Mode for MCP Servers

What is Code Mode?

When an MCP tool returns a large API response (e.g. listing 500 Kubernetes pods, 200 SCIM users, or thousands of GitHub issues), that entire payload enters the LLM's context window — consuming tokens and degrading performance.

Code mode flips the approach: instead of dumping raw data into context, the LLM writes a small processing script. The MCP server runs the script in a sandboxed runtime against the raw data, and only the script's stdout enters context.

This works especially well with well-known APIs (SCIM, Kubernetes, GitHub, Stripe, Slack, AWS, etc.) because the LLM already knows the response schema from training data — it can write the extraction script in one shot without inspecting the data.

Typical results: 65–99% context reduction.
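To make the idea concrete, here is the kind of script an LLM might write against a Kubernetes pod-list response. This is a sketch: in a real code-mode call, DATA would be injected by the sandbox with the raw API payload; the two-pod sample below stands in for it.

```javascript
// DATA is injected by the sandbox as a JSON string. In production it holds
// the raw API response; here a two-pod sample stands in for it.
const DATA = JSON.stringify({
  items: [
    { metadata: { name: "api-7f9c" }, status: { phase: "Running" } },
    { metadata: { name: "worker-2b1d" }, status: { phase: "CrashLoopBackOff" } },
  ],
});

// The script the LLM writes: parse, filter, and print only what matters.
const pods = JSON.parse(DATA).items;
const broken = pods
  .filter((p) => p.status.phase !== "Running")
  .map((p) => `${p.metadata.name}: ${p.status.phase}`);
console.log(broken.join("\n"));
// Only this short summary enters the context window, not the full pod list.
```

Because the Kubernetes list-response schema is well known, the LLM can write this in one shot without ever seeing the raw data.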

How This Skill Works

This is an interactive planning skill. Work with the user step-by-step:

  1. Understand their MCP server (language, framework, what tools return large data)
  2. Select a sandbox that fits their server language and security needs
  3. Plan the implementation together
  4. Implement the code mode tool, sandbox executor, and benchmark
  5. Verify with benchmarks comparing before/after context sizes

Do not jump ahead. Confirm each step with the user before proceeding.


Step 1: Understand the Existing MCP Server

Ask the user (or discover by reading their codebase):

  • Server language: TypeScript/JavaScript, Python, Go, Rust, or other?
  • MCP framework: XMCP, FastMCP, mcp-go, custom, etc.?
  • Which tools return large responses? (e.g. list users, get pods, search issues)
  • What APIs do they call? Well-known APIs (SCIM, K8s, GitHub, Stripe) are ideal candidates because the LLM already knows the schema.
  • What languages should the sandbox support for script execution? Usually JavaScript is sufficient. Python is a common second choice.

Summarize your understanding back to the user and confirm before moving on.


Step 2: Select a Sandbox

The sandbox must be isolated from the host filesystem and network, and secure by default. Present the user with options that match their server language, using the reference in references/sandbox-options.md.

Quick Selection Guide

If the server is TypeScript/JavaScript:

| Sandbox | Script Language | Isolation | Size | Notes |
| --- | --- | --- | --- | --- |
| quickjs-emscripten | JavaScript | WASM (no fs/net) | ~1MB | Lightweight, actively maintained, best default |
| pyodide | Python | WASM (no fs/net) | ~20MB | Full CPython in WASM, heavier |
| isolated-vm | JavaScript | V8 isolate (no fs/net) | ~5MB native | Fast, separate V8 heap, not WASM |

If the server is Python:

| Sandbox | Script Language | Isolation | Size | Notes |
| --- | --- | --- | --- | --- |
| RestrictedPython | Python | AST-restricted compile | Tiny | Compiles to restricted bytecode, no I/O by default |
| pyodide (in-process WASM) | Python | WASM | ~20MB | Heavier but stronger isolation than RestrictedPython |
| quickjs (via quickjs PyPI) | JavaScript | WASM/native | Small | Run JS from Python |

If the server is Go:

| Sandbox | Script Language | Isolation | Size | Notes |
| --- | --- | --- | --- | --- |
| goja | JavaScript | Pure Go interpreter | Zero CGO | No fs/net, widely used (used by Grafana) |
| Wazero | WASM guest (JS/Python compiled to WASM) | WASM runtime, pure Go | Zero CGO | Strongest isolation, runs any WASM module |
| starlark-go | Starlark (Python dialect) | Pure Go interpreter | Zero CGO | Deterministic, no I/O, used by Bazel |

If the server is Rust:

| Sandbox | Script Language | Isolation | Size | Notes |
| --- | --- | --- | --- | --- |
| boa_engine | JavaScript | Pure Rust interpreter | No unsafe deps | ES2024 support, embeddable |
| wasmtime / wasmer | WASM guest | WASM runtime | Strong | Run any WASM module, strongest isolation |
| deno_core | JavaScript/TypeScript | V8-based | Larger | Full V8, powerful but heavier |
| rustpython | Python | Pure Rust interpreter | Moderate | Less mature but functional |

Read references/sandbox-options.md for detailed tradeoffs on each option.

Present 2–3 options to the user (filtered to their server language), explain the tradeoffs briefly, and let them choose. If they're unsure, recommend the lightest WASM-based option for their language.


Step 3: Plan the Implementation

Once the sandbox is selected, create a concrete plan with the user. The plan should cover these components:

3a. Code Mode Tool

A new MCP tool (e.g. code_mode or <domain>_code_mode) that accepts:

  • command or args: The underlying API call / query to execute (e.g. kubectl args, SCIM endpoint + params, GraphQL query)
  • code: The processing script the LLM writes
  • language (optional): Script language, defaults to javascript

The tool handler:

  1. Executes the underlying API call (reusing existing logic)
  2. Passes the raw response as a DATA variable into the sandbox
  3. Runs the script in the sandbox
  4. Returns only the script's stdout, plus a size measurement line: [code-mode: 18.0KB -> 6.2KB (65.5% reduction)]
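The handler steps above can be sketched as follows. fetchRaw and runInSandbox are placeholder names for the server's existing API client and the step-3b executor — both are assumptions for illustration, not a real API.

```javascript
// Compute the size-measurement line appended to every code-mode response.
function formatReduction(beforeBytes, afterBytes) {
  const kb = (n) => (n / 1024).toFixed(1);
  const pct = (((beforeBytes - afterBytes) / beforeBytes) * 100).toFixed(1);
  return `[code-mode: ${kb(beforeBytes)}KB -> ${kb(afterBytes)}KB (${pct}% reduction)]`;
}

// Hypothetical handler shape: fetchRaw and runInSandbox are injected stand-ins.
async function codeModeHandler({ command, code }, fetchRaw, runInSandbox) {
  const raw = await fetchRaw(command);       // 1. execute the underlying API call
  const out = await runInSandbox(code, raw); // 2-3. inject DATA, run the script
  // 4. return only stdout plus the measurement line
  return `${out}\n${formatReduction(Buffer.byteLength(raw), Buffer.byteLength(out))}`;
}
```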

3b. Sandbox Executor

A utility module that:

  • Initializes the chosen sandbox runtime
  • Injects DATA (the raw API response as a string) into the sandbox
  • Executes the user-provided script
  • Captures stdout and returns it
  • Enforces a timeout (e.g. 10 seconds)
  • Handles errors gracefully (script syntax errors, runtime errors)
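A minimal sketch of the executor's contract, focusing on the timeout and error handling. evalInSandbox is a stand-in for the real runtime call (for example, evaluating code in a quickjs-emscripten context); its name and the 10s default are assumptions. All actual isolation comes from the sandbox chosen in step 2, not from this wrapper.

```javascript
// Race the sandbox evaluation against a timeout so a runaway script
// cannot hang the MCP server.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`script timed out after ${ms}ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// evalInSandbox(code, { DATA }) is a hypothetical hook into the real runtime.
async function execute(evalInSandbox, code, data, timeoutMs = 10_000) {
  try {
    // Inject DATA as a string; the script is responsible for parsing it.
    const stdout = await withTimeout(evalInSandbox(code, { DATA: data }), timeoutMs);
    return { ok: true, stdout };
  } catch (err) {
    // Return the error as text so the LLM can fix its script on the next call.
    return { ok: false, stdout: `script error: ${err.message}` };
  }
}
```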

3c. Wiring

  • Register the new tool in the MCP server's tool list
  • Optionally gate behind an env var (ask the user if they want this)

3d. Benchmark

A benchmark script that compares tool output size vs. code-mode output size across realistic scenarios. See references/benchmark-pattern.md for the template.

Present the plan to the user and confirm before implementing.


Step 4: Implement

Follow the confirmed plan. Implement in this order:

  1. Install the sandbox dependency (e.g. npm i quickjs-emscripten)
  2. Create the executor module — the sandbox wrapper
  3. Create the code mode tool — the MCP tool handler
  4. Wire it into the server — register the tool
  5. Create the benchmark script

Keep the implementation minimal — don't over-abstract. The executor and tool can each be a single file.

Implementation Tips

  • The DATA variable should always be a string (JSON-serialized). The script is responsible for parsing it if needed (JSON.parse(DATA) in JS, json.loads(DATA) in Python).
  • Include the reduction measurement in every response so the user/LLM can see the savings: [code-mode: {before}KB -> {after}KB ({pct}% reduction)]
  • Set a reasonable default timeout (10s) and memory limit if the sandbox supports it.
  • Return clear error messages if the script fails — the LLM will use the error to fix its script on the next call.

Step 5: Benchmark and Verify

After implementation, run the benchmark to verify code mode actually reduces context size. Read references/benchmark-pattern.md for the full template.

The benchmark should:

  1. Generate or fetch realistic test data — use faker/mock data if no live API is available, or hit a real endpoint if the user has one.
  2. Run each scenario through both paths:
    • Regular tool response (full JSON)
    • Code mode with a representative extraction script
  3. Print a comparison table showing before/after sizes and reduction %
  4. Print a total across all scenarios
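A minimal single-scenario sketch of this comparison, using generated mock data in a SCIM-like shape (the field names and scenario are made up for illustration; see references/benchmark-pattern.md for the full template):

```javascript
// Generate 200 mock users; every tenth one is inactive.
const users = Array.from({ length: 200 }, (_, i) => ({
  id: `u${i}`,
  userName: `user${i}@example.com`,
  active: i % 10 !== 0,
  meta: { created: "2024-01-01T00:00:00Z", lastModified: "2024-06-01T00:00:00Z" },
}));

// Path 1: the regular tool response (full JSON payload).
const fullResponse = JSON.stringify({ totalResults: users.length, Resources: users });

// Path 2: the output of a representative extraction script ("only inactive users").
const extracted = users
  .filter((u) => !u.active)
  .map((u) => u.userName)
  .join("\n");

const before = Buffer.byteLength(fullResponse);
const after = Buffer.byteLength(extracted);
const pct = (((before - after) / before) * 100).toFixed(1);
console.log(`scenario: inactive users  ${before}B -> ${after}B  (${pct}% reduction)`);
```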

Present the benchmark results to the user. Typical expectations:

  • Simple list extractions: 60–80% reduction
  • Filtered queries (e.g. "only inactive users"): 90–99% reduction
  • Aggregations (e.g. "count per department"): 95–99% reduction

Reference Files

  • references/sandbox-options.md — Detailed comparison of all sandbox options by server language, with security analysis and setup instructions
  • references/benchmark-pattern.md — Benchmark script template and methodology
