Searching Precisely
Overview
Web search pipeline that minimizes token consumption via local intent classification, semantic caching, credibility validation, and streaming fragment assembly.
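Division of responsibilities:
- **Host Agent (host AI)** performs the actual web GET / Search API calls and returns raw fragments.
- This skill's scripts handle pre-processing (intent classification, query rewriting, budget control, cache lookup) and post-processing (credibility probe, stream assembly, cache write).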
Core Rule: Always check the semantic cache first. Only invoke web search on a cache miss.
Pipeline Architecture
Query → [Intent Parser] → [Query Rewriter] → [Budget Controller]
↓
[Semantic Cache] ──hit──→ Return
↓ miss
[Web Search] (≤1500 tok)
↓
[Parallel Credibility Probe]
↓
[Stream Assembler] → [Write Cache]
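The control flow can be sketched as a single function with an early return on a cache hit. This is a minimal illustration, not the skill's actual orchestrator: `runPipeline` and every function on the `steps` object are hypothetical stand-ins for the scripts and the host agent's search tool. It follows the step order given under Instructions, assumes `checkCache` has already applied the similarity threshold, runs synchronously for brevity, and omits budget tracking.

```javascript
// Hypothetical orchestrator: `steps` holds one injected function per pipeline stage.
function runPipeline(query, steps) {
  const intent = steps.classify(query);

  // Core rule: consult the semantic cache before any web search.
  const cached = steps.checkCache(query, intent);
  if (cached.hit) {
    return { answer: cached.result, source: 'cache' }; // early exit, no search
  }

  // Cache miss: rewrite, search, validate, assemble, then write back.
  const subQueries = steps.rewrite(query, intent);
  const fragments = steps.search(subQueries);
  const trusted = steps.validate(fragments);
  if (trusted.length === 0) {
    return { answer: null, source: 'none' }; // no reliable source found
  }

  const answer = steps.assemble(trusted, query);
  steps.writeCache(query, intent, answer);
  return { answer, source: 'web' };
}
```

Injecting the stages as functions keeps the sketch testable without spawning any processes; a real implementation would most likely be asynchronous.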
Instructions
When this skill activates, execute the pipeline below in order. Exit early at any step that produces a final answer — do not run later steps unnecessarily.
Note: Replace `<placeholders>` with actual runtime values. All arguments must be valid JSON strings.
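User queries often contain quotes or shell metacharacters, so hand-substituting them into a quoted command line is fragile. One way to keep the argument valid JSON is to serialize it with `JSON.stringify` and pass it as a single argv entry. The helper name `buildScriptArgs` and the sample query are hypothetical; this is a sketch, not part of the skill's scripts.

```javascript
// Hypothetical helper: builds the argv array for one of the skill's scripts.
// `subcommand` is optional (e.g. 'check' or 'write' for semantic-cache.js).
function buildScriptArgs(scriptPath, payload, subcommand) {
  const args = [scriptPath];
  if (subcommand) args.push(subcommand);
  // JSON.stringify escapes quotes and control characters inside the payload.
  args.push(JSON.stringify(payload));
  return args;
}

const args = buildScriptArgs(
  'scripts/semantic-cache.js',
  { query: `what's "new" in node?`, intent: 'web_search' },
  'check'
);
// `args` can then be handed to child_process.execFile('node', args, callback),
// which does not go through a shell, so no extra quoting is needed.
```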
Step 1 — Classify Intent
Run via shell tool:
node scripts/intent-parser.js '<original_query>'
Extract intent and confidence from the JSON output.
If confidence < 0.5, default to intent = "web_search" and continue.
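The fallback rule can be expressed in a few lines. This sketch assumes the parser prints a JSON object with `intent` and `confidence` fields, as described above. The helper name `resolveIntent` is hypothetical, and treating a missing or non-numeric `confidence` as low confidence is an assumption of this example.

```javascript
const MIN_CONFIDENCE = 0.5; // threshold from Step 1

// Hypothetical helper: takes the raw stdout of intent-parser.js.
function resolveIntent(parserStdout) {
  const { intent, confidence } = JSON.parse(parserStdout);
  // Assumption: a missing or non-numeric confidence is treated as low confidence.
  if (typeof confidence !== 'number' || confidence < MIN_CONFIDENCE) {
    return 'web_search';
  }
  return intent;
}
```

For example, an output of `{"intent":"api_docs","confidence":0.3}` resolves to `"web_search"`; the `api_docs` label is used here purely for illustration.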
Step 2 — Initialize Budget
node scripts/budget-controller.js init
Keep the returned state.remaining value. Abort any later step that would exceed it.
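A simple guard is enough to enforce the abort rule. The sketch below assumes the caller tracks `remaining` itself and estimates each step's cost up front; the function name `spend` is hypothetical and is not part of `budget-controller.js`.

```javascript
// Hypothetical guard: `remaining` is the state.remaining value from Step 2,
// `cost` is the estimated token cost of the step about to run.
function spend(remaining, cost) {
  if (cost > remaining) {
    // Abort before the step runs rather than after the budget is blown.
    throw new Error(`Budget exceeded: need ${cost}, have ${remaining}`);
  }
  return remaining - cost;
}
```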
Step 3 — Check Semantic Cache
node scripts/semantic-cache.js check '{"query":"<original_query>","intent":"<intent>"}'
- `hit: true` and `similarity ≥ 0.85` → return `result` to the user. Pipeline complete. Skip all remaining steps.
- `hit: false` → continue to Step 4.
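The hit test combines both conditions. In this sketch the response shape (`hit`, `similarity`, `result`) follows the fields named above. The helper name `isCacheHit` is hypothetical, and treating `hit: true` with a similarity below 0.85 as a miss is an assumption, since the text does not cover that case.

```javascript
const SIMILARITY_THRESHOLD = 0.85; // threshold from Step 3

// Hypothetical helper: `cacheResponse` is the parsed JSON output of
// `semantic-cache.js check`.
function isCacheHit(cacheResponse) {
  // Assumption: a reported hit below the threshold is handled like a miss.
  return (
    cacheResponse.hit === true &&
    cacheResponse.similarity >= SIMILARITY_THRESHOLD
  );
}
```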
Step 4 — Rewrite Query
node scripts/query-rewriter.js '{"intent":"<intent>","query":"<original_query>"}'
Use the returned subQueries array (max 3) for web search.
Step 5 — Web Search (host agent)
Using your native search_web tool, search each sub-query from Step 4.
Collect result URLs and content fragments.
Always perform live search on a cache miss — never fabricate results.
Step 6 — Validate Source Credibility
Extract up to 5 unique source URLs from Step 5. Run:
node scripts/parallel-probe.js '{"sources":[{"url":"<url1>"},{"url":"<url2>"}]}'
- `verdict: "trust"` → use directly
- `verdict: "verify"` → use with caution; flag in the answer
- `available: false` → discard that source
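Both halves of this step, building the probe payload and triaging the results, can be sketched as follows. The payload shape matches the command above. The result shape (an array of objects with `url`, `verdict`, and `available`), the `flagged` field, the helper names, and the choice to drop any unrecognized verdict are all assumptions of this example.

```javascript
const MAX_SOURCES = 5; // limit from Step 6

// Hypothetical helper: dedupes URLs and caps the list at MAX_SOURCES.
function buildProbePayload(urls) {
  const unique = [...new Set(urls)].slice(0, MAX_SOURCES);
  return { sources: unique.map((url) => ({ url })) };
}

// Hypothetical helper: keeps usable sources and marks the ones to flag.
function triageProbeResults(results) {
  const usable = [];
  for (const r of results) {
    if (r.available === false) continue; // discard unreachable sources
    if (r.verdict === 'trust') {
      usable.push({ ...r, flagged: false });
    } else if (r.verdict === 'verify') {
      usable.push({ ...r, flagged: true }); // use with caution, flag in answer
    }
    // Assumption: any other verdict is dropped.
  }
  return usable;
}
```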
Step 7 — Score Credibility
node scripts/credibility-arbiter.js '{"results":[<probe_results_array>]}'
If all sources score < 0.4, discard everything and tell the user no reliable source was found. Do not assemble.
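The discard rule amounts to checking whether at least one source reaches the threshold. This sketch assumes the arbiter's output can be reduced to an array of objects with a numeric `score`; that shape and the helper name `hasReliableSource` are hypothetical.

```javascript
const MIN_SCORE = 0.4; // threshold from Step 7

// Hypothetical helper: returns false when every source scores below MIN_SCORE,
// including the case where there are no sources at all.
function hasReliableSource(scoredSources) {
  return scoredSources.some((s) => s.score >= MIN_SCORE);
}
```

When this returns `false`, skip Step 8 and report that no reliable source was found.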
Step 8 — Assemble Answer
node scripts/stream-assembler.js '{"fragments":[<trusted_fragments>],"query":"<original_query>"}'
Return the answer field to the user.
If coherenceScore < 0.5, add a note that the result may be incomplete.
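A small formatter can apply the coherence rule. The `answer` and `coherenceScore` fields come from the step above; the helper name `formatAnswer` and the exact wording of the note are assumptions of this sketch.

```javascript
const MIN_COHERENCE = 0.5; // threshold from Step 8

// Hypothetical helper: `assembled` is the parsed JSON output of stream-assembler.js.
function formatAnswer(assembled) {
  const { answer, coherenceScore } = assembled;
  if (coherenceScore < MIN_COHERENCE) {
    return `${answer}\n\nNote: this result may be incomplete.`;
  }
  return answer;
}
```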
Step 9 — Write Cache
node scripts/semantic-cache.js write '{"query":"<original_query>","intent":"<intent>","result":"<answer>"}'
This caches the result for future similar queries. Always run this step after a successful assembly.
Search Decision Rules
| Situation | Action |
|---|---|
| Query answered by cache (similarity ≥ 0.85) | Return cached result, skip web search |
| Query about API docs / official specs | Web search with known domain hint |
| Query about current events / pricing / news | Web search, no cache assumed |
| Query with ambiguous intent | Rewrite into 2-3 sub-queries, search each |
Token Budget Defaults
| Component | Budget |
|---|---|
| Intent parse | 50 |
| Cache check | 20 |
| Web search | 1500 |
| Credibility | 300 |
| Assembly | 500 |
| Total | 2370 |
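The total is the sum of the component budgets, which is easy to keep in sync by deriving it rather than hard-coding it. The object and key names below are hypothetical; only the numbers come from the table.

```javascript
// Per-component token budgets, copied from the table above.
const BUDGET_DEFAULTS = {
  intentParse: 50,
  cacheCheck: 20,
  webSearch: 1500,
  credibility: 300,
  assembly: 500,
};

// Derive the total so it cannot drift from the component values.
const totalBudget = Object.values(BUDGET_DEFAULTS).reduce(
  (sum, tokens) => sum + tokens,
  0
);
// totalBudget is 2370, matching the Total row.
```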
Common Mistakes
- ❌ Calling web search before checking the semantic cache
- ❌ Skipping credibility check for unknown domains
- ❌ Assembling fragments from a single unverified source
- ❌ Fetching full page content when a HEAD probe + summary suffices
ClawHub Security Notes
All permission declarations, dependencies, I/O scopes, and security properties are formally declared in the YAML frontmatter above (`permissions`, `dependencies`, `io`, `security_notes`). Static analyzers should use those fields as the authoritative source.
Summary per script:
| Script | I/O | Network | Shell |
|---|---|---|---|
| `intent-parser.js` | none | none | none |
| `query-rewriter.js` | none | none | none |
| `stream-assembler.js` | none | none | none |
| `budget-controller.js` | none | none | none |
| `credibility-arbiter.js` | none | none | none |
| `semantic-cache.js` | `~/.antigravity/search-cache/` R/W | none | none |
| `parallel-probe.js` | reputation DB R (bundled) | HEAD only, no upload | none |