Research
Run research as a resumable session with a local evidence ledger, not as a one-shot answer.
Use this skill when the task involves real-world facts, natural science, or any scenario where accuracy matters. If the question touches real-world information, use this skill rather than answering from memory.
The agent owns research judgment. The runtime enforces reliability.
Quick Start
SCRIPT="<SKILL_DIR>/scripts/research_session.mjs"
The simplest path — let the agent author a plan and auto-synthesize:
# Write a plan to a temp file, then:
node "$SCRIPT" start --query "Your question" --plan-file /path/to/plan.json --depth standard
For multi-step research where the agent should review evidence before synthesis:
node "$SCRIPT" start --query "Your question" --plan-file /path/to/plan.json
node "$SCRIPT" status --session-id <id>
# Review evidence, then push to synthesis:
node "$SCRIPT" continue --session-id <id> --delta-file /path/to/synth-delta.json
node "$SCRIPT" report --session-id <id>
All commands:
| Command | Purpose |
|---|---|
start | Begin a new research session |
prepare | Begin but pause for plan approval before gathering |
approve | Resume a prepared session after plan review |
status | Show session state, scores, and open work |
review | Show a takeover-ready packet for another agent |
continue | Mutate the session with new instructions or plans |
report | Show the final synthesis (--format md|json) |
sources | List all evidence sources with attribution |
rejoin | Import results from an async remote handoff |
close | Mark the session as finished |
Useful flags: --depth quick|standard|deep, --domains d1,d2,
--plan-file, --brief-file, --delta-file, --instruction.
Operating Model
The agent decides what to research. The runtime makes decisions durable and replayable.
plan → gather → verify → synthesize
When the runtime has no authored next step, it enters awaiting_agent_decision instead of
inventing its own plan. The agent should respond with --plan-file or --delta-file.
To skip this pause, set auto_synthesize: true in the research_brief.
Agent-Authored Planning (Primary Path)
Always prefer --plan-file for non-trivial research. The runtime has a fallback planner for
simple queries, but it is low-authority (source: "runtime_fallback") and should not be
relied on for high-value work.
Minimal plan:
{
"plan_id": "my-plan-v1",
"task_shape": "broad",
"research_brief": {
"objective": "Compare X and Y for enterprise adoption",
"deliverable": "report",
"auto_synthesize": true,
"source_policy": {
"preferred_domains": ["official-x.com", "official-y.com"],
"notes": ["Prefer official pricing and security pages."]
}
},
"threads": [
{
"title": "Thread title",
"intent": "what this thread should establish",
"subqueries": ["search query 1", "search query 2"],
"claims": [
{ "text": "Falsifiable claim to verify.", "claim_type": "fact", "priority": "high" }
]
}
]
}
Key fields: plan_id (stable, for dedup), task_shape (broad|verification|site|async),
threads[].claims[].priority (high claims drive gathering and verification).
When the Session Awaits Your Decision
When the session enters awaiting_agent_decision, check status or review, then:
continue --delta-filewithsynthesize_session→ produce the final answercontinue --delta-filewithgather_threadorverify_claim→ dig deepercontinue --plan-file→ restructure the researchclose→ end the session
Minimal delta to trigger synthesis:
{
"delta_plan": {
"delta_plan_id": "synth-001",
"summary": "Ready to synthesize",
"queue_proposals": [
{ "kind": "synthesize_session", "scope_type": "session", "scope_id": "<session_id>" }
]
}
}
Continuation
Treat continue as a durable mutation. Do not wipe the ledger.
Always prefer structured artifacts over prose:
| Artifact | When to use |
|---|---|
--delta-file (delta plan) | Agent knows what changed and what should happen next |
--plan-file (continuation patch) | Specific operations: merge domains, mark stale, requeue, add thread |
--plan-file (full plan) | Restructure the research entirely |
--instruction (prose) | Simple cases only — goes through legacy inference, not recommended |
Supported delta plan actions: thread_actions (deepen, pause, branch),
claim_actions (mark_stale, set_priority),
queue_proposals (gather_thread, verify_claim, synthesize_session, handoff_session).
Supported continuation patch operations: merge_domains, mark_claim_stale, requeue_thread, add_gap, note, add_thread.
Source Credibility Tiers
Rank every piece of evidence by this hierarchy:
- Axiomatic — mathematics, established physical laws, formal proofs
- Legal/regulatory — government-published statutes, court rulings, SEC filings, audited financials
- Institutional data — government statistics, IMF/World Bank datasets, authoritative books, highly-cited peer-reviewed papers
- Official and authoritative — company websites, official social media, Wikipedia, Science/Nature, major encyclopedias
- Other — blogs, forums, aggregator summaries, opinion pieces — useful for leads but cannot be the sole basis for a claim
The runtime maps these to high (tiers 1-3), medium (tier 4), low (tier 5) for scoring.
The agent should use the full 5-tier scale in plans, evidence evaluation, and synthesis.
Reasoning Strategy
- Assign credibility weight using the tier system above.
- Core evidence at tier 1-3 → label conclusion "high confidence."
- Same-tier contradictions → prefer more recent source with stronger methodology. Higher-tier vs lower-tier → higher tier wins.
- Only tier 4-5 evidence available → lower confidence and say so explicitly.
- Internally consistent reasoning without tier 1-2 causal evidence → mark as "speculative" and state what would falsify it.
Evidence Rules
- Tavily Research is planning help, not evidence. Only URL-backed evidence moves claim state.
- Keep contradictions explicit — they are typed durable objects with
conflict_type,resolution_strategy, andstatus. - If a claim depends on one thin source, keep it unresolved.
- Evidence carries
observed_atandlast_verified_atfreshness metadata. - Preserve attribution anchors:
anchor_text,matched_sentence,attribution_confidence. - Always use English search keywords unless the topic is specifically regional or language-bound (e.g., Chinese law, Japanese cultural practice).
Routing
Default to Tavily. Choose the narrowest tool:
search → extract: normal evidence gathering pathresearch: planning accelerator for broad scansmap → extract: docs, policy, changelog, site-focused workcrawl: scoped audit-like coverage only
Additional providers (used automatically when API keys are available):
- Brave LLM Context (
BRAVE_SEARCH_API_KEY): search+extract in one call, runs as a supplement to Tavily on the first gather round for cross-engine diversity. Supports Goggles for source control —source_policy.allow_domainsgenerates a strict allowlist ($discard+$site=),preferred_domainsgenerates boost rules. - Gemini Grounding (
GEMINI_API_KEY): second planning accelerator alongside Tavily Research. Provides Google-grounded synthesis and subquery suggestions. Not an evidence source (proxied URIs).
Escalate to Manus only for long-running tasks, connector-backed work, or async deliverables
(PDF, PPT, CSV). Use <REPO_ROOT>/skills/manus/SKILL.md when needed.
Output Shape
- research plan
- answer summary
- interim findings
- evidence gaps
- open contradictions
- final synthesis with citations
- confidence and unresolved questions
Stop When
- the main question is answered with acceptable confidence
- important claims have good enough evidence
- new searches are mostly repetitive
- the remaining gaps are explicit
Continue when contradictions remain, sourcing is weak, or important claims hinge on thin evidence.
Load Only When Needed
references/method.md— research loop, evidence standards, source grading detailsreferences/providers.md— provider routing decisions (Tavily, Brave, Gemini, Manus)