# Multi-Agent Research

Production patterns from Anthropic's research system for conducting complex, multi-faceted research efficiently.

Source: *How we built our multi-agent research system*
## When to Use
Use multi-agent research patterns when:
- Research has 3+ independent dimensions to explore
- Comparing multiple alternatives simultaneously
- Need to synthesize information from 10+ sources
- Time-sensitive research requiring parallelization
- Complex technical landscape with unclear paths
- Breadth-first exploration needed
Don't use for:
- Simple fact-finding (1-2 sources)
- Single-dimension queries
- When sequential research is sufficient
## Core Principles

### 1. Scale Effort to Query Complexity

Assess complexity BEFORE starting research to allocate appropriate resources.

#### Simple Fact-Finding (3-10 tool calls)
- Single specific question with clear answer
- 1-2 authoritative sources needed
- Examples: Version number lookup, API syntax check, single DocType field validation
**Approach**: Direct search → validate → document

```typescript
// Example: Check ERPNext Task field
mcp__ref__ref_search_documentation({
  query: "ERPNext Task DocType fields"
})
// Review results → Document finding with citation
```
#### Direct Comparison (10-15 tool calls)
- Compare 2-3 specific alternatives
- Field mapping between systems
- Feature parity analysis
**Approach**: Parallel searches for each option → compare → recommend

```typescript
// Example: Compare DocTypes - execute in parallel
mcp__ref__ref_search_documentation({ query: "ERPNext Task DocType" })
mcp__ref__ref_search_documentation({ query: "ERPNext Project Task DocType" })
mcp__ref__ref_search_documentation({ query: "ERPNext ToDo DocType" })
// Compare results → Build comparison table → Recommend
```
#### Complex Multi-Faceted Research (20+ tool calls)
- Architecture decision with multiple unknowns
- Integration across multiple systems
- Novel feature requiring ecosystem research
- Multiple independent research dimensions
**Approach**: Use deep researcher OR spawn parallel focused sub-tasks

```typescript
// Example: Complex research with deep researcher
mcp__exasearch__deep_researcher_start({
  instructions: "Research OpenTelemetry integration for AWS Lambda Node.js. Cover: (1) Lambda layer vs manual instrumentation trade-offs, (2) X-Ray backend compatibility, (3) cold start performance impact, (4) 2025 best practices. Include code examples with versions.",
  model: "exa-research"
})
// Poll for completion, extract findings, add to research doc
```
**Complexity Heuristic**: If a research question has 3+ independent dimensions, use a parallel or deep-researcher approach.
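As an illustration only, the heuristic can be written down as a tiny helper; the categories and thresholds below mirror the tiers above and are assumptions, not part of any tool API:

```typescript
// Illustrative sketch of the heuristic; categories and thresholds are assumptions.
type Complexity = "simple" | "comparison" | "complex";

function assessComplexity(independentDimensions: string[]): Complexity {
  if (independentDimensions.length >= 3) return "complex";     // parallel sub-tasks or deep researcher
  if (independentDimensions.length === 2) return "comparison"; // parallel searches, 10-15 tool calls
  return "simple";                                             // direct search, 3-10 tool calls
}

// Three independent dimensions → "complex"
assessComplexity(["layer vs manual", "X-Ray compatibility", "cold start impact"]);
```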
### 2. Parallel Tool Execution

Execute independent searches simultaneously; for breadth-first queries this can cut research time by up to 90%.
#### When to Parallelize

**Comparing alternatives:**

```typescript
// ❌ DON'T: Sequential searches (slow)
await mcp__ref__ref_search_documentation({ query: "Redis session store" });
// wait...
await mcp__ref__ref_search_documentation({ query: "Memcached session store" });
// wait...

// ✅ DO: Parallel searches (single message, multiple tool calls)
mcp__ref__ref_search_documentation({ query: "Redis session store" })
mcp__ref__ref_search_documentation({ query: "Memcached session store" })
mcp__exasearch__web_search_exa({ query: "Redis vs Memcached 2025 comparison" })
```
**Multi-source validation:**

```typescript
// Execute all simultaneously
mcp__ref__ref_search_documentation({ query: "OpenTelemetry Lambda official docs" })
mcp__exasearch__web_search_exa({ query: "OpenTelemetry Lambda examples 2025" })
mcp__exasearch__web_search_exa({ query: "OpenTelemetry Lambda deprecated OR EOL" })
```
**Independent research dimensions:**

```typescript
// All can run in parallel
mcp__ref__ref_search_documentation({ query: "Stripe API authentication" })
mcp__exasearch__web_search_exa({ query: "Stripe API rate limits" })
mcp__exasearch__web_search_exa({ query: "Stripe webhook best practices 2025" })
```
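In an agent harness, "parallel" means emitting multiple tool calls in one message. If you are instead driving tools programmatically through a generic client, the same idea is a `Promise.all`; a minimal sketch, assuming a hypothetical `callTool` wrapper (not a real API):

```typescript
// Hypothetical generic MCP client; `callTool(name, args)` is a placeholder signature.
declare function callTool(name: string, args: Record<string, unknown>): Promise<unknown>;

// Fire all independent searches at once and await them together.
async function compareSessionStores() {
  const [redisDocs, memcachedDocs, comparison] = await Promise.all([
    callTool("mcp__ref__ref_search_documentation", { query: "Redis session store" }),
    callTool("mcp__ref__ref_search_documentation", { query: "Memcached session store" }),
    callTool("mcp__exasearch__web_search_exa", { query: "Redis vs Memcached 2025 comparison" }),
  ]);
  return { redisDocs, memcachedDocs, comparison };
}
```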
### 3. Search Strategy: Start Wide, Then Narrow

Progressive refinement prevents missing important context.

#### Anti-Pattern: Overly Specific Initial Queries

❌ DON'T start with:
"how to implement OpenTelemetry auto-instrumentation for AWS Lambda with X-Ray backend using custom sampling rules in Node.js 18"

Result: few or no results, and you miss alternative approaches.
#### Recommended: Progressive Refinement

1. **Broad exploration** (2-5 results): "OpenTelemetry AWS Lambda Node.js" → discover what approaches exist and what's recommended.
2. **Evaluate the landscape**: review results, identify the main approaches (layer vs manual instrumentation).
3. **Narrow focus** (3-10 results): "OpenTelemetry Lambda layer vs manual instrumentation" → compare trade-offs.
4. **Specific details**: "OpenTelemetry Lambda layer installation guide 2025" → find working examples.
#### Query Pattern Templates

| Purpose | Pattern | Example |
|---|---|---|
| Discovery | `[technology] [use case]` | "GraphQL federation microservices" |
| Comparison | `[option A] vs [option B] [criteria]` | "REST vs GraphQL performance" |
| Implementation | `[specific approach] [version] guide` | "GraphQL Apollo Federation v2 guide" |
| Validation | `[library] deprecated OR EOL OR migration` | "Apollo Federation deprecated" |
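If queries are generated programmatically, the patterns above can be captured as small template helpers; an illustrative sketch (the names are assumptions):

```typescript
// Illustrative query builders mirroring the table above.
const queryTemplates = {
  discovery: (technology: string, useCase: string) => `${technology} ${useCase}`,
  comparison: (a: string, b: string, criteria: string) => `${a} vs ${b} ${criteria}`,
  implementation: (approach: string, version: string) => `${approach} ${version} guide`,
  validation: (library: string) => `${library} deprecated OR EOL OR migration`,
};

queryTemplates.comparison("REST", "GraphQL", "performance"); // "REST vs GraphQL performance"
```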
### 4. Thinking Process for Research

Use extended and interleaved thinking to plan strategy and evaluate results.

#### Planning Phase (Extended Thinking)

Before tool calls, think through:

```
[Extended thinking example]
This ERPNext DocType selection question has 3 candidates to evaluate.
I need to research each in parallel:
- Field mappings (official docs)
- Custom field requirements (API specs)
- Community usage patterns (real-world examples)
I'll use ref.tools for official ERPNext docs (3 parallel calls) and
exa for community examples (3 parallel calls).
Total: 6 parallel tool calls for comprehensive coverage.
```
**Planning Checklist:**
- What are the independent sub-questions?
- Which tools fit this research?
- What's the complexity level? (Simple/Comparison/Complex)
- Which searches can run simultaneously?
#### After Tool Results (Interleaved Thinking)

After each set of results, evaluate:

```
[Interleaved thinking example]
Results from 6 parallel searches received.

Quality check:
- Official docs: High confidence, current (v14)
- Community examples: Medium confidence, mix of v13/v14

Gap analysis:
- Tasks DocType: 80% field coverage, clear
- Project Tasks: 60% coverage, BUT community prefers for workflow integration
- Missing: Understanding WHY community prefers Project Tasks

Next step: Need one more targeted search on workflow capabilities difference
```
**Evaluation Checklist:**
- ✅ Are sources authoritative? Current? Relevant?
- ✅ What's still missing? What contradicts?
- ✅ Should I go deeper or pivot direction?
- ✅ What's my confidence level? (High/Medium/Low)
### 5. Deep Research Delegation

Know when to delegate to a specialized deep-research tool versus making direct tool calls.

#### Decision Framework
| Use Direct Research | Use Deep Researcher |
|---|---|
| Query scope clear and bounded | Open-ended exploration needed |
| 2-4 specific sources | Unclear which sources to check |
| Can complete in 10-15 tool calls | Requires 10+ sources |
| Example: "Compare Redis vs Memcached for sessions" | Example: "Research state of GraphQL federation in 2025 - solutions, trade-offs, migration paths" |
#### Deep Researcher Workflow

1. **Start the task with detailed instructions:**

```typescript
mcp__exasearch__deep_researcher_start({
  instructions: `Research OpenTelemetry integration for AWS Lambda Node.js functions.

Focus areas:
1. Lambda layer vs manual instrumentation (trade-offs, pros/cons)
2. X-Ray backend compatibility (setup, configuration)
3. Cold start performance impact (benchmarks, mitigation)
4. Current best practices as of 2025 (official recommendations)

Deliverables:
- Code examples with specific version numbers
- Official documentation links
- Deprecation warnings if any
- Recommended approach with justification`,
  model: "exa-research" // or "exa-research-pro" for very complex research
})
// Returns: { taskId: "abc123" }
```
2. **Poll for results** (repeat until `status: "completed"`; a combined start-and-poll sketch follows this workflow):

```typescript
mcp__exasearch__deep_researcher_check({
  taskId: "abc123"
})
// Tool includes a 5-second delay before checking
// Keep calling until status: "completed"
```
3. **Extract and integrate findings:**

```markdown
## Findings from Deep Research

**Source**: Deep Researcher Task abc123

### Finding 1: Lambda Layer Recommended
- **Summary**: Official docs recommend layer approach over manual
- **Evidence**: "Layer reduces cold start by 200ms, handles auto-instrumentation"
- **Source**: OpenTelemetry Lambda Docs (High confidence)
- **Code Example**: `layers: ['arn:aws:lambda:...:opentelemetry-nodejs']`

[Additional findings...]
```
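Putting steps 1 and 2 together, the start-and-poll loop might look like this sketch; `callTool` is a placeholder client, and the result fields (`taskId`, `status`, `report`) are assumptions inferred from the calls above:

```typescript
// Sketch only: `callTool` is a placeholder, and the result shapes are assumptions.
declare function callTool(name: string, args: Record<string, unknown>): Promise<any>;
const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

async function runDeepResearch(instructions: string): Promise<string> {
  const { taskId } = await callTool("mcp__exasearch__deep_researcher_start", {
    instructions,
    model: "exa-research",
  });

  // Poll until the task reports completion.
  while (true) {
    const result = await callTool("mcp__exasearch__deep_researcher_check", { taskId });
    if (result.status === "completed") return result.report; // field names assumed
    await sleep(5_000); // back off between checks
  }
}
```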
### 6. Self-Improvement from Tool Failures

Adapt when tools don't work as expected.

#### Document Failure Patterns

When a tool repeatedly fails, document the pattern:

```
TOOL FAILURE PATTERN:
- Tool: mcp__ref__ref_search_documentation
- Query: "ERPNext v14 custom fields"
- Parameters: { query: "ERPNext v14 custom field creation API" }
- Expected: Documentation on custom field APIs
- Actual: No results returned (0 matches)
- Frequency: 5/5 attempts with different phrasings
```
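To aggregate these patterns across sessions, the same template could be captured as a record; a hypothetical shape (all field names are illustrative):

```typescript
// Hypothetical record mirroring the failure-pattern template above.
interface ToolFailure {
  tool: string;
  query: string;
  parameters: Record<string, unknown>;
  expected: string;
  actual: string;
  attempts: number; // phrasings tried before giving up
}

const failure: ToolFailure = {
  tool: "mcp__ref__ref_search_documentation",
  query: "ERPNext v14 custom fields",
  parameters: { query: "ERPNext v14 custom field creation API" },
  expected: "Documentation on custom field APIs",
  actual: "No results returned (0 matches)",
  attempts: 5,
};
```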
#### Adaptation Strategies

1. **Try an alternative tool:**

```typescript
// Primary tool failing, switch to web search
mcp__exasearch__web_search_exa({
  query: "ERPNext v14 custom field creation site:erpnext.com"
})
```

2. **Rephrase the query:**

```typescript
// Original: "ERPNext v14 custom fields"
// Rephrased: "ERPNext custom field API" (drop version)
// Rephrased: "Frappe custom field creation" (use framework name)
```

3. **Break into smaller queries** (a sketch chaining these strategies follows):

```typescript
// Original: "ERPNext v14 custom field creation and validation API"
// Split into:
mcp__ref__ref_search_documentation({ query: "ERPNext custom field creation" })
mcp__ref__ref_search_documentation({ query: "ERPNext field validation" })
```
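Taken together, these strategies form a simple fallback chain. A minimal sketch, assuming the same placeholder `callTool` client as earlier (the result shapes are also assumptions):

```typescript
// Placeholder client; result shape (an array) is an assumption.
declare function callTool(name: string, args: Record<string, unknown>): Promise<any>;

// Try the primary docs tool with several phrasings, then fall back to web search.
async function searchWithFallback(phrasings: string[], webFallbackQuery: string) {
  for (const query of phrasings) {
    const results = await callTool("mcp__ref__ref_search_documentation", { query });
    if (Array.isArray(results) && results.length > 0) return { results, via: "ref" };
  }
  // All phrasings failed: switch tools, then validate results against official docs.
  const results = await callTool("mcp__exasearch__web_search_exa", { query: webFallbackQuery });
  return { results, via: "web" };
}

searchWithFallback(
  ["ERPNext custom field creation", "Frappe custom field creation"],
  "ERPNext v14 custom field creation site:erpnext.com",
);
```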
#### Report to Planning

```markdown
TOOL ISSUE REPORT:
**Tool**: mcp__ref__ref_search_documentation
**Issue**: Consistently returns no results for ERPNext v14 queries; v13 queries work
**Attempted**: 5 different query phrasings
**Workaround**: Using web_search_exa with site:erpnext.com filter, then validating
**Impact**: 30% slower research, but results accurate
**Recommendation**: Tool may need ERPNext v14 docs indexed
```
### 7. Findings Compression Strategy

Compress vast research into actionable context for downstream agents.

#### The Problem

Complex research generates 10+ sources spanning hundreds of pages. The Action Agent needs only the compressed, decision-critical information.

#### 4-Element Compression Pattern

For each major finding, capture (a type sketch follows this list):
- **Core claim** (1 sentence)
- **Evidence** (quote or concrete example)
- **Source** (URL + confidence level)
- **Relevance** (why this matters for the decision)
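As a concrete anchor for the four elements, here is an illustrative TypeScript shape; the type and field names are assumptions, not an existing API:

```typescript
// Illustrative shape for a compressed finding; field names are assumptions.
type Confidence = "High" | "Medium" | "Low";

interface Finding {
  coreClaim: string;                               // 1 sentence
  evidence: string;                                // quote or concrete example
  source: { url: string; confidence: Confidence }; // URL + confidence level
  relevance: string;                               // why this matters for the decision
  codeExample?: string;                            // minimal snippet, optional
}
```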
#### Anti-Pattern: Copying Entire Docs

❌ DON'T:

```
Found 15-page OpenTelemetry Lambda guide covering:
- History of observability (3 pages)
- Architecture deep-dive (4 pages)
- Deployment options (2 pages)
- Configuration reference (5 pages)
- Troubleshooting (1 page)
[Paste entire guide]
```
#### Better: Extract Decision-Critical Information

✅ DO:

**Finding**: OpenTelemetry Lambda layer recommended over manual instrumentation
- **Core Claim**: Layer approach reduces cold start by 200ms vs manual
- **Evidence**: Official guide states "Layer handles auto-instrumentation and reduces cold start overhead through optimized initialization"
- **Source**: [OpenTelemetry Lambda Docs - Deployment Options](https://opentelemetry.io/docs/faas/lambda-nodejs/#deployment) (High confidence - official docs)
- **Code Example**: `layers: ['arn:aws:lambda:us-east-1:901920570463:layer:aws-otel-nodejs-ver-1-18-0:1']`
- **Relevance**: Meets performance requirement (<500ms cold start) without custom instrumentation code, reduces maintenance burden

**Alternative Considered**: Manual instrumentation; rejected due to cold start penalty and maintenance overhead
#### Compression Checklist
Before handing off findings (a mechanical check sketch follows this list):
- [ ] Each finding fits in 3-5 sentences
- [ ] Code examples are minimal but working (not full files)
- [ ] Links to docs (don't paste full docs)
- [ ] Clear connection to decision criteria
- [ ] Confidence level explicit (High/Medium/Low)
- [ ] Alternatives considered and rejection rationale
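A few of these items can be checked mechanically; a heuristic sketch, assuming the illustrative `Finding` shape from above (re-declared here so the snippet stands alone):

```typescript
// Heuristic sketch: mechanical checks for a subset of the checklist.
interface Finding {
  coreClaim: string;
  evidence: string;
  source: { url: string; confidence: "High" | "Medium" | "Low" };
  relevance: string;
}

function checklistWarnings(finding: Finding): string[] {
  const warnings: string[] = [];
  const prose = [finding.coreClaim, finding.evidence, finding.relevance].join(" ");
  const sentences = prose.split(/[.!?]+/).filter((s) => s.trim().length > 0);
  if (sentences.length > 5) warnings.push("Finding exceeds 3-5 sentences");
  if (!/^https?:\/\//.test(finding.source.url)) warnings.push("Source is not a link to docs");
  return warnings;
}
```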
## Integration with Existing Workflows
### For Researcher Agent
Apply these patterns in existing research workflow:
1. **Query Complexity Assessment** → Add before "Search Existing Project Documentation"
2. **Parallel Tool Execution** → Use in "Conduct External Research" phase
3. **Search Strategy** → Apply when formulating search queries
4. **Thinking Process** → Use before tool calls and after results
5. **Deep Researcher** → Option in "Conduct External Research" for complex topics
6. **Tool Failures** → Add to Error Handling section
7. **Findings Compression** → Apply in "Document Findings with Citations"
### Output Format
Maintain existing output structure, enhance with compression:
````markdown
## Key Findings

### Finding 1: [Title]
**Core Claim**: [1 sentence]
**Evidence**: [Quote or concrete example]
**Source**: [URL + confidence]
**Relevance**: [Decision impact]
**Code Example** (if applicable):
```[language]
// Minimal working example
```

[Repeat for 3-5 key findings]
````
## Success Metrics
Applying multi-agent research patterns should result in:
- **Speed**: Up to 90% faster for breadth-first queries (via parallelization)
- **Quality**: Higher confidence findings (progressive refinement)
- **Efficiency**: Right-sized effort (complexity assessment)
- **Completeness**: Better coverage (parallel exploration)
- **Usability**: Actionable findings (compression)
## Examples
### Example 1: Simple Query
**Task**: Check if Redis supports session TTL
**Complexity**: Simple fact-finding
**Approach**: Direct search (3 tool calls)
```typescript
// Single targeted search
mcp__ref__ref_search_documentation({
  query: "Redis TTL expire session keys"
})
// Validate in official docs
// Document finding with citation
```

**Result**: 2 minutes, High confidence
### Example 2: Comparison Query

**Task**: Compare Redis vs Memcached for session storage
**Complexity**: Direct comparison
**Approach**: Parallel searches (10 tool calls)

```typescript
// Execute in parallel (single message)
mcp__ref__ref_search_documentation({ query: "Redis session storage features" })
mcp__ref__ref_search_documentation({ query: "Memcached session storage features" })
mcp__exasearch__web_search_exa({ query: "Redis vs Memcached session store 2025" })
mcp__exasearch__web_search_exa({ query: "Redis session TTL persistence" })
mcp__exasearch__web_search_exa({ query: "Memcached session limitations" })
```

**Result**: 8 minutes, comparison table with pros/cons, high-confidence recommendation
### Example 3: Complex Multi-Faceted Query

**Task**: Research GraphQL federation migration from REST API
**Complexity**: Complex (architecture decision, multiple unknowns)
**Approach**: Deep researcher (40+ sources)

```typescript
mcp__exasearch__deep_researcher_start({
  instructions: `Research migrating from REST to GraphQL federation for microservices.

Focus areas:
1. Available federation solutions (Apollo, Mercurius, etc.) - compare features, maturity
2. Migration strategies (big bang vs incremental, REST wrapper patterns)
3. Schema stitching vs federation trade-offs
4. Performance implications (n+1 queries, caching)
5. Client migration (breaking changes, backward compatibility)
6. 2025 best practices and anti-patterns

Deliverables:
- Solution comparison table
- Recommended migration path with phases
- Code examples for federation setup
- Known gotchas and mitigation strategies`,
  model: "exa-research-pro"
})

// Poll until complete
mcp__exasearch__deep_researcher_check({ taskId: "..." })

// Extract findings, compress to key decisions
// Build recommendation with phased approach
```

**Result**: 45 minutes, comprehensive analysis with migration roadmap, Medium-High confidence (validated against official docs)
## Anti-Patterns

### ❌ Starting Too Specific

Query: "implement OpenTelemetry auto-instrumentation AWS Lambda X-Ray custom sampling Node 18"
Result: 0 results, or better approaches are missed

**Fix**: Start broad, progressively narrow.

### ❌ Sequential When You Could Parallelize

```typescript
await search("Redis features");
await search("Memcached features");
await search("comparison");
```

**Fix**: Single message with 3 tool calls.

### ❌ Pasting Entire Docs

Finding: [15 pages of OpenTelemetry docs copied]

**Fix**: Extract 4-element compressed findings.

### ❌ Skipping Thinking Steps

[Immediately calls 10 tools without planning]

**Fix**: Extended thinking to plan, interleaved thinking to evaluate.

### ❌ Using Deep Researcher for Simple Queries

Task: "What's the latest Redis version?"
Approach: deep_researcher_start (overkill)

**Fix**: Simple direct search (1-2 tool calls).
## References

- Anthropic: *How we built our multi-agent research system*
- Anthropic: *Agents cookbook* - Prompts
- Linear Issue LAW-76 (implementation tracking)