Multi-Agent Architecture Patterns
Multi-agent architectures distribute work across multiple LLM instances, each with its own context window. The critical insight: sub-agents exist primarily to isolate context, not to mirror human role divisions.
Why Multi-Agent?
Context Bottleneck: Single agents fill their context with history, documents, and tool outputs. Performance degrades via the lost-in-the-middle effect and attention scarcity.
Token Economics:
| Architecture | Token Multiplier |
|---|---|
| Single-agent chat | 1× baseline |
| Single agent + tools | ~4× baseline |
| Multi-agent system | ~15× baseline |
Parallelization: Research tasks can search multiple sources simultaneously. Total time approaches the longest subtask, not the sum of all subtasks.
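A minimal sketch of this fan-out with `asyncio` (the `search` coroutine is a hypothetical stand-in for a sub-agent's LLM or tool call):

```python
import asyncio

async def search(source: str) -> str:
    # Hypothetical sub-agent task; the sleep stands in for an LLM/tool call.
    await asyncio.sleep(0.1)
    return f"results from {source}"

async def research(sources):
    # Fan out one sub-agent per source; wall-clock time ≈ slowest subtask,
    # not the sum, because all searches run concurrently.
    return await asyncio.gather(*(search(s) for s in sources))

results = asyncio.run(research(["web", "docs", "arxiv"]))
```

`asyncio.gather` preserves input order, so results can be zipped back to their sources after the fan-out.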
Architectural Patterns
Pattern 1: Supervisor/Orchestrator
User Query -> Supervisor -> [Specialist, Specialist] -> Aggregation -> Output
Use when: Clear decomposition, coordination needed, human oversight important.
The Telephone Game Problem: Supervisors paraphrase sub-agent responses incorrectly.
Fix: forward_message tool lets sub-agents respond directly:
```python
def forward_message(message: str, to_user: bool = True):
    """Forward a sub-agent response directly to the user."""
    if to_user:
        return {"type": "direct_response", "content": message}
```
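One way the supervisor side might honor that payload (a sketch; `deliver` and `summarize` are hypothetical names, not part of any framework):

```python
def summarize(payload: dict) -> str:
    # Placeholder for the supervisor's usual paraphrasing step.
    return f"Supervisor summary of: {payload}"

def deliver(sub_agent_output: dict) -> str:
    # Forwarded payloads bypass the supervisor's retelling entirely,
    # which is what prevents the telephone-game distortion.
    if sub_agent_output.get("type") == "direct_response":
        return sub_agent_output["content"]  # verbatim, no paraphrase
    return summarize(sub_agent_output)      # normal supervisor path
```

The design choice: the sub-agent opts into verbatim delivery per message, so the supervisor still paraphrases routine tool results.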
Pattern 2: Peer-to-Peer/Swarm
```python
def transfer_to_agent_b():
    # Handoff: returning another agent transfers control to it
    return agent_b

agent_a = Agent(name="Agent A", functions=[transfer_to_agent_b])
```
Use when: Flexible exploration, rigid planning counterproductive, emergent requirements.
Pattern 3: Hierarchical
Strategy Layer -> Planning Layer -> Execution Layer
Use when: Large-scale projects, enterprise workflows, clear separation of concerns.
Context Isolation
Primary purpose of multi-agent: context isolation.
Mechanisms:
- Full context delegation: complex tasks needing full understanding
- Instruction passing: simple, well-defined subtasks
- File system memory: shared state without context bloat
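A minimal sketch of file-system memory: agents exchange references to files rather than payloads, so large intermediate results never enter another agent's context (`write_note`/`read_note` are hypothetical helpers):

```python
import json
import tempfile
from pathlib import Path

# Shared scratchpad directory for the agent team.
workdir = Path(tempfile.mkdtemp())

def write_note(agent: str, key: str, value) -> Path:
    # Persist an agent's intermediate result and return only a reference.
    path = workdir / f"{agent}_{key}.json"
    path.write_text(json.dumps(value))
    return path

def read_note(path: Path):
    # Another agent loads the state on demand, keeping its prompt small.
    return json.loads(path.read_text())

ref = write_note("researcher", "findings", {"sources": 12})
# The supervisor passes `ref` (a path) between agents, not the payload itself.
```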
Consensus and Coordination
Weighted Voting: Weight by confidence or expertise.
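Weighted voting reduces to summing weights per candidate answer. A sketch (weights here are illustrative confidence scores):

```python
from collections import defaultdict

def weighted_vote(ballots):
    # ballots: list of (answer, weight) pairs, where weight encodes
    # the voting agent's confidence or domain expertise.
    scores = defaultdict(float)
    for answer, weight in ballots:
        scores[answer] += weight
    return max(scores, key=scores.get)

winner = weighted_vote([("A", 0.9), ("B", 0.6), ("B", 0.5)])
# "B" wins with total weight 1.1, beating "A"'s single 0.9 vote.
```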
Debate Protocols: Agents critique each other's outputs. Adversarial critique often yields higher accuracy than collaborative consensus.
Trigger-Based Intervention:
- Stall triggers: no progress detected
- Sycophancy triggers: agents mimicking each other without independent reasoning
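A stall trigger can be as simple as checking whether recent turns produced any new state. A sketch (the repeated-output heuristic is an assumption; real systems might compare task state instead):

```python
def stalled(history, window: int = 3) -> bool:
    # Fire the trigger when the last `window` turns are identical,
    # i.e. the agent loop is repeating itself without progress.
    recent = history[-window:]
    return len(recent) == window and len(set(recent)) == 1

stalled(["plan", "x", "x", "x"])  # True: three identical turns in a row
```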
Failure Modes
| Failure | Mitigation |
|---|---|
| Supervisor bottleneck | Output schema constraints, checkpointing |
| Coordination overhead | Clear handoff protocols, batched results |
| Divergence | Objective boundaries, convergence checks |
| Error propagation | Output validation, retry with circuit breakers |
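The last row combines validation, retries, and a circuit breaker. A minimal sketch (class and thresholds are illustrative, not a specific library):

```python
class CircuitBreaker:
    """Validate agent output; retry a few times, then open the circuit
    so a persistently failing agent stops polluting downstream agents."""

    def __init__(self, max_retries: int = 2, failures_to_open: int = 3):
        self.max_retries = max_retries
        self.failures_to_open = failures_to_open
        self.failures = 0  # consecutive failed call attempts

    def call(self, agent_fn, validate):
        if self.failures >= self.failures_to_open:
            raise RuntimeError("circuit open: agent disabled")
        for _ in range(self.max_retries + 1):
            output = agent_fn()
            if validate(output):
                self.failures = 0  # success resets the breaker
                return output
        self.failures += 1
        raise ValueError("invalid output after retries")
```

The key property: a bad output raises at the boundary instead of being handed to the next agent, so errors stop propagating.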
Example: Research Team
Supervisor
├── Researcher (web search, document retrieval)
├── Analyzer (data analysis, statistics)
├── Fact-checker (verification, validation)
└── Writer (report generation)
Best Practices
- Design for context isolation as the primary benefit
- Choose a pattern based on coordination needs, not an org metaphor
- Implement explicit handoff protocols with state passing
- Use weighted voting or debate for consensus
- Monitor for supervisor bottlenecks
- Validate outputs before passing them between agents
- Set time-to-live limits to prevent infinite loops
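The last practice, time-to-live limits, can be sketched as a loop guard with both a step budget and a wall-clock budget (names and budgets are illustrative):

```python
import time

def run_with_ttl(step_fn, max_steps: int = 10, max_seconds: float = 30.0):
    # Guard an agent loop: stop when either budget is exhausted,
    # so a confused agent cannot spin forever.
    start = time.monotonic()
    for step in range(max_steps):
        if time.monotonic() - start > max_seconds:
            return {"status": "timeout", "steps": step}
        result = step_fn(step)
        if result is not None:  # agent signalled completion
            return {"status": "done", "steps": step + 1, "result": result}
    return {"status": "step_limit", "steps": max_steps}

run_with_ttl(lambda s: "answer" if s == 2 else None)
# → {"status": "done", "steps": 3, "result": "answer"}
```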