Agent Architecture Evaluator
Version: 1.0.0
Overview
This skill reviews the architecture of an agent system, not just its prompts or its attached skills.
Use it for architectures involving components such as:
- planner / executor splits
- routers and specialists
- tool-use layers
- memory systems
- human approval gates
- multi-agent coordination
Use this skill when
- A user wants to assess an existing agent architecture.
- Reliability, latency, cost, or coordination problems appear to be architectural.
- A team needs a structured architecture review and optimization roadmap.
- You need system-level test scenarios rather than single-skill evals.
Do not use this skill when
- The problem is one isolated skill.
- The task is to create a new skill from scratch.
- The main need is portfolio review across many related skills.
Use agent-test-measure-refine or agent-skill-portfolio-evaluator in those cases.
Output contract
Always produce these named outputs:
architecture_inventoryfailure_mode_maparchitecture_test_planoptimization_roadmapmeasurement_planarchitecture_recommendation
Review dimensions
Evaluate at least these dimensions:
component clarityrouting correctnessmemory usefulnesscoordination reliabilitycost and latency efficiencyobservability and debuggability
Quick start
- Map the current architecture.
- Identify critical paths and failure-prone handoffs.
- Define architecture-level test scenarios.
- Identify bottlenecks in routing, memory, tools, or coordination.
- Recommend the smallest structural changes with the highest leverage.
Workflow
1. Build the architecture inventory
Capture:
- components
- responsibilities
- inputs and outputs
- state or memory boundaries
- human approval points
- observability signals
2. Map failure modes
Look for:
- planner produces unusable tasks
- router sends work to the wrong specialist
- memory pollutes current decisions
- tool calls are slow, redundant, or poorly validated
- multi-agent handoffs lose context
- approval gates appear too late
3. Design system tests
Cover:
- happy path
- degraded upstream input
- partial component failure
- tool unavailability
- stale or noisy memory
- high-latency coordination
- rollback or recovery behavior
See references/architecture-review-framework-v1.0.0.md.
4. Prioritize architectural changes
Prefer:
- clarifying responsibilities before adding components
- removing weak indirection
- tightening interface contracts
- adding observability before adding complexity
- isolating state when cross-contamination is likely
5. Define measurement
Recommend concrete metrics where available:
- task success rate
- retry rate
- fallback rate
- cost per successful task
- latency by stage
- human intervention rate
Anti-patterns
- adding new components to hide unclear ownership
- keeping weak memory because it sounds sophisticated
- optimizing one stage without measuring system impact
- blaming prompts for structural routing failures
Resources
references/architecture-review-framework-v1.0.0.mdfor system review steps.references/optimization-patterns-v1.0.0.mdfor architecture optimization guidance.assets/architecture-review-template.mdfor the final report structure.assets/example-architecture-review.mdfor a realistic filled review.assets/architecture-input-example.jsonfor structured input.scripts/render_architecture_review.pyto normalize a structured architecture review into Markdown.