AI Agent PRD Guide
Overview
Write PRDs for AI Agent products that define not just what the agent does, but how it thinks, decides, and acts.
Relationship with Other Skills
This skill extends prd-writing-guide for AI Agent products specifically. You should:
-
Apply prd-writing-guide 's Seven Lenses to each agent capability
-
Follow prd-writing-guide 's Writing Style Guide for requirement clarity
-
Use prd-writing-guide 's Developer Test as your quality bar
Handoff: The Agent PRD this skill produces feeds into prd-to-engineering-spec for technical design. That skill includes an Agent-specific validation branch for converting agent capabilities into engineering specs.
Traditional PRD: Input → Deterministic Logic → Output Agent PRD: Goal → Perceive → Think → Decide → Act → Learn ↑ │ └───────── Feedback ────────┘
You're not defining a function. You're defining a cognitive architecture.
Quality Test
Can your engineering team answer these without asking you?
-
What is the agent's purpose and identity?
-
What capabilities (skills/tools) does it have?
-
How does it decide what to do?
-
What can it NOT do? (boundaries)
-
When should humans intervene?
-
How do we know if it's working well?
Quick Start
Generate a document skeleton:
bash scripts/generate_agent_prd_skeleton.sh ./docs/agent-prd "Customer Support Agent"
Fill in using templates from references
Validate completeness with checklist
Note: The skeleton generator writes a set of .md files into your output directory. Use a new/empty folder to avoid accidental overwrites.
Workflow
Phase 1: Agent Identity ──────► Who is the agent? What's its purpose? ↓ Phase 2: Capability Architecture ──► Skills, Tools, Memory, RAG, Workflows ↓ Phase 3: Behavior & System Prompt ─► How does it think? What's its DNA? ↓ Phase 4: Conversation Design ────► Golden conversations, example behaviors ↓ Phase 5: Safety & Guardrails ────► What can't it do? Human oversight? ↓ Phase 6: Evaluation Framework ───► How do we measure success? ↓ Phase 7: Operational Model ──────► Cost, scaling, iteration
Phase 1: Agent Identity
Goal: Define who the agent is and its relationship with users.
Key Elements
Element Questions to Answer
Persona Name, role, personality, expertise domain
Mission Why does this agent exist?
Boundaries What it IS vs what it is NOT
User Relationship Copilot, Autopilot, Peer, Expert, or Executor?
User-Agent Relationship Models
Model Description Example
Copilot Human leads, agent assists Code completion
Autopilot Agent leads, human monitors Customer support
Peer Equal collaboration Brainstorming
Expert Agent advises, human decides Medical advisor
Executor Human commands, agent executes Task automation
Phase 2: Capability Architecture
Goal: Define the building blocks that enable agent capabilities.
Capability Stack
┌─────────────────────────────────────────────────────────────────┐ │ SKILLS TOOLS WORKFLOWS │ │ (What it (External (Multi-step │ │ can do) actions) processes) │ │ └──────────────┼──────────────┘ │ │ ↓ │ │ AGENT CORE (Reasoning, Planning) │ │ ↓ │ │ ┌──────────────┼──────────────┐ │ │ MEMORY RAG CONTEXT │ │ (State/History) (Knowledge) (Awareness) │ └─────────────────────────────────────────────────────────────────┘
2.1 Skills
Reusable capability modules. See skills-specification.md.
Per skill, document:
-
Purpose & trigger conditions
-
Input/output specification
-
Process logic
-
Examples & boundaries
2.2 Tools
External actions the agent can invoke. See tools-specification.md.
Per tool, document:
-
Interface definition (JSON schema)
-
Execution details (endpoint, auth, timeout)
-
Response handling
-
Safety requirements (confirmation, audit)
2.3 Memory
Stateful, context-aware behavior. See memory-patterns.md.
Type Scope Example
Working Current request Context window
Session Current session Conversation history
Long-term Cross-session User preferences
2.4 Knowledge (RAG)
Knowledge grounding via retrieval. See memory-patterns.md for architecture patterns.
Per knowledge source, document:
Attribute Specify
Source What data source? (docs, DB, API, web)
Format Document types, data structure
Volume How much data? Growth rate?
Freshness Update frequency? Acceptable staleness?
Authority Is this authoritative? What if conflicting sources?
Retrieval configuration:
-
Chunking strategy (semantic, fixed-size, hybrid) and chunk size rationale
-
Embedding model and dimension
-
Retrieval method (dense, sparse, hybrid) and top-k range
-
Re-ranking strategy (if any)
-
Quality threshold (minimum similarity score for inclusion)
Knowledge gap handling:
-
How does the agent detect it doesn't know something?
-
Response when knowledge is insufficient (admit? search? escalate?)
-
Citation requirements (when must it cite? format? inline or footnote?)
Knowledge conflict resolution:
-
When multiple sources disagree, which takes priority?
-
Should the agent present conflicting views or choose one?
2.5 Workflows
Multi-step orchestrated processes. Document:
-
Trigger and steps with success criteria
-
Human checkpoints
-
Timeout and cancellation handling
Phase 3: Behavior & System Prompt
Goal: Define how the agent thinks, decides, communicates—and encode it into a System Prompt specification.
Reasoning Strategies
Strategy Description Use When
ReAct Think → Act → Observe → Repeat Most tasks
Plan-then-Execute Full plan upfront → Execute Complex multi-step
Tree of Thought Explore multiple paths Exploration needed
Reflexion Self-critique and improve Quality-critical
See agent-patterns.md for detailed patterns.
Decision Framework
Define priority order for agent decisions:
-
Safety first
-
User intent
-
Efficiency
-
Quality
Conversation Design
Aspect Define
Voice & Tone Persona, formality, verbosity
Response Patterns By scenario (simple, complex, error, out-of-scope)
Multi-turn Context retention, topic switching, reference resolution
System Prompt Specification ⭐ Core Deliverable
The System Prompt is the agent's DNA. The PRD must produce a System Prompt Design Spec (not the final prompt text, but its design intent). See system-prompt-design.md.
Required sections in the System Prompt Spec:
Section Content Example
Identity Declaration Who the agent is, role, personality "You are Aria, a senior financial advisor..."
Capability Declaration What tools/skills are available, when to use each "You have access to: search_docs, calculate..."
Behavioral Instructions How to reason, when to ask vs act, output style "Always explain your reasoning before acting..."
Constraint Boundaries What the agent must never do "Never provide medical diagnoses..."
Output Format Rules Response structure, length, formatting "Use bullet points for lists of 3+..."
Escalation Rules When and how to hand off to humans "If user mentions legal action, transfer to..."
Phase 4: Example Conversations (Golden Conversations)
Goal: Define concrete conversation examples that serve as both behavioral spec and acceptance criteria.
See conversation-design.md for detailed methodology.
Why Golden Conversations Matter
For Agent products, example conversations are the most precise behavioral specification. They are:
-
Acceptance criteria (does the agent behave like this example?)
-
Training signals (few-shot examples in the system prompt)
-
Evaluation dataset (automated quality testing)
-
Stakeholder alignment tool (shows exactly what "good" looks like)
Coverage Requirements
Design golden conversations for each of these scenario types:
Scenario Type Count Purpose
Happy path 2-3 per use case Shows ideal agent behavior
Edge cases 1-2 per use case Shows boundary handling
Safety boundaries 3-5 total Shows refusal/escalation
Multi-turn complex 2-3 total Shows context management
Context switching 1-2 total Shows topic change handling
Error recovery 2-3 total Shows tool failure handling
Out-of-scope 2-3 total Shows graceful boundary enforcement
Conversation Annotation Format
Each golden conversation should include:
Conversation: [Scenario Name]
Type: [happy-path | edge-case | safety | multi-turn | error] Tests: [Which capabilities/rules this validates]
Dialogue
User: [input] Agent: [expected response] // Annotation: [Why this response is correct. What rules apply.]
User: [follow-up] Agent: [expected response] // Annotation: [Key behavior being demonstrated]
Unacceptable Alternatives
- Agent should NOT: [describe bad behavior]
- Agent should NOT: [describe bad behavior]
Evaluation Criteria
- [Checkable criterion 1]
- [Checkable criterion 2]
Phase 5: Safety & Guardrails
Goal: Define boundaries, controls, and human oversight.
See safety-checklist.md for comprehensive checklist.
5.1 Capability Boundaries
Category Document
CAN DO Authorized actions with conditions
CANNOT DO Prohibited actions with response
MUST ASK Actions requiring confirmation
5.2 Human-in-the-Loop
Define when humans must intervene:
-
Approval triggers and workflow
-
Escalation paths
-
Override capabilities
5.3 Guardrails
Input Guardrails:
-
Prompt injection protection
-
Harmful request detection
-
Input validation
Output Guardrails:
-
Harmful content filtering
-
PII leakage prevention
-
Hallucination detection
5.4 Error Handling
Error Type Document
Tool failure Detection, message, recovery
Knowledge gap Detection, message, fallback
Reasoning failure Detection, restart/escalate
Phase 6: Evaluation Framework
Goal: Define how to measure agent quality and success.
See evaluation-rubrics.md for detailed rubrics.
Core Metrics
Dimension Metrics
Task Success Completion rate, first-turn resolution
Quality Accuracy, relevance, completeness
Safety Harmful response rate, boundary violations
Efficiency Latency, token usage, cost
User Experience CSAT, NPS, escalation rate
Evaluation Methods
Method Purpose Frequency
Automated Testing Regression, benchmarks Every change
Human Evaluation Quality assessment Weekly
LLM-as-Judge Scalable quality scoring Continuous
Red Team Testing Adversarial testing Quarterly
A/B Testing Compare variants As needed
Phase 7: Operational Model
7.1 Cost Model
Component Document
Per-request costs LLM tokens, embeddings, tool calls
Projected costs By scale (launch, 6 months, 1 year)
Cost controls Budgets, alerts, throttling
7.2 Scaling & Iteration
-
Scaling strategy (horizontal, rate limiting, caching)
-
Feedback collection mechanisms
-
Continuous improvement cycle
-
Version management
Output Structure
agent-prd/ ├── AGENT_PRD.md # Main document ├── IDENTITY.md # Agent persona & boundaries ├── USE_CASES.md # Users and use cases ├── SKILLS.md # Skills specification ├── TOOLS.md # Tools specification ├── MEMORY.md # Memory architecture ├── KNOWLEDGE.md # RAG configuration ├── WORKFLOWS.md # Workflow definitions ├── BEHAVIOR.md # Reasoning & conversation ├── SYSTEM_PROMPT_SPEC.md # System prompt design specification ⭐ ├── CONVERSATIONS.md # Golden conversations ⭐ ├── SAFETY.md # Guardrails ├── EVALUATION.md # Metrics & testing ├── EXAMPLES.md # Additional example interactions └── CHECKLIST.md # Completion checklist
Resources
Scripts:
- scripts/generate_agent_prd_skeleton.sh
- Generate PRD structure
Core References:
-
references/agent-prd-template.md
-
Complete PRD template
-
references/skills-specification.md
-
Skill definition guide
-
references/tools-specification.md
-
Tool definition guide
-
references/memory-patterns.md
-
Memory architecture patterns
-
references/agent-patterns.md
-
Reasoning & architecture patterns
-
references/conversation-design.md
-
Golden conversation methodology ⭐
-
references/worked-example.md
-
End-to-end worked example (HelpBot agent) ⭐
Safety & Evaluation:
-
references/safety-checklist.md
-
Safety requirements
-
references/evaluation-rubrics.md
-
Evaluation frameworks
Advanced Topics:
-
references/multi-agent-design.md
-
Multi-agent system design
-
references/system-prompt-design.md
-
System prompt engineering
-
references/multimodal-design.md
-
Multi-modal agent design
-
references/observability-operations.md
-
Monitoring & operations
-
references/protocols-standards.md
-
MCP, protocols, standards
-
references/domain-specific-design.md
-
Domain-specific guidance
Extensibility & Future-Proofing
This skill is designed to evolve with Agent technology:
Current Future-Ready
Text I/O Multimodal (vision, audio, video)
Single Agent Multi-Agent orchestration
Custom tools Protocol standards (MCP, Agent Protocol)
Basic metrics Full observability stack
Generic Domain-specific extensions
Adding new capabilities:
-
Add reference file in references/
-
Update SKILL.md Resources section
-
Extend PRD template if needed
Summary: Agent PRD Principles
┌─────────────────────────────────────────────────────────────────┐ │ 1. DEFINE IDENTITY - Who is this agent? Not just features. │ │ 2. SPECIFY CAPABILITIES - Skills, Tools, Memory, Knowledge. │ │ 3. DESIGN THE PROMPT - System Prompt is the agent's DNA. │ │ 4. SHOW, DON'T TELL - Golden conversations are the spec. │ │ 5. BOUND THE BEHAVIOR - What it CAN'T do matters equally. │ │ 6. EVALUATE CONTINUOUSLY - Define metrics before building. │ │ 7. HUMANS IN THE LOOP - Know when to escalate, always. │ └─────────────────────────────────────────────────────────────────┘
The goal is to architect cognition—define how an intelligent system should think, decide, and act within safe boundaries.