# Workflow Design

Guides the user through designing graph-based workflows for AI agents. Based on "Principles of Building AI Agents" (Bhagwat & Gienow, 2025), Part IV: Graph-Based Workflows (Chapters 12-16).
## When to use

Use this skill when the user needs to:

- Break down a complex agent task into structured workflow steps
- Design branching, chaining, and merging logic
- Plan human-in-the-loop suspend/resume points
- Set up streaming for real-time progress updates
- Design observability and tracing for workflows
## Instructions

### Step 1: Understand the Process

Use the AskUserQuestion tool to gather context:

- What is the end-to-end process? (describe the full flow)
- Is the agent too unpredictable for the task? (if yes, workflows add structure)
- Are there steps that must happen in a specific order?
- Are there steps that can run in parallel?
- Are there points where human input is needed?
- Does the user need real-time progress updates?

Read any existing spec documents before proceeding.

**Key principle:** Use workflows when agents are too unpredictable. Workflows define explicit branching, parallel execution, checkpoints, and tracing.
### Step 2: Workflow Primitives

Teach the four workflow primitives and map the user's process to them:

#### Workflow Primitives

**1. Chaining (Sequential)**

Steps run one after another. Each step has access to the previous step's output.

- Use when: step B depends on step A's result
- Example: Extract data → Validate → Transform → Store

**2. Branching (Parallel)**

Multiple steps run simultaneously on the same input.

- Use when: independent analyses of the same data
- Example: Analyze sentiment + Extract entities + Classify topic (all in parallel)

**3. Merging (Convergence)**

Combine results from multiple branches into a single output.

- Use when: parallel branches need to produce a unified result
- Example: Combine sentiment + entities + topic into a single report

**4. Conditions (Decision Points)**

Route to different steps based on intermediate results.

- Use when: different inputs require different processing paths
- Example: If user intent = "complaint" → escalation flow; else → standard flow
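To make the mapping concrete, the four primitives can be sketched as plain async-function combinators. This is an illustrative sketch, not any framework's API; `Step`, `chain`, `branch`, `merge`, and `condition` are names invented here.

```typescript
// A step: an async transformation from one typed input to one typed output.
type Step<I, O> = (input: I) => Promise<O>;

// 1. Chaining: the second step consumes the first step's output.
const chain = <A, B, C>(first: Step<A, B>, second: Step<B, C>): Step<A, C> =>
  async (input) => second(await first(input));

// 2. Branching: run independent steps in parallel on the same input.
const branch = <I, O>(steps: Step<I, O>[]): Step<I, O[]> =>
  async (input) => Promise.all(steps.map((s) => s(input)));

// 3. Merging: combine parallel branch results into a single output.
const merge = <I, O>(combine: (parts: I[]) => O): Step<I[], O> =>
  async (parts) => combine(parts);

// 4. Conditions: pick the next step based on an intermediate result.
const condition = <I, O>(pick: (input: I) => Step<I, O>): Step<I, O> =>
  async (input) => pick(input)(input);
```

Real frameworks add persistence, retries, and tracing on top, but the control-flow shapes are exactly these four.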
### Step 3: Design the Workflow Graph

Walk through the process step by step. Use AskUserQuestion to confirm each step.

Design rules:

- Each step does ONE thing (no more than one LLM call per step)
- Input/output at each step should be meaningful and inspectable
- Name steps clearly — they appear in traces

Output a Mermaid diagram:

```mermaid
graph TD
    Start([User Input]) --> Extract[Extract Intent]
    Extract --> Condition{Intent Type?}
    Condition -->|question| Search[Search Knowledge Base]
    Condition -->|action| Execute[Execute Action]
    Condition -->|complaint| Escalate[Escalate to Human]
    Search --> Generate[Generate Response]
    Execute --> Generate
    Escalate --> Suspend([Suspend: Await Human])
    Suspend --> Resume[Resume with Human Input]
    Resume --> Generate
    Generate --> Validate[Validate Output]
    Validate --> Respond([Send Response])
```
And a step table:

#### Workflow Steps
| # | Step | Type | LLM Call | Input | Output | Notes |
|---|---|---|---|---|---|---|
| 1 | Extract Intent | Chain | Yes (classification) | User message | intent: string | Zero-shot classification |
| 2 | Route | Condition | No | intent | branch selection | Deterministic routing |
| 3a | Search KB | Chain | No (tool call) | query from intent | documents[] | RAG retrieval |
| 3b | Execute Action | Chain | Yes (tool use) | action from intent | result | Agent with tools |
| 3c | Escalate | Suspend | No | complaint details | human input | Wait for human |
| 4 | Generate Response | Chain | Yes (generation) | context + data | response text | Few-shot prompted |
| 5 | Validate | Chain | Yes (judge) | response | pass/fail | Output guardrail |
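The diagram and table above can be sketched as routed code. Everything here is a placeholder: `extractIntent` is a keyword stub standing in for the single LLM classification call, and the branch handlers just return strings.

```typescript
type Intent = "question" | "action" | "complaint";

// Placeholder branch handlers; in practice each is an LLM or tool call.
const searchKb = async (q: string) => `docs for: ${q}`;          // 3a: RAG retrieval
const executeAction = async (a: string) => `result of: ${a}`;    // 3b: agent with tools
const escalate = async (c: string) => `human reply to: ${c}`;    // 3c: suspend for human
const generate = async (ctx: string) => `response using ${ctx}`; // 4: generation

async function extractIntent(message: string): Promise<Intent> {
  // Step 1 would be one zero-shot LLM classification call;
  // a keyword stub keeps the sketch self-contained.
  if (message.includes("refund")) return "complaint";
  if (message.includes("?")) return "question";
  return "action";
}

async function handleMessage(message: string): Promise<string> {
  const intent = await extractIntent(message);   // Step 1: Extract Intent (LLM)
  switch (intent) {                              // Step 2: deterministic routing
    case "question":
      return generate(await searchKb(message));
    case "action":
      return generate(await executeAction(message));
    case "complaint":
      return generate(await escalate(message));
  }
}
```

Note that the routing itself is plain deterministic code; only the steps around it call the model.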
### Step 4: Suspend/Resume Points

Identify where the workflow needs to pause for external input:

#### Suspend/Resume Points
| # | Trigger | What to Persist | Resume Signal | Timeout |
|---|---|---|---|---|
| 1 | Human approval needed | Full workflow state + pending action | Human clicks approve/reject | 24h |
| 2 | External API callback | Request ID + workflow state | Webhook from external service | 1h |
| 3 | User clarification | Conversation history + ambiguous input | User responds | 30min |
**Persistence Strategy**

- Storage: [Database / Redis / Durable execution engine]
- Serialization: JSON-serializable workflow state
- Cleanup: Expire suspended workflows after [timeout]

**Key Principle**

Do NOT keep running processes for long waits. Persist state, shut down, resume when the signal arrives.
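The persist-and-shut-down pattern can be sketched as follows. The in-memory `Map` stands in for a database or Redis, and `SuspendedRun`, `suspend`, and `resume` are illustrative names, not a real engine's API.

```typescript
// State persisted for a paused run: everything needed to pick up later.
interface SuspendedRun {
  workflowId: string;
  step: string;                     // where to resume
  state: Record<string, unknown>;   // JSON-serializable workflow state
  expiresAt: number;                // cleanup timeout (epoch ms)
}

// Stand-in for a database or Redis; nothing stays in process memory for real.
const store = new Map<string, SuspendedRun>();

function suspend(run: SuspendedRun): void {
  store.set(run.workflowId, run);   // persist, then the process can exit
}

function resume(workflowId: string, humanInput: unknown): SuspendedRun | undefined {
  const run = store.get(workflowId);
  if (!run || run.expiresAt < Date.now()) {
    store.delete(workflowId);       // expired or unknown: clean up, don't resume
    return undefined;
  }
  store.delete(workflowId);
  // Merge the external signal into the persisted state and continue from `step`.
  return { ...run, state: { ...run.state, humanInput } };
}
```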
Use AskUserQuestion to identify suspension points in the user's workflow.
### Step 5: Streaming Strategy

Design how progress flows to the user:

#### Streaming Strategy

**What to Stream**
| Event Type | Content | When |
|---|---|---|
| Step start | Step name + description | Each step begins |
| LLM tokens | Token-by-token response | During generation |
| Tool call | Tool name + status | Tool execution |
| Progress | Percentage or step count | Between steps |
| Custom data | Partial results, previews | When available |
**Implementation**

- Protocol: SSE (Server-Sent Events) / WebSocket
- Frontend: Show step-by-step progress, auto-scroll, display tool calls
- Escape hatches: Push partial results even if the step is not done

**UX Principle**

Users want to see progress, not a blank screen. Streaming makes agents feel faster and more reliable. Show what is happening at every moment.
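As one possible shape, the event types in the table above can be modeled as a discriminated union and serialized into SSE frames. The event names and payloads are assumptions for illustration, not a fixed protocol.

```typescript
// Progress events mirroring the "What to Stream" table above.
type WorkflowEvent =
  | { type: "step-start"; step: string }
  | { type: "token"; text: string }
  | { type: "tool-call"; tool: string; status: "start" | "done" }
  | { type: "progress"; completed: number; total: number };

// One SSE frame: "event: <type>\ndata: <json>\n\n" (blank line terminates it).
function toSseFrame(event: WorkflowEvent): string {
  return `event: ${event.type}\ndata: ${JSON.stringify(event)}\n\n`;
}

// A workflow runner that yields events as it goes; the HTTP layer
// would write each frame to the response stream as it arrives.
async function* runWithProgress(steps: string[]): AsyncGenerator<WorkflowEvent> {
  for (const [i, step] of steps.entries()) {
    yield { type: "step-start", step };
    // ...execute the step here, yielding token/tool-call events...
    yield { type: "progress", completed: i + 1, total: steps.length };
  }
}
```

Pushing events from inside the runner (rather than after it finishes) is what gives the "escape hatch": partial results reach the user even if a later step fails.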
### Step 6: Observability & Tracing

Design what to observe:

#### Observability

**Tracing Standard**

- Format: OpenTelemetry (OTel) — industry standard
- Structure: Traces → Spans (tree of nested operations, like a flame chart)

**What to Trace**
| Span | Attributes | Purpose |
|---|---|---|
| Workflow run | workflow_id, user_id, start_time, status | Top-level trace |
| Each step | step_name, duration, status, input_tokens, output_tokens | Step-level detail |
| LLM call | model, prompt_tokens, completion_tokens, latency | Cost and performance |
| Tool call | tool_name, input, output, duration, status | Tool reliability |
| Guardrail | guard_name, triggered, action_taken | Security monitoring |
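The trace/span structure can be sketched in miniature. A real implementation would use the OpenTelemetry SDK; this hand-rolled `Span` type only illustrates the tree-of-timed-operations shape from the table above.

```typescript
// A span: one timed operation with attributes and nested child spans.
interface Span {
  name: string;
  attributes: Record<string, string | number>;
  startMs: number;
  endMs?: number;
  children: Span[];
}

// Start a span; if a parent is given, attach it so spans form a tree
// (the flame-chart structure a trace viewer renders).
function startSpan(
  name: string,
  attributes: Record<string, string | number>,
  parent?: Span,
): Span {
  const span: Span = { name, attributes, startMs: Date.now(), children: [] };
  parent?.children.push(span);
  return span;
}

function endSpan(span: Span): void {
  span.endMs = Date.now(); // duration = endMs - startMs
}
```

In practice the workflow run is the root span, each step is a child, and LLM/tool/guardrail calls nest under their step, carrying the attributes listed in the table.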
**Dashboards**

- Per-run view: See every step, its duration, input/output (JSON inspector)
- Aggregate view: Success rate, avg latency, cost per run, error rate
- Eval view: Score per run, score over time, regression detection

**Tooling**
| Tool | Purpose |
|---|---|
| [LangSmith / Braintrust / custom] | Trace viewer + eval dashboard |
| [Grafana / Datadog] | Infrastructure metrics |
| [PagerDuty / OpsGenie] | Alerting on failure spikes |
### Step 7: Workflow Composition

If the agent system has multiple workflows, design how they compose:

#### Workflow Composition

**Workflows as Tools**

Complex tasks become workflows; workflows become tools for agents.

- Agent decides WHICH workflow to run
- Workflow ensures HOW the task executes (structured, reliable)

**Agents as Workflow Steps**

Agent calls can be individual steps in a larger workflow.

- Workflow orchestrates the sequence
- Agent handles the unstructured reasoning within a step
| Workflow | Used As | Called By |
|---|---|---|
| [Research Workflow] | Tool | [Coordinator Agent] |
| [Code Review Workflow] | Step in Deploy Pipeline | [CI/CD Workflow] |
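Both composition patterns can be sketched with two small wrappers. `Workflow`, `Agent`, `workflowAsTool`, and `agentAsStep` are illustrative names, not a specific framework's API.

```typescript
type Workflow = (input: string) => Promise<string>;
type Agent = (prompt: string) => Promise<string>;

// Pattern 1: a workflow wrapped as a tool. The agent decides WHICH
// workflow to run; the workflow guarantees HOW the task executes.
function workflowAsTool(name: string, description: string, workflow: Workflow) {
  return { name, description, execute: workflow };
}

// Pattern 2: an agent call used as one step inside a larger workflow.
// The workflow orchestrates the sequence; the agent handles the
// unstructured reasoning within that step.
const agentAsStep = (agent: Agent): Workflow =>
  (input) => agent(`Handle this step: ${input}`);
```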
### Step 8: Summarize and Offer Next Steps

Present all findings to the user as a structured summary in the conversation (including the Mermaid diagram). Do NOT write to `.specs/` — this skill works directly in the conversation.

Use AskUserQuestion to offer:

- Implement workflow — scaffold workflow code based on the graph designed above
- Add observability — set up OpenTelemetry tracing in existing code
- Comprehensive design — run `agent:design` to cover all areas with a spec
## Arguments

- `<args>`: optional description of the process, or a path to existing workflow code

Examples:

- `agent:workflow order-processing pipeline` — design a workflow for order processing
- `agent:workflow src/workflows/` — review existing workflow implementations
- `agent:workflow` — start fresh