Root Cause Analysis
Systematic problem solving using Fishbone (Ishikawa) diagrams and 5 Whys technique. Identifies true root causes and recommends effective corrective actions.
What is Root Cause Analysis?
Root Cause Analysis (RCA) is a structured approach to identify the fundamental cause(s) of problems, rather than addressing symptoms. Effective RCA prevents problem recurrence.
Approach Focus Best For
Fishbone (Ishikawa) Brainstorming all potential causes by category Complex problems with multiple potential causes
5 Whys Drilling down through cause chains Problems with linear cause-effect relationships
Combined Fishbone to identify, 5 Whys to drill down Comprehensive root cause identification
Framework 1: Fishbone (Ishikawa) Diagram
What is a Fishbone Diagram?
A Fishbone diagram (also called Ishikawa or cause-and-effect diagram) organizes potential causes into categories, resembling a fish skeleton with the problem as the "head."
┌─────────────────────────────────────────────────────────┐
│ │
Man ──┼──┐ ┌──┼── Method │ │ │ │ │ └─────────────────────┐ ┌───────────────┘ │ │ │ │ │ │ ▼ ▼ │ │ ┌─────────┐ │ │ │ PROBLEM │ │ │ └─────────┘ │ │ ▲ ▲ │ │ ┌─────────────────────┘ └───────────────┐ │ │ │ │ │ Machine ┼──┘ └──┼── Material │ │ │ │ Measurement ────────────────────────────────────────────── Mother Nature │ │ └───────────────────────────────────────────────┘
The 6M Categories
Category Also Called Example Causes
Man People, Personnel Training, skills, fatigue, communication
Machine Equipment, Technology Hardware, software, tools, maintenance
Method Process, Procedure Workflow, documentation, standards
Material Inputs, Data Raw materials, components, information quality
Measurement Metrics, Inspection Calibration, accuracy, data collection
Mother Nature Environment Temperature, humidity, regulations, market
Alternative Category Sets
Domain Categories
Manufacturing (6M) Man, Machine, Method, Material, Measurement, Mother Nature
Service (8P) Product, Price, Place, Promotion, People, Process, Physical Evidence, Productivity
Software (5S) Systems, Skills, Suppliers, Surroundings, Safety
Custom Define categories relevant to your domain
Fishbone Workflow
Step 1: Define the Problem
Problem Statement
Problem: [Clear, specific, measurable description] Impact: [Quantified impact - cost, time, quality, safety] When Discovered: [Date/time] Where: [Location, system, process] Frequency: [One-time / Recurring / Intermittent]
Good problem statements:
-
✅ "Customer orders delayed by 3+ days increased 40% in Q4"
-
✅ "Production line 3 defect rate rose from 2% to 8% in November"
-
❌ "Things are slow" (too vague)
-
❌ "The system is broken" (not measurable)
Step 2: Assemble the Team
Include people with direct knowledge:
Role Contribution
Process Owner Overall accountability
Front-line Workers Direct observation and experience
Subject Matter Experts Technical knowledge
Customer Representative Impact perspective
Facilitator Guide the process
Step 3: Brainstorm Causes by Category
For each category, ask: "What in this category could cause the problem?"
Cause Brainstorming
Man (People)
| Potential Cause | Evidence | Likelihood |
|---|---|---|
| Insufficient training | New hires 60% of errors | High |
| Fatigue (overtime hours) | Errors peak after 8 hours | Medium |
| Communication gaps | Handoff errors documented | High |
Machine (Equipment)
| Potential Cause | Evidence | Likelihood |
|---|---|---|
| Aging equipment | Machine 7 has 40% of failures | High |
| Software bugs | Version 2.1 introduced defects | Medium |
Method (Process)
| Potential Cause | Evidence | Likelihood |
|---|---|---|
| Unclear procedures | 3 different approaches observed | High |
| Missing quality check | No verification step documented | High |
Material (Inputs)
| Potential Cause | Evidence | Likelihood |
|---|---|---|
| Supplier quality issues | Incoming defect rate up 15% | Medium |
| Specification ambiguity | Multiple interpretations possible | Low |
Measurement (Metrics)
| Potential Cause | Evidence | Likelihood |
|---|---|---|
| Inaccurate sensors | Calibration overdue | Medium |
| Wrong metrics tracked | Leading indicators missing | Low |
Mother Nature (Environment)
| Potential Cause | Evidence | Likelihood |
|---|---|---|
| Seasonal demand spike | Q4 always highest volume | High |
| Regulatory changes | New requirements in October | Low |
Step 4: Identify Most Likely Root Causes
Prioritize based on:
Criterion Question
Evidence Is there data supporting this cause?
Frequency Does this cause relate to problem frequency?
Correlation Does the cause timing match problem timing?
Control Can we test or eliminate this cause?
Step 5: Validate Root Causes
For each high-likelihood cause:
Root Cause Validation
| Cause | Validation Method | Result | Confirmed? |
|---|---|---|---|
| Insufficient training | Compare error rates by tenure | New hires 3x error rate | ✅ Yes |
| Aging equipment | Analyze failures by machine | Machine 7: 40% of all failures | ✅ Yes |
| Unclear procedures | Review documentation | 3 conflicting SOPs found | ✅ Yes |
Fishbone Output Format
Fishbone Analysis: [Problem]
Date: [ISO date] Facilitator: rca-analyst Team: [Participants]
Problem Statement
[Clear, specific, measurable problem]
Impact
[Quantified business impact]
Cause Analysis
Man (People)
- [Cause 1] - Evidence: [X] - Status: Confirmed/Suspected
- [Cause 2] - Evidence: [X] - Status: Confirmed/Suspected
Machine (Equipment)
- [Cause 1] - Evidence: [X] - Status: Confirmed/Suspected
Method (Process)
- [Cause 1] - Evidence: [X] - Status: Confirmed/Suspected
Material (Inputs)
- [Cause 1] - Evidence: [X] - Status: Confirmed/Suspected
Measurement (Metrics)
- [Cause 1] - Evidence: [X] - Status: Confirmed/Suspected
Mother Nature (Environment)
- [Cause 1] - Evidence: [X] - Status: Confirmed/Suspected
Confirmed Root Causes
| # | Root Cause | Category | Evidence | Contribution |
|---|---|---|---|---|
| 1 | [Description] | [6M] | [Data] | [% of problem] |
Recommended Actions
| # | Action | Addresses | Owner | Due Date |
|---|---|---|---|---|
| 1 | [Corrective action] | Root Cause #1 | [Name] | [Date] |
Fishbone Mermaid Diagram
flowchart LR subgraph Man["👤 Man"] M1[Training gaps] M2[Communication issues] end
subgraph Machine["⚙️ Machine"]
MA1[Aging equipment]
MA2[Software bugs]
end
subgraph Method["📋 Method"]
ME1[Unclear procedures]
ME2[Missing QC step]
end
subgraph Material["📦 Material"]
MT1[Supplier quality]
MT2[Specification issues]
end
subgraph Measurement["📊 Measurement"]
MS1[Sensor accuracy]
MS2[Wrong metrics]
end
subgraph Environment["🌍 Environment"]
E1[Seasonal demand]
E2[Regulatory changes]
end
M1 & M2 --> Problem
MA1 & MA2 --> Problem
ME1 & ME2 --> Problem
MT1 & MT2 --> Problem
MS1 & MS2 --> Problem
E1 & E2 --> Problem
Problem[❌ PROBLEM:<br/>Customer delays<br/>increased 40%]
style Problem fill:#f99,stroke:#900
Framework 2: 5 Whys
What is 5 Whys?
5 Whys is an iterative interrogative technique that drills down from symptoms to root causes by repeatedly asking "Why?" (typically 5 times, but may be fewer or more).
Key Principle: Each answer becomes the subject of the next "Why?"
5 Whys Workflow
Step 1: State the Problem
5 Whys Analysis
Problem: [Specific problem statement] Date: [ISO date] Analyst: 5whys-analyst
Step 2: Ask Why Iteratively
Why Chain
Problem: The website crashed during the product launch.
Why 1: Why did the website crash? → The server ran out of memory.
Why 2: Why did the server run out of memory? → The application had a memory leak.
Why 3: Why did the application have a memory leak? → The new feature wasn't properly tested under load.
Why 4: Why wasn't the new feature tested under load? → We don't have load testing in our CI/CD pipeline.
Why 5: Why don't we have load testing in CI/CD? → It was never prioritized when the pipeline was set up.
Root Cause: Load testing was not included in CI/CD pipeline during initial setup.
Step 3: Validate the Root Cause
Validation Check Question Pass?
Specific Is the root cause specific and actionable? ☐
Systemic Does addressing this prevent recurrence? ☐
Controllable Can we actually fix this? ☐
Chain Valid Does each why-answer logically lead to the next? ☐
5 Whys Best Practices
Do Don't
✅ Focus on process/system causes ❌ Blame individuals
✅ Base answers on facts and evidence ❌ Speculate without data
✅ Keep asking until actionable root found ❌ Stop at symptoms
✅ Document the chain for future reference ❌ Accept "human error" as root cause
✅ Consider multiple parallel chains ❌ Assume single root cause
Multiple 5 Why Chains
Complex problems often have multiple contributing causes. Run parallel 5 Whys chains:
Multiple Why Chains
Problem: Customer complaints increased 50%
Chain A: Response Time
Why 1: Response time increased → Support queue grew Why 2: Queue grew → Fewer agents available Why 3: Fewer agents → High turnover Why 4: High turnover → Poor working conditions Why 5: Poor conditions → No ergonomic equipment Root Cause A: Lack of investment in workplace ergonomics
Chain B: First-Contact Resolution
Why 1: FCR dropped → Agents lack knowledge Why 2: Lack knowledge → Training inadequate Why 3: Training inadequate → Product changes not communicated Why 4: Not communicated → No process for training updates Root Cause B: Missing process for training on product changes
Chain C: (Additional chains as needed)
5 Whys Output Format
5 Whys Analysis: [Problem]
Date: [ISO date] Analyst: 5whys-analyst Problem: [Statement]
Why Chain
| Level | Question | Answer | Evidence |
|---|---|---|---|
| Why 1 | Why [problem]? | [Answer] | [Data] |
| Why 2 | Why [answer 1]? | [Answer] | [Data] |
| Why 3 | Why [answer 2]? | [Answer] | [Data] |
| Why 4 | Why [answer 3]? | [Answer] | [Data] |
| Why 5 | Why [answer 4]? | [Answer] | [Data] |
Root Cause
[Final root cause statement]
Validation
- Specific and actionable
- Systemic (prevents recurrence)
- Within our control
- Each step logically connected
Corrective Action
| Action | Owner | Due | Status |
|---|---|---|---|
| [Action to address root cause] | [Name] | [Date] | Open |
5 Whys Mermaid Diagram
flowchart TD P[❌ Problem:<br/>Website crashed<br/>during launch] W1[Why 1: Server ran<br/>out of memory] W2[Why 2: Application<br/>had memory leak] W3[Why 3: New feature<br/>not load tested] W4[Why 4: No load testing<br/>in CI/CD pipeline] W5[Why 5: Load testing<br/>not prioritized initially] RC[🎯 Root Cause:<br/>Missing load testing<br/>in CI/CD setup]
P --> W1
W1 --> W2
W2 --> W3
W3 --> W4
W4 --> W5
W5 --> RC
style P fill:#f99,stroke:#900
style RC fill:#9f9,stroke:#090
Combined Approach
For comprehensive analysis, combine both techniques:
-
Fishbone to brainstorm all potential causes across categories
-
5 Whys to drill down on each high-likelihood cause
-
Validate multiple root causes found
-
Prioritize corrective actions
Combined RCA: [Problem]
Phase 1: Fishbone Brainstorming
[Identify all potential causes by category]
Phase 2: 5 Whys Drill-Down
For each high-likelihood cause from fishbone:
Cause A: [From fishbone]
[5 Whys chain] Root Cause A: [Result]
Cause B: [From fishbone]
[5 Whys chain] Root Cause B: [Result]
Phase 3: Root Cause Summary
| # | Root Cause | Source | Validated? | Priority |
|---|---|---|---|---|
| 1 | [RC-A] | Fishbone + 5 Whys | Yes | High |
| 2 | [RC-B] | Fishbone + 5 Whys | Yes | Medium |
Phase 4: Corrective Action Plan
[CAPA based on validated root causes]
Corrective Action Planning (CAPA)
Action Types
Type Purpose Example
Immediate/Containment Stop the bleeding Disable faulty feature
Corrective Fix the specific occurrence Patch the bug
Preventive Prevent recurrence Add automated testing
Systemic Address organizational factors Update training program
CAPA Template
Corrective Action Plan
Problem: [Reference to RCA] Root Causes: [List confirmed root causes]
Actions
| # | Action | Type | Root Cause | Owner | Due | Status |
|---|---|---|---|---|---|---|
| 1 | [Description] | Corrective | RC-1 | [Name] | [Date] | Open |
| 2 | [Description] | Preventive | RC-1 | [Name] | [Date] | Open |
| 3 | [Description] | Systemic | RC-2 | [Name] | [Date] | Open |
Verification
| Action | Verification Method | Criteria | Result |
|---|---|---|---|
| 1 | [How to verify] | [Success criteria] | Pending |
Follow-up
- Review Date: [Date]
- Responsible: [Name]
- Metrics to Monitor: [KPIs]
Structured Data (YAML)
root_cause_analysis: version: "1.0" date: "2025-01-15" analyst: "rca-analyst"
problem: statement: "Customer order delays increased 40% in Q4" impact: "$500K in refunds, 15% churn increase" discovered: "2025-01-10" frequency: recurring
fishbone: categories: man: - cause: "Insufficient training" evidence: "New hires 3x error rate" likelihood: high confirmed: true machine: - cause: "Aging equipment on Line 3" evidence: "40% of failures from Machine 7" likelihood: high confirmed: true method: - cause: "Unclear procedures" evidence: "3 conflicting SOPs" likelihood: high confirmed: true material: [] measurement: [] environment: - cause: "Q4 demand spike" evidence: "Volume up 60%" likelihood: high confirmed: true
five_whys: - chain_name: "Training Gap" problem: "High error rate in new hires" whys: - question: "Why high error rate?" answer: "Insufficient training" - question: "Why insufficient training?" answer: "Training program not updated" - question: "Why not updated?" answer: "No owner assigned" - question: "Why no owner?" answer: "Process not established" root_cause: "No process for maintaining training program"
root_causes: - id: "RC-1" description: "No process for training program maintenance" source: "Fishbone + 5 Whys" validated: true contribution: 40
capa: - action: "Assign training program owner" type: corrective root_cause: "RC-1" owner: "HR Director" due_date: "2025-02-01" status: open
When to Use RCA
Scenario Use RCA? Approach
Major incident Yes Full Fishbone + 5 Whys
Recurring defect Yes Focused 5 Whys
Customer complaint pattern Yes Combined approach
Minor one-time issue Maybe Quick 5 Whys
Proactive improvement Partial Fishbone for ideation
Integration
Upstream
-
Incident reports - Trigger for RCA
-
Quality metrics - Identify problems
-
stakeholder-analysis - Identify team members
Downstream
-
Process improvement - Implement corrective actions
-
Training programs - Address people-related causes
-
value-stream-mapping - Eliminate systemic waste
Related Skills
-
decision-analysis
-
Evaluate corrective action options
-
risk-analysis
-
Assess risk of root causes
-
value-stream-mapping
-
Process improvement
-
stakeholder-analysis
-
Identify RCA participants
User-Facing Interface
When invoked directly by the user, this skill operates as follows.
Arguments
-
<problem-statement> : Description of the problem to analyze
-
--mode : Analysis technique (default: full )
-
fishbone : Fishbone diagram with 6M categories (~4K tokens)
-
5whys : 5 Whys drill-down technique (~3K tokens)
-
full : Both techniques combined (~8K tokens)
-
--output : Output format (default: both )
-
yaml : Structured YAML for downstream processing
-
mermaid : Mermaid diagram visualization
-
both : Both formats
-
--dir : Output directory (default: docs/analysis/ )
Execution Workflow
-
Parse Arguments - Extract problem statement, mode, and output format. If no problem provided, ask the user what problem to analyze.
-
Define the Problem - Clarify problem statement, impact, frequency, first observed date, current vs desired state, and available evidence.
-
Execute Based on Mode:
-
Fishbone: Explore causes across all 6M categories (Man, Machine, Method, Material, Measurement, Mother Nature), rate likelihood, and identify relationships.
-
5 Whys: Progressive questioning to drill down to root cause. May require fewer or more than 5 whys.
-
Full: Spawn the problem-solver agent for combined Fishbone + 5 Whys analysis with CAPA plan.
-
Consolidate Root Causes - Summarize findings with evidence and confidence ratings.
-
Develop CAPA Plan - Create immediate corrections, corrective actions (address root cause), preventive actions (prevent recurrence), and verification plan.
-
Generate Output - Produce YAML structure, Mermaid fishbone/flowchart diagrams, and summary report.
-
Save Results - Save to docs/analysis/root-cause-analysis.yaml and/or docs/analysis/root-cause-analysis.md (or custom --dir ).
Version History
- v1.0.0 (2025-12-26): Initial release