
Debugging Skill

This skill defines how to systematically debug an issue reported by a developer. It follows a two-phase approach: first Intake (gathering all necessary context), then Diagnosis (structured analysis and actionable output).

Phase 1 — Intake

Before performing any diagnosis, collect the following inputs from the developer. Ask for all of them in a single, structured prompt. Mark optional fields clearly.

Required Inputs

| # | Field | Required | Notes |
|---|-------|----------|-------|
| 1 | Problem Description | ✅ Yes | What went wrong? What was expected vs. actual behaviour? |
| 2 | Error Message / Stack Trace | ⚡ Highly Recommended | Paste the full error. If none, describe what is observed. |
| 3 | Steps to Reproduce | ⚡ Highly Recommended | Exact steps to trigger the issue. Note if it is consistent or intermittent. |
| 4 | Environment Context | ⚡ Highly Recommended | Environment (dev/staging/prod), OS, runtime version, recent deployments or changes |
| 5 | Attachments | 🔵 Optional | Screenshots, logs, network traces, or config files |
| 6 | Recent Changes | 🔵 Optional | Any recent code, config, infra, or data changes that preceded the issue |
| 7 | Affected Users / Scope | 🔵 Optional | All users? A specific role, region, or data set? |
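The intake fields above can be modelled as a single typed payload so nothing is silently dropped between intake and diagnosis. A minimal TypeScript sketch (the interface and field names are illustrative, not part of the skill definition; TypeScript is used only because the later examples reference `.ts` files):

```typescript
// Hypothetical shape for a completed intake response.
// Only the Problem Description is strictly required; the rest mirrors
// the "highly recommended" and "optional" rows of the table above.
interface DebugIntake {
  problemDescription: string;        // 1 (required)
  errorOutput?: string;              // 2: full error / stack trace
  stepsToReproduce?: string;         // 3: exact trigger steps
  environment?: {                    // 4: where it happens
    name: "dev" | "staging" | "prod";
    runtime?: string;                // e.g. "Node 20", "Python 3.11"
    recentDeployments?: string;
  };
  attachments?: string[];            // 5: screenshot/log paths or URLs
  recentChanges?: string;            // 6: code/config/infra/data changes
  affectedScope?: string;            // 7: who or what is affected
}

// The only hard intake requirement: a non-empty problem description.
function hasRequiredFields(intake: DebugIntake): boolean {
  return intake.problemDescription.trim().length > 0;
}
```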

Intake Prompt Template

Use this template when asking the developer for inputs:

I'm ready to help debug. Please provide the following:

  1. 📝 Problem Description (required) What went wrong? What did you expect vs. what actually happened?

  2. 🔴 Error Message / Stack Trace (highly recommended) Paste the full error output. If no error is shown, describe what you observe instead.

  3. 🔁 Steps to Reproduce (highly recommended) Walk me through how to trigger this issue. Is it consistent or intermittent?

  4. 🌐 Environment Context (highly recommended)

    • Environment: dev / staging / prod
    • OS / Runtime version (e.g., Node 20, Python 3.11, Java 17)
    • Any recent deployments, config changes, or dependency updates?
  5. 📎 Attachments (optional) Screenshots, log files, network traces, or relevant configs.

  6. 🔧 Recent Changes (optional) Any recent code, database, infra, or data changes before this issue appeared?

  7. 👥 Affected Scope (optional) Who is affected? All users, specific roles, regions, or certain data?

Problem Description Validation Gate

This gate MUST be evaluated before Phase 2 begins. Do not proceed to Diagnosis if the Problem Description fails validation.

What Makes a Valid Problem Description

A valid Problem Description must satisfy all three of the following criteria:

| # | Criterion | Why It Matters |
|---|-----------|----------------|
| 1 | Describes a specific observable symptom or failure | Vague inputs like "it's broken" or "it doesn't work" provide zero signal for diagnosis |
| 2 | Contains at least one concrete detail — a feature name, an action taken, a screen/page, an API endpoint, a value, or a behaviour | Without a concrete anchor, there is nothing in the codebase to investigate |
| 3 | Is coherent and in good faith — not gibberish, placeholder text, or clearly random input | Random text wastes investigation effort and produces meaningless results |

Examples

| Input | Valid? | Reason |
|-------|--------|--------|
| "The login button does nothing when I click it on the /auth/login page" | ✅ Yes | Specific symptom + concrete detail (button, page) |
| "Order creation fails with a 500 error after I added the discount field" | ✅ Yes | Symptom + context (500, feature name, recent change) |
| "Something is wrong with the app" | ❌ No | No specific symptom, no concrete detail |
| "It's broken" | ❌ No | Completely vague |
| "asdfghjkl" | ❌ No | Gibberish / not in good faith |
| "help" | ❌ No | Not a problem description |
| "I don't know, just debug it" | ❌ No | No actionable information |
| "The payment module has an issue" | ❌ No | No symptom described; too vague to investigate |

⚠️ A problem description that identifies a module or file alone (e.g., "the payment module") is not sufficient. It must describe what the module is doing wrong.

Validation Flow

Developer submits intake response
        │
        ▼
Is the Problem Description present?
  ├── No  → Re-prompt (Attempt 1)
  └── Yes → Does it pass all 3 validity criteria?
        ├── Yes → Proceed to Phase 2
        └── No  → Re-prompt with specific feedback (Attempt 1 → max 2)
                        │
               After 2 failed attempts
                        ▼
               Halt and escalate

Re-Prompt Rules

  • Maximum re-prompts: 2. If the Problem Description is still invalid after 2 correction attempts, halt the session and send the Escalation Message.

  • Each re-prompt must be specific — tell the developer exactly which criterion their input failed and give a concrete example of what a better input looks like.

  • Do not soften or skip the gate — even if the developer insists, a valid Problem Description is a prerequisite for meaningful diagnosis.
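The gate and its re-prompt budget can be sketched in code. The heuristics below are deliberately crude approximations of the three criteria — real evaluation is a judgment call by the assistant — and every function name and threshold here is invented for illustration:

```typescript
// Rough heuristic for the three validity criteria. It only catches the
// obvious failure modes from the examples table ("help", "It's broken",
// gibberish, no concrete anchor); it is not a substitute for actually
// reading and judging the description.
function looksLikeValidProblemDescription(text: string): boolean {
  const t = text.trim();
  if (t.length < 15) return false;                           // too short to carry a symptom
  if (!/[aeiou]/i.test(t) || !t.includes(" ")) return false; // likely gibberish
  // Concrete anchor: a path (/cart), an HTTP status code (500), or a
  // dotted identifier (user.id). Arbitrary and incomplete by design.
  return /\/[\w-]+|\d{3}|\b[A-Za-z]+\.[A-Za-z]+\b/.test(t);
}

type GateResult = "proceed" | "re-prompt" | "escalate";

// Maximum 2 re-prompts: after the second failed correction attempt,
// halt the session and send the Escalation Message.
function evaluateGate(text: string, failedAttempts: number): GateResult {
  if (looksLikeValidProblemDescription(text)) return "proceed";
  return failedAttempts >= 2 ? "escalate" : "re-prompt";
}
```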

Re-Prompt Template

Use this when the Problem Description is invalid:

⚠️ I wasn't able to start the debug session yet because the Problem Description needs a bit more detail.

What I received: "<their input>"

What's missing: <state which criterion failed — e.g., "no specific symptom described" / "no concrete detail" / "input is not a problem description">

What a good Problem Description looks like:

"The checkout button on /cart throws a 500 error after I enter a promo code."

"User profile pictures are not loading for accounts created before 2025-01-01."

Please re-describe the problem with at least one specific symptom and one concrete detail (feature, page, endpoint, value, or behaviour). The other fields (error message, steps to reproduce, etc.) remain unchanged — just update the Problem Description.

Attempts remaining: [1 / 2].

Escalation Message (after 2 failed attempts)

🚫 Debug session could not be started.

After 2 attempts, the Problem Description provided does not contain enough information to proceed with a meaningful diagnosis. Starting an investigation without a clear problem statement risks producing inaccurate or misleading results.

Next steps:

  1. Gather more context about what is failing — check logs, error messages, or ask a colleague who observed the issue.
  2. Restart the debug session once you have a clearer description of the symptom.

If you believe this is a false rejection, please provide the full error message or a screenshot — that alone may be sufficient to establish a valid problem context.

Phase 2 — Diagnosis

Once intake is complete, perform a structured investigation and produce the following outputs.

Step 1 — Understand the Bug Behaviour

Classify the bug before diving in:

| Dimension | Options |
|-----------|---------|
| Reproducibility | Consistent / Intermittent / Unknown |
| Severity | Critical (data loss, outage) / High (feature broken) / Medium (degraded) / Low (cosmetic) |
| Blast Radius | All users / Subset of users / Single user / No user impact (internal/silent) |
| Bug Type | Logic error / Runtime exception / Configuration / Race condition / Data / Integration / Environment |
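These four dimensions can be captured as a small typed record so that a classification is always complete before investigation starts. A TypeScript sketch (type names are illustrative, not part of the skill definition):

```typescript
// One value per dimension from the classification table above.
type Reproducibility = "consistent" | "intermittent" | "unknown";
type Severity = "critical" | "high" | "medium" | "low";
type BlastRadius = "all-users" | "subset" | "single-user" | "internal";
type BugType =
  | "logic"
  | "runtime-exception"
  | "configuration"
  | "race-condition"
  | "data"
  | "integration"
  | "environment";

interface BugClassification {
  reproducibility: Reproducibility;
  severity: Severity;
  blastRadius: BlastRadius;
  bugType: BugType;
}

// Example: an intermittent failure affecting a subset of users.
const example: BugClassification = {
  reproducibility: "intermittent",
  severity: "high",
  blastRadius: "subset",
  bugType: "race-condition",
};
```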

Step 2 — Investigate

Perform the following steps using available tools (code search, file reading, log analysis):

  • Read the error message and stack trace — identify the exact file, line, and function where the error originates

  • Trace the call chain leading to the failure point (use the code-exploration skill's L4 Function level if needed)

  • Search for recent changes in the impacted files (git log, diff review)

  • Check for environmental factors: config values, missing env vars, dependency version mismatches

  • Check data assumptions: is there any data that the code assumes exists, is non-null, or is in a specific format?

  • Identify all other files or services that call into or are called by the failing code — assess the blast radius

Step 3 — Root Cause Analysis

List all plausible root causes with a likelihood score and supporting evidence.

Format

🔎 Root Cause Analysis

| # | Suspected Root Cause | Likelihood | Evidence / Reasoning |
|---|----------------------|------------|----------------------|
| 1 | <description of cause> | 🔴 High (75%) | <what points to this> |
| 2 | <description of cause> | 🟡 Medium (20%) | <what points to this> |
| 3 | <description of cause> | 🟢 Low (5%) | <what points to this> |

Most Likely Root Cause: #1 — <brief summary>

Likelihood Legend

| Indicator | Score Range | Meaning |
|-----------|-------------|---------|
| 🔴 High | 60–100% | Strong evidence; the stack trace or code directly supports this |
| 🟡 Medium | 30–59% | Plausible; partial evidence or requires validation |
| 🟢 Low | 1–29% | Possible edge case; cannot be ruled out without more data |

⚠️ If total likelihood does not add up to 100%, normalise across all candidates. Always rank by descending likelihood. If only one cause is identified, state "single root cause identified" and assign 100%.
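The normalisation rule can be sketched as a small helper: scale the candidate likelihoods so they sum to 100, then sort descending. This is a sketch under assumed names, not part of the skill spec; note that rounding can make the total drift by a point:

```typescript
interface RootCause {
  description: string;
  likelihood: number; // percent, possibly un-normalised
}

// Scale likelihoods to sum to 100 and rank candidates descending.
// Math.round can make the total drift slightly; a stricter version
// would assign the rounding remainder to the top candidate.
function normaliseCauses(causes: RootCause[]): RootCause[] {
  const total = causes.reduce((sum, c) => sum + c.likelihood, 0);
  if (total === 0) return [...causes]; // nothing to normalise
  return causes
    .map(c => ({ ...c, likelihood: Math.round((c.likelihood / total) * 100) }))
    .sort((a, b) => b.likelihood - a.likelihood);
}
```

A list with a single identified cause normalises to 100%, matching the "single root cause identified" rule above.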

Step 4 — Impacted File Map

List every file that is directly or indirectly affected by the bug or the fix.

Format

📁 Impacted Files

| File | Role | Impact Type |
|------|------|-------------|
| src/orders/orders.service.ts | Origin of error | 🔴 Direct — fix required here |
| src/users/users.service.ts | Called by failing code | 🟡 Indirect — may need defensive update |
| src/common/filters/http-exception.filter.ts | Error handler | 🔵 Review — ensure error surfaces correctly |
| tests/orders/orders.service.spec.ts | Unit tests | ✅ Update — add test case for this scenario |

Impact Type Legend

| Icon | Type | Meaning |
|------|------|---------|
| 🔴 Direct | Fix required | This file contains the defective code |
| 🟡 Indirect | May need update | Depends on or is called by the defective code |
| 🔵 Review | Check only | Peripheral; review to confirm it handles this case |
| ✅ Update | Test/doc update | No logic change, but tests or docs need updating |

Step 5 — Proposed Solutions

Provide at least 2 distinct solutions. Solutions must differ meaningfully (not just syntax variants).

Format for Each Solution

Solution [N] — <Short Title>

Approach: <Describe the approach in 2–4 sentences. What changes, and why does this fix the issue?>

Trade-offs:

| Pro | Con |
|-----|-----|
| <benefit> | <drawback> |

Key Changes:

  • <file>: <what changes>
  • <file>: <what changes>

How to Validate: <Step-by-step instructions to confirm the fix works — unit test to write, curl command to run, UI action to perform, etc.>

Estimated Effort: <Low / Medium / High>

Step 6 — Recommendation

State the recommended solution clearly and explain why it is preferred.

Format

✅ Recommended Solution: Solution [N] — <Title>

Why this is recommended: <2–4 sentences justifying the choice. Reference trade-offs, risk, effort, and alignment with existing patterns.>

Implementation Order:

  1. <First action to take>
  2. <Second action to take>
  3. <Verification step>

Step 7 — Preventive Recommendation

After diagnosing and recommending a fix, always suggest how to prevent this class of bug from recurring.

Format

🛡️ Prevention

| Recommendation | Type | Priority |
|----------------|------|----------|
| Add null check for user.id before DB query | Code Guard | 🔴 High |
| Write unit test for empty payload edge case | Test Coverage | 🟡 Medium |
| Add schema validation at API boundary | Input Validation | 🟡 Medium |
| Set up alerting for 5xx error rate spike | Observability | 🔵 Low |

Step 8 — Investigation Trail

Maintain a log of what was checked and what was ruled out. This is critical if the debugging session spans multiple iterations or is handed off to another developer.

Format

🧭 Investigation Trail

| Step | What Was Checked | Finding | Status |
|------|------------------|---------|--------|
| 1 | Stack trace origin | Error thrown in orders.service.ts line 47 | ✅ Confirmed |
| 2 | Recent git changes to orders.service.ts | No changes in last 30 days | 🚫 Ruled out |
| 3 | ENV variable DATABASE_URL | Present and correct in all environments | 🚫 Ruled out |
| 4 | Null check on user object before .id access | Missing null guard — matches error pattern | ✅ Confirmed |
| 5 | Upstream caller orders.controller.ts | Passes unvalidated user object | 🟡 Contributing factor |

Full Output Template

Use this skeleton as the final output structure for every debugging session:

🐛 Debug Report — <Short Problem Title>

Reported: YYYY-MM-DD
Environment: <dev/staging/prod>
Severity: <Critical / High / Medium / Low>
Status: <Investigating / Root Cause Identified / Solution Proposed / Resolved>


📋 Problem Summary

<1–3 sentence summary of the issue, expected vs. actual behaviour, and reproducibility>

🔎 Bug Behaviour

  • Reproducibility: <Consistent / Intermittent>
  • Blast Radius: <Who/what is affected>
  • Bug Type: <Logic / Runtime / Config / Race Condition / Data / Integration / Environment>

🔎 Root Cause Analysis

<Section 3 output>


📁 Impacted Files

<Section 4 output>


💡 Proposed Solutions

Solution 1 — <Title>

<Section 5 format>

Solution 2 — <Title>

<Section 5 format>


✅ Recommended Solution

<Section 6 output>


🛡️ Prevention

<Section 7 output>


🧭 Investigation Trail

<Section 8 output>

File Output (MANDATORY)

Every debug report must be saved as a markdown file before presenting it to the user.

Save Location

<repo-root>/.agent/debugs/

Naming Convention

debug_<YYYYMMDD>_<HHMMSS>_<short-slug>.md

Examples:

.agent/debugs/debug_20260311_143022_null-user-id-orders.md
.agent/debugs/debug_20260311_160045_auth-token-expiry.md
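The timestamp and slug can be generated mechanically. A TypeScript sketch of the naming convention (the function name is illustrative; note that JavaScript's `Date.getMonth()` is zero-based, hence the `+ 1`):

```typescript
// Build "debug_<YYYYMMDD>_<HHMMSS>_<short-slug>.md" from a date and title.
function debugReportFilename(date: Date, title: string): string {
  const pad = (n: number) => String(n).padStart(2, "0");
  const stamp =
    `${date.getFullYear()}${pad(date.getMonth() + 1)}${pad(date.getDate())}` +
    `_${pad(date.getHours())}${pad(date.getMinutes())}${pad(date.getSeconds())}`;
  const slug = title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse non-alphanumeric runs to hyphens
    .replace(/^-+|-+$/g, "")     // trim leading/trailing hyphens
    .slice(0, 40);               // keep the slug short
  return `debug_${stamp}_${slug}.md`;
}
```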

File Header

Every debug file must start with a metadata header:

Debug Report — <Short Problem Title>

Level: Debug Session
Reported: YYYY-MM-DD HH:MM:SS
Environment: <environment>
Severity: <severity>
Status: <status>


Quick Decision Guide

Developer reports a bug? → Run Phase 1 Intake (collect all inputs in one prompt)

Intake complete? → Run Phase 2 Diagnosis (Steps 1–8 in order)

Only one root cause found? → Still provide ≥2 solutions (different approaches to fix the same cause)

Bug is intermittent / no stack trace? → Increase weight on environment, race conditions, and data assumptions

Bug appeared "out of nowhere" with no code changes? → Prioritise: infra/config changes, data anomalies, dependency updates

Fix applied but issue persists? → Revisit investigation trail, promote lower-likelihood root causes, add new findings

Session handed off to another developer? → Ensure Investigation Trail (Step 8) is fully up to date before handoff
