prompt-debugger

Debug prompts that produce unexpected AI outputs — diagnose failure modes, identify ambiguity and conflicting instructions, test variations, compare model responses, and iteratively improve prompt quality.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Copy this and send it to your AI assistant to learn the skill:

Install skill "prompt-debugger" with this command: npx skills add charlie-morrison/prompt-debugger

Prompt Debugger

When a prompt isn't working as expected, this skill systematically diagnoses why and fixes it. It identifies common failure patterns (ambiguity, conflicting instructions, missing context, wrong format specification), tests variations, and produces an improved version.

Use when: "why isn't this prompt working", "debug my prompt", "improve this prompt", "the AI keeps doing X instead of Y", "prompt not producing expected output", "prompt optimization", or iterating on system prompts.

Commands

1. diagnose — Analyze a Failing Prompt

Given a prompt and its undesired output, identify the root cause.

Step 1: Structural Analysis

Read the prompt and check for common failure patterns:

Ambiguity Checks:

  • Vague instructions ("make it better", "be more specific", "improve this")
  • Missing output format specification
  • Unclear scope ("analyze this" — analyze what aspect?)
  • Pronoun confusion ("it", "this", "that" without clear referent)
  • Multiple possible interpretations of key terms

Conflict Checks:

  • Contradictory instructions ("be concise" + "explain in detail")
  • Competing priorities without ranking ("be accurate AND fast AND creative")
  • Format conflicts (asking for both structured and freeform output)
  • Tone conflicts ("be professional" + "be casual and fun")
  • Length conflicts (word limits vs. comprehensive coverage)

Context Checks:

  • Missing role/persona specification
  • No examples of desired output
  • Assumed knowledge not stated
  • Missing constraints (length, format, audience, tone)
  • No success criteria ("how would I know if the output is good?")

Instruction Clarity:

  • Nested conditionals that are hard to follow
  • Too many instructions competing for attention
  • Critical instructions buried in the middle
  • Instructions that depend on prior instructions but aren't ordered
  • Implicit assumptions that should be explicit
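
Parts of these checklists can be automated with a crude keyword scan before deeper review. A minimal sketch in Python; the marker lists and conflict pairs are illustrative placeholders, not an exhaustive rule set:

```python
# Crude surface scan for Step 1. Marker lists are illustrative only;
# real ambiguity and conflict detection needs human or model judgment.
AMBIGUITY_MARKERS = ["make it better", "be more specific", "improve this",
                     "analyze this"]
CONFLICT_PAIRS = [("be concise", "in detail"),
                  ("be professional", "be casual")]

def structural_scan(prompt: str) -> list[str]:
    """Flag surface-level signs of ambiguity and conflict in a prompt."""
    findings = []
    lowered = prompt.lower()
    for marker in AMBIGUITY_MARKERS:
        if marker in lowered:
            findings.append(f"possible ambiguity: {marker!r}")
    for a, b in CONFLICT_PAIRS:
        if a in lowered and b in lowered:
            findings.append(f"possible conflict: {a!r} vs {b!r}")
    return findings

print(structural_scan("Be concise, but explain this in detail."))
# -> ["possible conflict: 'be concise' vs 'in detail'"]
```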

Step 2: Failure Mode Classification

Categorize the issue:

| Failure Mode | Symptoms | Common Fix |
| --- | --- | --- |
| Instruction Following | Ignores specific requirements | Move to top, bold, repeat |
| Format Violation | Wrong output structure | Add explicit format example |
| Hallucination | Makes up facts | Add "only use provided info" |
| Scope Creep | Answers more than asked | Add "only address X, nothing else" |
| Scope Deficit | Answers less than asked | Break into numbered sub-questions |
| Tone Mismatch | Wrong voice/register | Provide tone examples |
| Overthinking | Too verbose/philosophical | Add "be direct, no preamble" |
| Underthinking | Too shallow/generic | Add "think step by step" + require specifics |
| Context Window | Loses early instructions | Repeat key constraints at end |
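
Most of these modes take judgment to detect, but Format Violation can often be checked mechanically when the prompt demanded structured output. A minimal sketch, assuming JSON was requested:

```python
import json

def check_json_format(output: str) -> str | None:
    """Return a failure-mode label if the output is not the JSON we asked for."""
    try:
        json.loads(output)
    except json.JSONDecodeError:
        return "Format Violation: output is not valid JSON"
    return None

print(check_json_format("Sure! Here's the summary you asked for."))
# -> Format Violation: output is not valid JSON
```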

Step 3: Generate Fix Hypotheses

For each identified issue, propose specific prompt edits:

Issue 1: Ambiguous instruction "analyze the data"
  → Fix: "Analyze the data by calculating the mean, median, and standard deviation for each column. Report any outliers (>2 standard deviations from mean)."

Issue 2: Missing output format
  → Fix: Add "Output format: JSON with keys {summary, findings, recommendations}"

Issue 3: Conflicting constraints
  → Fix: "Prioritize accuracy over brevity. If you must choose between being complete and being concise, be complete."

2. compare — A/B Test Prompt Variations

Generate 3-5 variations of a prompt, each targeting a different failure mode fix.

## Variation A: Original (baseline)
[original prompt]
Expected improvement: none (baseline for comparison)

## Variation B: Explicit format
[prompt + format specification]
Target fix: format violation

## Variation C: Role + examples
[prompt + persona + 2 examples]
Target fix: tone mismatch, underthinking

## Variation D: Constraints tightened
[prompt + explicit constraints + negative examples]
Target fix: scope creep, hallucination

## Variation E: Restructured
[reordered prompt with critical instructions first/last]
Target fix: instruction following

For each variation, explain what was changed and why.
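
A sketch of the comparison loop itself. `call_model` is a placeholder for whichever client you actually use, and the variation texts echo the fixes above:

```python
def call_model(prompt: str) -> str:
    # Placeholder: replace with your actual model SDK call.
    return "<model output>"

original = "Analyze the data and summarize your findings."
variations = {
    "A: baseline": original,
    "B: explicit format": original + "\nOutput as JSON with keys "
                                     "{summary, findings, recommendations}.",
    "D: tightened scope": "Only address the provided data; do not "
                          "speculate.\n" + original,
}

for name, prompt in variations.items():
    for trial in range(3):  # outputs are stochastic; run a few trials each
        print(f"--- {name}, trial {trial + 1} ---")
        print(call_model(prompt))
```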

3. rewrite — Produce an Improved Prompt

Apply all identified fixes to produce a single improved prompt.

Rewrite principles:

  1. Critical instructions go first AND last (primacy + recency effects)
  2. One instruction per line/bullet (no compound sentences)
  3. Include 1-2 examples of desired output
  4. Specify what NOT to do (negative examples) for common failure modes
  5. Define success criteria explicitly
  6. Use markdown formatting for structure (headers, bullets, bold for emphasis)
  7. Add explicit output format specification

Before/After format:

### Before
[original prompt — highlight problematic areas]

### After
[improved prompt — annotate what changed and why]

### Changes Made
1. Added role specification ("You are a senior data analyst...")
2. Replaced "analyze" with specific analytical steps
3. Added output format (JSON schema)
4. Moved length constraint to the end (recency)
5. Added negative example ("Do NOT include...")

4. patterns — Common Prompt Patterns Library

Reference of proven prompt patterns for common tasks:

Chain of Thought:

Think through this step by step:
1. First, identify...
2. Then, analyze...
3. Finally, recommend...
Show your reasoning for each step.

Few-Shot:

Here are examples of the expected output:

Input: [example 1 input]
Output: [example 1 output]

Input: [example 2 input]
Output: [example 2 output]

Now process:
Input: [actual input]
Output:

Constraint Sandwich:

[CRITICAL CONSTRAINTS — read first]
[Main task instructions]
[CRITICAL CONSTRAINTS — repeated for emphasis]

Persona + Task + Format (a fill-in sketch of this pattern follows at the end of this section):

You are [specific role] with [specific expertise].
Your task is to [specific action] for [specific audience].
Output as [specific format] with [specific requirements].

Self-Verification:

After generating your response, verify:
- Does it address all N requirements?
- Is it under X words?
- Does it follow the specified format?
If not, revise before outputting.
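
The Persona + Task + Format pattern above drops naturally into a reusable template. A minimal sketch; the field names are illustrative, not a fixed schema:

```python
PTF_TEMPLATE = (
    "You are {role} with {expertise}.\n"
    "Your task is to {action} for {audience}.\n"
    "Output as {fmt} with {requirements}."
)

prompt = PTF_TEMPLATE.format(
    role="a senior data analyst",
    expertise="five years of anomaly-detection experience",
    action="summarize weekly sales trends",
    audience="non-technical executives",
    fmt="three bullet points",
    requirements="one concrete number per bullet",
)
print(prompt)
```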

5. score — Rate Prompt Quality

Score a prompt on multiple dimensions (0-10 each):

| Dimension | Score | Assessment |
| --- | --- | --- |
| Clarity | 7/10 | Instructions are clear but "analyze" is ambiguous |
| Specificity | 4/10 | Missing format, length, audience |
| Completeness | 6/10 | Has context but no examples |
| Consistency | 8/10 | No conflicting instructions |
| Testability | 3/10 | No success criteria defined |
| Overall | 5.6/10 | Needs format spec and examples |

Provide the top 3 improvements that would most increase the score.
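
The Overall row above is just the unweighted mean of the five dimensions: (7 + 4 + 6 + 8 + 3) / 5 = 5.6. A one-liner, assuming equal weights:

```python
scores = {"Clarity": 7, "Specificity": 4, "Completeness": 6,
          "Consistency": 8, "Testability": 3}
print(f"Overall: {sum(scores.values()) / len(scores):.1f}/10")  # Overall: 5.6/10
```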

6. anti-patterns — Detect Common Prompt Anti-Patterns

Scan a prompt for known problematic patterns:

  • Hedge language: "Try to", "if possible", "maybe", "perhaps" (weakens instructions)
  • Overloading: More than 7 distinct instructions (cognitive load)
  • Vague quantifiers: "some", "several", "a few", "many" (replace with numbers)
  • Double negatives: "don't not include" → "include"
  • Passive voice instructions: "the data should be analyzed" → "analyze the data"
  • Escape hatches: "unless you think otherwise" (invites non-compliance)
  • Meta-instructions: Spending tokens on "you are an AI" preamble
  • Repeat-after-me: Asking the AI to confirm instructions (wastes tokens)
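
Several of these anti-patterns are simple enough to catch with regular expressions. A minimal scanner sketch; the pattern table mirrors the bullets above and is deliberately incomplete (passive voice, for example, is hard to catch with regex):

```python
import re

ANTI_PATTERNS = {
    "hedge language": r"\b(try to|if possible|maybe|perhaps)\b",
    "vague quantifier": r"\b(some|several|a few|many)\b",
    "escape hatch": r"\bunless you think otherwise\b",
    "double negative": r"\bdon'?t not\b",
}

def scan(prompt: str) -> list[tuple[str, str]]:
    """Return (anti-pattern name, offending text) pairs found in the prompt."""
    hits = []
    for name, pattern in ANTI_PATTERNS.items():
        for match in re.finditer(pattern, prompt, re.IGNORECASE):
            hits.append((name, match.group()))
    return hits

print(scan("Try to include some examples, unless you think otherwise."))
# -> [('hedge language', 'Try to'), ('vague quantifier', 'some'),
#     ('escape hatch', 'unless you think otherwise')]
```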

Output Formats

  • text (default): Diagnostic report with annotated prompt
  • json: {diagnosis: {issues: [], failure_modes: [], fixes: []}, rewrite: "", score: {}, anti_patterns: []}
  • markdown: Report suitable for documentation or sharing
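
For illustration, a json report for the "analyze the data" example from the diagnose command might look like this (the values are invented for the example):

```json
{
  "diagnosis": {
    "issues": ["ambiguous instruction: 'analyze the data'"],
    "failure_modes": ["Underthinking"],
    "fixes": ["specify mean/median/stddev and outlier reporting"]
  },
  "rewrite": "Analyze the data by calculating the mean, median, and standard deviation for each column...",
  "score": {"clarity": 7, "specificity": 4, "overall": 5.6},
  "anti_patterns": []
}
```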

Notes

  • Works with any LLM prompt (system prompts, user prompts, agent instructions, SKILL.md files)
  • Does not execute prompts — analyzes structure and content statically
  • Failure mode classification is based on common patterns, not guaranteed causes
  • For best results, provide both the prompt AND an example of the undesired output
  • The rewrite is a starting point — always test with your specific model and use case
  • Different models respond differently to the same prompt — fixes may need model-specific tuning

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
