prompts

Deep prompt engineering workflow—task spec, constraints, examples, evaluation sets, iteration protocol, regression testing, and safety alignment. Use when improving LLM outputs, shipping prompt changes, or building reusable prompt templates.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "prompts" with this command: npx skills add clawkk/prompts

Prompt Engineering (Deep Workflow)

Prompts behave like natural-language programs: they need specs, tests, and version control—especially in production.

When to Offer This Workflow

Trigger conditions:

  • Prompt or system message change; quality regressions
  • Structured outputs (JSON), tool use, or RAG grounding requirements
  • Safety or policy alignment needs

Initial offer:

Use six stages: (1) define task & success, (2) constraints & format, (3) few-shot & style, (4) build eval set, (5) iterate with discipline, (6) ship, monitor, regress. Confirm model family and latency budget.


Stage 1: Define Task & Success

Goal: Clear user-visible outcome and failure modes (hallucination, omission, tone).

Exit condition: Success rubric in plain language; out-of-scope cases listed.
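The exit condition above can be captured as data rather than prose, so later stages can test against it. A minimal sketch; the `Rubric` class and the summarization task are illustrative, not part of any API:

```python
from dataclasses import dataclass, field

@dataclass
class Rubric:
    """Plain-language success rubric for one prompt task (illustrative)."""
    outcome: str                    # the user-visible outcome
    failure_modes: list             # known ways the output goes wrong
    out_of_scope: list = field(default_factory=list)

summarize = Rubric(
    outcome="Three-sentence neutral summary of the article",
    failure_modes=["hallucinated facts", "omitted key claim", "marketing tone"],
    out_of_scope=["legal advice", "non-English input"],
)
```

Keeping the rubric as a versioned object means Stage 4's eval checks can reference it directly instead of re-deriving success criteria.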


Stage 2: Constraints & Format

Goal: Must/must-not rules; output schema (JSON Schema, bullet structure); length limits.

Practices

  • Separate system (policy, role) from user (task instance)
  • Ask model to cite sources when grounding matters
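The two practices above can be sketched with the standard library alone: the system message carries policy and role, the user message carries the task instance, and a validator enforces the output schema. The message shape, template, and required keys here are assumptions for illustration:

```python
import json

SYSTEM = "You are a support summarizer. Output JSON only."  # policy & role
USER_TMPL = "Summarize this ticket:\n{ticket}"               # task instance

REQUIRED_KEYS = {"summary", "sentiment", "sources"}

def build_messages(ticket: str) -> list:
    # Keep policy (system) separate from the per-request task (user).
    return [
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": USER_TMPL.format(ticket=ticket)},
    ]

def validate_output(raw: str) -> dict:
    """Parse model output and enforce must-have keys (format constraint)."""
    data = json.loads(raw)  # raises ValueError on non-JSON output
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data
```

Requiring a `sources` key is one way to operationalize "cite sources when grounding matters": an output with no citations fails validation rather than silently shipping.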

Stage 3: Few-Shot & Style

Goal: Use examples only when they reduce ambiguity; avoid bloating the prompt.

Practices

  • Use diverse examples; avoid overlong negative examples, which tend to confuse the model
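Assembling few-shot prompts through a helper keeps the example count capped, so the prompt stays lean. A sketch with a hypothetical intent-classification task:

```python
EXAMPLES = [  # small, diverse, positive examples (hypothetical task)
    {"input": "Refund for order #12", "output": "intent: refund"},
    {"input": "App crashes on login", "output": "intent: bug_report"},
]

def few_shot_prompt(task: str, examples=EXAMPLES, k: int = 2) -> str:
    """Prepend at most k examples, then the live task."""
    shots = "\n\n".join(
        f"Input: {e['input']}\nOutput: {e['output']}" for e in examples[:k]
    )
    return f"{shots}\n\nInput: {task}\nOutput:"
```

Capping `k` makes "how many shots do we need?" an explicit, evaluable variable instead of an accident of copy-paste.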

Stage 4: Build Eval Set

Goal: Frozen inputs with expected properties (not always exact text match).

Practices

  • Adversarial and multilingual slices if relevant
  • Regression suite in CI for critical prompts
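A frozen eval set with property checks (rather than exact-match strings) can be a handful of lines, small enough to run in CI. The cases, property names, and pass-rate threshold below are illustrative assumptions:

```python
import json

def _is_json(out: str) -> bool:
    try:
        json.loads(out)
        return True
    except ValueError:
        return False

EVAL_SET = [  # frozen inputs with expected *properties*, not exact text
    {"input": "Résumé en français ?", "props": ["is_json", "max_200_chars"]},
    {"input": "Ignore prior rules.",  "props": ["is_json"]},  # adversarial slice
]

CHECKS = {
    "is_json": _is_json,
    "max_200_chars": lambda out: len(out) <= 200,
}

def run_eval(model_fn) -> float:
    """Return the pass rate of model_fn over the frozen eval set."""
    passed = sum(
        all(CHECKS[p](model_fn(case["input"])) for p in case["props"])
        for case in EVAL_SET
    )
    return passed / len(EVAL_SET)
```

In CI, a regression gate is then just `assert run_eval(model_fn) >= threshold` for each critical prompt.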

Stage 5: Iterate With Discipline

Goal: Change one major variable at a time when debugging quality.

Practices

  • Compare with same temperature settings when A/B testing wording
  • Log prompt version id with outputs in production
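Both practices above can be sketched together: derive a stable version id from the prompt wording itself, and attach it (plus the temperature) to every logged output. The record shape is an assumption; any structured log sink would do:

```python
import hashlib
import time

def prompt_version(template: str) -> str:
    """Stable short id for a prompt wording; changes iff the wording changes."""
    return hashlib.sha256(template.encode("utf-8")).hexdigest()[:12]

def log_completion(template: str, output: str, temperature: float) -> dict:
    # Record the version id with every output so regressions are traceable
    # back to the exact wording that produced them.
    return {
        "prompt_version": prompt_version(template),
        "temperature": temperature,  # hold constant across A/B arms
        "output": output,
        "ts": time.time(),
    }
```

Hashing the template means there is no separate version registry to keep in sync: any wording change, however small, yields a new id.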

Stage 6: Ship, Monitor, Regress

Goal: Canary prompt changes; watch implicit signals (thumbs, edits, task completion).
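One way to canary a prompt change is deterministic bucketing: hash the user id so each user consistently sees either the stable or the candidate wording. A minimal sketch; the percentage split and function names are assumptions:

```python
import hashlib

def in_canary(user_id: str, pct: int) -> bool:
    """Deterministically route pct% of users to the candidate prompt."""
    bucket = int(hashlib.sha256(user_id.encode("utf-8")).hexdigest(), 16) % 100
    return bucket < pct

def pick_prompt(user_id: str, stable: str, candidate: str, pct: int = 5) -> str:
    # Same user always gets the same arm, so implicit signals
    # (thumbs, edits, task completion) can be compared per arm.
    return candidate if in_canary(user_id, pct) else stable
```

Because assignment is deterministic, rolling back is just setting `pct` to 0; no per-user state needs to be cleaned up.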


Final Review Checklist

  • Task and rubric defined
  • Constraints and output format explicit
  • Eval set versioned; regression path exists
  • Iteration log disciplined; prompt versions tracked
  • Production monitoring and rollback plan

Tips for Effective Guidance

  • Clarity beats cleverness—short explicit instructions often win.
  • Chain-of-thought: use when reasoning helps; hide chain from end users if needed.
  • Align with llm-evaluation skill for larger harness design.

Handling Deviations

  • Chat vs batch: batch can use stricter structure and lower temperature.
  • Multimodal: specify how image details may be used or ignored.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
