Voice Prompt Builder

Build effective voice prompts for OpenAI's Realtime API. Based on OpenAI's official Realtime Prompting Guide.

General Tips

Iterate relentlessly — small wording changes make or break behavior.
Bullets over paragraphs — clear, short bullets outperform long prose.
Guide with examples — the model closely follows sample phrases.
Be precise — ambiguity or conflicting instructions = degraded performance.
Pin language — explicitly constrain output language to prevent unwanted switching.
Reduce repetition — add a Variety rule to avoid robotic phrasing.
CAPITALIZE key rules — makes them stand out for the model.
Convert non-text rules to text — write "IF MORE THAN THREE FAILURES THEN ESCALATE" not "IF x > 3 THEN ESCALATE".

Prompt Structure

Organize the system prompt into labeled sections, each focused on one thing.

# Role & Objective             — who you are and what "success" means
# Personality & Tone           — voice, style, brevity, pacing
# Conversational Liveness      — backchannels, micro-acknowledgements, repair warmth
# Context                      — retrieved context, relevant info
# Reference Pronunciations     — phonetic guides for tricky words
# Tools                        — names, usage rules, preambles
# Instructions / Rules         — do's, don'ts, approach
# Conversation Flow            — states, goals, transitions
# Safety & Escalation          — fallback and handoff logic

Add domain-specific sections as needed (e.g., Compliance, Brand Policy). Remove sections not needed.

For detailed guidance on each section, see: references/prompt-sections.md
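
As a rough illustration of the section layout above, the following Python sketch assembles a system prompt from labeled sections. The helper name `build_prompt` and all sample bullet text are hypothetical. Only the section titles come from this document.

```python
# Section titles are taken from the list above; everything else is illustrative.
SECTION_ORDER = [
    "Role & Objective",
    "Personality & Tone",
    "Conversational Liveness",
    "Context",
    "Reference Pronunciations",
    "Tools",
    "Instructions / Rules",
    "Conversation Flow",
    "Safety & Escalation",
]


def build_prompt(sections: dict[str, list[str]]) -> str:
    """Render labeled sections as '# Title' headers followed by short bullets."""
    # Standard sections first, in the recommended order, then domain-specific extras.
    ordered = [name for name in SECTION_ORDER if name in sections]
    ordered += [name for name in sections if name not in SECTION_ORDER]
    blocks = []
    for name in ordered:
        bullets = sections[name]
        if not bullets:  # drop sections that are not needed
            continue
        lines = [f"# {name}"] + [f"- {b}" for b in bullets]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)


prompt = build_prompt({
    "Safety & Escalation": ["IF THE CALLER ASKS FOR A HUMAN, ESCALATE."],
    "Role & Objective": [
        "You are a billing support agent.",
        "Success means the caller's billing question is resolved.",
    ],
    "Compliance": ["Never read full card numbers aloud."],  # domain-specific extra
    "Context": [],  # empty, so it is omitted from the output
})
print(prompt)
```

Keeping each section as a list of short bullets makes it easy to iterate on one rule at a time without touching the rest of the prompt.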

Tool Design

Tools in the Realtime API follow OpenAI's function-calling format. Key patterns:

Behavior Types

Assign each tool a behavior type:

Type	When to use	Model behavior
PROACTIVE	Read-only lookups, low-risk	Call immediately, no confirmation, no preamble
PREAMBLES	Latency-sensitive lookups	Say a short filler phrase, then call immediately
CONFIRMATION FIRST	Write operations, bookings	Ask user before calling

Per-Tool Instructions

For each tool in the prompt, specify:

Use when: specific trigger condition
Do NOT use when: specific exclusion
Preamble sample phrases (in tool description for PREAMBLES type)

Preamble Sample Phrases in Tool Description

Add sample preamble phrases directly in the tool's description field:

{
  "name": "lookup_account",
  "description": "Retrieve account by email or phone.\n\nPreamble sample phrases:\n- Let me look that up for you.\n- One moment, checking your account."
}

For advanced tool patterns (rephrase supervisor, common tools, error handling), see: references/tools-patterns.md
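
The per-tool instructions and preamble phrases can be folded into one description string. The sketch below is one possible way to do that in Python. The helper `build_tool`, the "Use when" wording, and the `parameters` schema are hypothetical. The flat `type`/`name`/`description`/`parameters` layout is an assumption about the tool format and should be checked against the current API reference.

```python
import json


def build_tool(name, summary, use_when, do_not_use_when, preambles=(), parameters=None):
    """Build a function-tool definition with usage rules folded into the description."""
    parts = [summary, f"Use when: {use_when}", f"Do NOT use when: {do_not_use_when}"]
    if preambles:  # only PREAMBLES-type tools carry sample phrases
        phrases = "\n".join(f"- {p}" for p in preambles)
        parts.append(f"Preamble sample phrases:\n{phrases}")
    return {
        "type": "function",  # assumed layout, see the lead-in
        "name": name,
        "description": "\n\n".join(parts),
        # JSON Schema for the arguments; an empty object schema if none is given.
        "parameters": parameters or {"type": "object", "properties": {}},
    }


tool = build_tool(
    name="lookup_account",
    summary="Retrieve account by email or phone.",
    use_when="the caller asks about their account and has given an email or phone number.",
    do_not_use_when="the caller has not yet provided an email or phone number.",
    preambles=["Let me look that up for you.", "One moment, checking your account."],
    parameters={
        "type": "object",
        "properties": {"identifier": {"type": "string"}},  # hypothetical argument
        "required": ["identifier"],
    },
)
print(json.dumps(tool, indent=2))
```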

Conversation Flow

Break the interaction into phases with clear goals, instructions, and exit criteria.

Per-State Format

Each state needs:

Goal — one sentence, what "done" means
How to respond — bullet list of actions
Sample phrases — 2-3 examples + "vary your responses"
Exit criteria — concrete condition to leave
Transition — which state to go to next

Two Patterns

Static State Machine — all states in the system prompt as JSON or markdown. Good for simpler flows (4-6 states).
Dynamic via session.update — only current state's instructions + tools loaded. Swap on transition. Better for complex flows (6+ states, many tools per state).

For detailed flow design, state machine patterns, and sample phrases guidance, see: references/conversation-flow.md
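
To make the dynamic pattern concrete, here is a minimal Python sketch that renders one state in the per-state format and wraps it in a `session.update` event. The `State` class, the helper name, and the sample state are hypothetical. The payload shape (`session.instructions`, `session.tools`) is an assumption, and some API versions may require additional session fields.

```python
import json
from dataclasses import dataclass, field


@dataclass
class State:
    name: str
    goal: str
    how_to_respond: list[str]
    sample_phrases: list[str]
    exit_criteria: str
    transition: str
    tools: list[dict] = field(default_factory=list)

    def instructions(self) -> str:
        """Render the state using the per-state format described above."""
        lines = [f"# Current State: {self.name}", f"Goal: {self.goal}", "How to respond:"]
        lines += [f"- {step}" for step in self.how_to_respond]
        lines.append("Sample phrases (vary your responses):")
        lines += [f"- {phrase}" for phrase in self.sample_phrases]
        lines.append(f"Exit criteria: {self.exit_criteria}")
        lines.append(f"Transition: go to {self.transition}")
        return "\n".join(lines)


def session_update_event(base_prompt: str, state: State) -> dict:
    """Build an event that loads only the current state's instructions and tools."""
    return {
        "type": "session.update",
        "session": {  # assumed payload shape, see the lead-in
            "instructions": base_prompt + "\n\n" + state.instructions(),
            "tools": state.tools,
        },
    }


greeting = State(
    name="greeting",
    goal="Caller has stated why they are calling.",
    how_to_respond=["Greet the caller briefly.", "Ask how you can help."],
    sample_phrases=["Hi, thanks for calling. How can I help?", "Hello, what can I do for you today?"],
    exit_criteria="Caller states an intent.",
    transition="verify_identity",
)
event = session_update_event("# Role & Objective\n- You are a support agent.", greeting)
print(json.dumps(event, indent=2))
```

On each transition, the application would build the event for the next state and send it over the existing connection, so the model only ever sees one state's rules and tools at a time.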

Conversational Liveness

Micro-behaviors that make the voice agent feel present and actively listening, rather than robotic.

Key rules:

Backchannels — brief signals ("mhm", "okay", "right") at most once every 2-3 caller utterances; never repeat the same one twice in a row
Micro-acknowledgements — confirm what was heard in one short phrase before moving on: "Got it, [key detail]."
Turn-yield cues — use softeners ("go ahead", "whenever you're ready") when handing the floor back; allow silence after questions
Micro-repair warmth — when misunderstandings occur, use warm phrasing ("Let me make sure I have that right…"); never blame the caller
Never-list — no fillers/backchannels while reading numbers, dates, or codes; no delay cues during emergencies; no casual fillers during legal/official procedures

For detailed guidance, see: references/prompt-sections.md

Safety & Escalation

Always include escalation rules. Define WHEN to escalate:

Safety risk (self-harm, threats, harassment)
User explicitly asks for a human
Severe frustration (repeated complaints, profanity)
2 failed tool attempts on the same task OR 3 consecutive no-match events
Out-of-scope or restricted topics (financial/legal/medical advice)

Define WHAT to say:

"Thanks for your patience — I'm connecting you with a specialist now."

Then call the escalate_to_human tool.
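
The numeric thresholds above are stated in the prompt for the model to follow. If the application also wants to enforce them itself, a sketch along these lines could work. The `EscalationTracker` class is hypothetical, and mirroring the counters in application code is an assumption about the deployment rather than something this guide requires.

```python
from dataclasses import dataclass, field

MAX_TOOL_FAILURES = 2  # failed tool attempts on the same task
MAX_NO_MATCH = 3       # consecutive no-match events


@dataclass
class EscalationTracker:
    tool_failures: dict[str, int] = field(default_factory=dict)
    consecutive_no_match: int = 0

    def record_tool_failure(self, task: str) -> None:
        self.tool_failures[task] = self.tool_failures.get(task, 0) + 1

    def record_no_match(self) -> None:
        self.consecutive_no_match += 1

    def record_match(self) -> None:
        self.consecutive_no_match = 0  # a successful match breaks the streak

    def should_escalate(self) -> bool:
        too_many_failures = any(
            count >= MAX_TOOL_FAILURES for count in self.tool_failures.values()
        )
        return too_many_failures or self.consecutive_no_match >= MAX_NO_MATCH


tracker = EscalationTracker()
tracker.record_no_match()
tracker.record_no_match()
tracker.record_match()      # streak resets to zero
tracker.record_no_match()
before = tracker.should_escalate()

tracker.record_tool_failure("refund")
tracker.record_tool_failure("refund")  # second failure on the same task
after = tracker.should_escalate()
print(before, after)
```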

Voice Behavior Checklist

Apply these rules to every voice prompt:

Quality Check

After writing a prompt, verify:

No conflicting instructions — rules should not contradict each other
Tools in prompt match tools list — do not mention tools not in the tools array
Tool descriptions do not contradict each other
Every state has exit criteria and a transition target
Sample phrases exist for every state (with "vary your responses" note)
Language is pinned if multilingual switching is undesired
Voice behavior rules are included (see checklist above)

Use the Instructions Quality Prompt from references/prompt-sections.md to have a text model critique your prompt before deploying.
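
Some of the checks above are mechanical and can be scripted before asking a text model for a critique. The following Python sketch covers three of them. The function `check_prompt`, the `END` marker, and the dictionary layout for states are hypothetical. Treating any snake_case word in the prompt as a tool mention is a crude heuristic that can produce false positives.

```python
import re

END = "END"  # hypothetical marker for a terminal state


def check_prompt(prompt: str, tools: list[dict], states: list[dict]) -> list[str]:
    """Return a list of problems found by a few mechanical quality checks."""
    problems = []

    # Tools in prompt must match the tools list (heuristic: snake_case identifiers).
    tool_names = {t["name"] for t in tools}
    mentioned = set(re.findall(r"\b[a-z]+(?:_[a-z0-9]+)+\b", prompt))
    for name in sorted(mentioned - tool_names):
        problems.append(f"prompt mentions a tool missing from the tools list: {name}")

    # Every state needs exit criteria, a valid transition target, and sample phrases.
    state_names = {s["name"] for s in states}
    for s in states:
        if not s.get("exit_criteria"):
            problems.append(f"state {s['name']} has no exit criteria")
        if s.get("transition") not in state_names | {END}:
            problems.append(f"state {s['name']} has no valid transition target")
        if not s.get("sample_phrases"):
            problems.append(f"state {s['name']} has no sample phrases")
    return problems


issues = check_prompt(
    prompt="Call lookup_account first. If it fails twice, call escalate_to_human.",
    tools=[{"name": "lookup_account"}],
    states=[
        {"name": "greeting", "exit_criteria": "Caller states an intent.",
         "transition": "verify", "sample_phrases": ["Hi, how can I help?"]},
        {"name": "verify", "exit_criteria": "", "transition": END, "sample_phrases": []},
    ],
)
for issue in issues:
    print(issue)
```

Checks such as conflicting instructions or contradictory tool descriptions need judgment, so they are left to the model-based critique.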
