First-Time Setup

If VoiceMode isn't working or MCP fails to connect, run:

/voicemode:install

After install, reconnect MCP: /mcp → select voicemode → "Reconnect" (or restart Claude Code).

VoiceMode

Natural voice conversations with Claude Code using speech-to-text (STT) and text-to-speech (TTS).

Note: The Python package is voice-mode (hyphen), but the CLI command is voicemode (no hyphen).

When to Use MCP vs CLI

| Task | Use | Why |
|---|---|---|
| Voice conversations | MCP voicemode:converse | Faster - server already running |
| Service start/stop | MCP voicemode:service | Works within Claude Code |
| Installation | CLI voice-mode-install | One-time setup |
| Configuration | CLI voicemode config | Edit settings directly |
| Diagnostics | CLI voicemode diag | Administrative tasks |

Usage

Use the converse MCP tool to speak to users and hear their responses:

Speak and listen for response (most common usage)

voicemode:converse("Hello! What would you like to work on?")

Speak without waiting (for narration while working)

voicemode:converse("Searching the codebase now...", wait_for_response=False)

For most conversations, just pass your message; the defaults handle everything else. Use the default converse tool parameters unless there's a good reason not to.

Timing parameters (listen_duration_max, listen_duration_min) use smart defaults with silence detection. Don't override them unless the user requests it or you see a clear need.

Defaults are configurable by the user via ~/.voicemode/voicemode.env.

| Parameter | Default | Description |
|---|---|---|
| message | required | Text to speak |
| wait_for_response | true | Listen after speaking |
| voice | auto | TTS voice |

For all parameters, see Converse Parameters.

Best Practices

Narrate without waiting - Use wait_for_response=False when announcing actions
One question at a time - Don't bundle multiple questions in voice mode
Check status first - Verify services are running before starting conversations
Let VoiceMode auto-select - Don't hardcode providers unless user has preference
First run is slow - Model downloads happen on first start (2-5 min), then instant

Parallel Tool Calls (Zero Dead Air)

When performing actions during a voice conversation, use parallel tool calls to eliminate dead air. Send the voice message and the action in the same turn so they execute concurrently.

Pattern: Speak + Act in Parallel

FAST: One turn — voice and action fire simultaneously

Turn 1: speak (fire-and-forget) + do the work (all parallel)

voicemode:converse("Checking that now.", wait_for_response=False)
bash("git status")
Agent(prompt="Research X", run_in_background=True)

Turn 2: speak the results (with listening)

voicemode:converse("Here's what I found: ...", wait_for_response=True)

SLOW: Three turns, with an unnecessary sequential delay

Turn 1: speak

voicemode:converse("Checking that now.", wait_for_response=False)

Turn 2: do the work

bash("git status")

Turn 3: speak results

voicemode:converse("Here's what I found: ...", wait_for_response=True)

When to Use Parallel vs Sequential

| Scenario | Approach | Why |
|---|---|---|
| Announce + do work | Parallel | No dependency between speech and action |
| Announce + spawn agent | Parallel | Agent runs in background anyway |
| Check result then report | Sequential | Need result before speaking |
| Listen for response | Sequential | wait_for_response=True blocks until user finishes |

Key Rules

All tool types can be parallel: MCP, Bash, Agent, Read — mix freely in one turn
Wall-clock time = longest call, not the sum of all calls
Use wait_for_response=False for the speak call when combining with other tools
Great for demos: Audience hears continuous speech with no awkward silences
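
The wall-clock rule above can be illustrated with a small timing sketch. It does not call VoiceMode or Claude Code at all: `fake_tool_call` is a hypothetical stand-in that just sleeps, and the durations are made up. It only shows that concurrent calls finish in roughly the time of the longest one, while sequential calls take roughly the sum.

```python
import asyncio
import time


async def fake_tool_call(name: str, seconds: float) -> str:
    """Hypothetical stand-in for a tool call; it sleeps instead of doing work."""
    await asyncio.sleep(seconds)
    return name


async def run_parallel(calls: list[tuple[str, float]]) -> tuple[list[str], float]:
    """Start every call at once and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = await asyncio.gather(*(fake_tool_call(n, s) for n, s in calls))
    return list(results), time.perf_counter() - start


async def run_sequential(calls: list[tuple[str, float]]) -> tuple[list[str], float]:
    """Await each call before starting the next and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = [await fake_tool_call(n, s) for n, s in calls]
    return results, time.perf_counter() - start


# Made-up durations: the speak call is the longest of the three.
CALLS = [("converse", 0.15), ("bash", 0.05), ("agent", 0.10)]

if __name__ == "__main__":
    _, parallel = asyncio.run(run_parallel(CALLS))
    _, sequential = asyncio.run(run_sequential(CALLS))
    print(f"parallel: {parallel:.2f}s, sequential: {sequential:.2f}s")
```

With these durations the parallel run should take about as long as the `converse` call alone, which is the effect the "Speak + Act in Parallel" pattern relies on.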

Handling Pauses and Wait Requests

When the user asks you to wait or give them time:

Short pauses (up to 60 seconds): If the user says something ending with "wait" (e.g., "hang on", "give me a sec", "wait"), VoiceMode automatically pauses for 60 seconds then resumes listening. This is built-in.

Longer pauses (2+ minutes): Use bash sleep N where N is seconds. For example, if the user says "give me 5 minutes":

sleep 300 # Wait 5 minutes

Then call converse again when the wait is over:

voicemode:converse("Five minutes is up. Ready when you are.")

Configuration: The short pause duration is configurable via VOICEMODE_WAIT_DURATION (default: 60 seconds).
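
As a rough illustration of the decision described above, the sketch below turns a spoken request into either "rely on the built-in pause" or "sleep for N seconds". `wait_plan` is a hypothetical helper, not part of VoiceMode, and the phrase matching is deliberately naive. Treating every request longer than the built-in pause as a sleep is also an assumption; the text above only covers pauses up to 60 seconds and pauses of 2+ minutes.

```python
import re

# Hypothetical helper for illustration only; VoiceMode does not ship this.
_UNIT_SECONDS = {"second": 1, "sec": 1, "minute": 60, "min": 60, "hour": 3600}
_PATTERN = re.compile(r"(\d+)\s*(second|sec|minute|min|hour)s?\b", re.IGNORECASE)


def wait_plan(utterance: str, builtin_pause: int = 60) -> tuple[str, int]:
    """Return ("builtin", seconds) or ("sleep", seconds) for a wait request.

    builtin_pause mirrors the documented VOICEMODE_WAIT_DURATION default.
    """
    match = _PATTERN.search(utterance)
    if match is None:
        # No explicit duration ("hang on"): rely on the built-in pause.
        return ("builtin", builtin_pause)
    seconds = int(match.group(1)) * _UNIT_SECONDS[match.group(2).lower()]
    if seconds <= builtin_pause:
        return ("builtin", builtin_pause)
    # Longer than the built-in pause: the caller would run `sleep <seconds>`.
    return ("sleep", seconds)
```

For "give me 5 minutes" this yields a 300-second sleep, matching the `sleep 300` example above.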

STT Recovery - Manual Transcription

If Whisper STT fails but the audio was recorded successfully, you can manually transcribe the saved audio file:

Transcribe the most recent recording

whisper-cli ~/.voicemode/audio/latest-STT.wav

Or check if file exists first (safe for inclusion in automation)

if [ -f ~/.voicemode/audio/latest-STT.wav ]; then
  whisper-cli ~/.voicemode/audio/latest-STT.wav
fi

Requirements:

Audio saving must be enabled via one of:
VOICEMODE_SAVE_AUDIO=true in ~/.voicemode/voicemode.env
VOICEMODE_SAVE_ALL=true (saves all audio and transcriptions)
VOICEMODE_DEBUG=true (enables debug mode with audio saving)

How it works:

VoiceMode saves all STT recordings to ~/.voicemode/audio/ with timestamps
The latest-STT.wav symlink always points to the most recent recording
If the STT API fails, the recording is still saved for manual recovery
This lets you recover the user's speech without asking them to repeat
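
The same existence check can be sketched in Python for scripts that need the real file behind the symlink. `latest_recording` is a hypothetical helper name; the only details taken from this document are the ~/.voicemode/audio/ directory and the latest-STT.wav symlink.

```python
from pathlib import Path
from typing import Optional

# Default location taken from the text above.
DEFAULT_AUDIO_DIR = Path.home() / ".voicemode" / "audio"


def latest_recording(audio_dir: Path = DEFAULT_AUDIO_DIR) -> Optional[Path]:
    """Return the file that latest-STT.wav points to, or None if unavailable."""
    link = audio_dir / "latest-STT.wav"
    # Path.exists() follows symlinks, so a dangling link also returns None here.
    if not link.exists():
        return None
    return link.resolve()
```

A caller would pass the returned path to whisper-cli only when the result is not None.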

When to use:

STT service timeout or connection failure
Transcription returned empty but user definitely spoke
Need to verify what was actually said vs. what was transcribed

See also: Troubleshooting - No Speech Detected

Check Status

voicemode service status           # All services
voicemode service status whisper   # Specific service

Shows service status including running state, ports, and health.

Installation

Install VoiceMode CLI and configure services

uvx voice-mode-install --yes

Install local services (Apple Silicon recommended)

voicemode service install whisper
voicemode service install kokoro

See Getting Started for detailed steps.

Service Management

Start/stop services

voicemode:service("whisper", "start")
voicemode:service("kokoro", "start")

View logs for troubleshooting

voicemode:service("whisper", "logs", lines=50)

| Service | Port | Purpose |
|---|---|---|
| whisper | 2022 | Speech-to-text |
| kokoro | 8880 | Text-to-speech |
| voicemode | 8765 | HTTP/SSE server |

Actions: status, start, stop, restart, logs, enable, disable
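
For a quick check from a script, the ports in the table above can be probed directly. This is a minimal sketch, assuming the services listen on localhost at their listed ports. `port_open` is a hypothetical helper, and an open port only shows that something is listening, not that the service is healthy, so `voicemode service status` remains the authoritative check.

```python
import socket

# Ports as listed in the table above.
SERVICE_PORTS = {"whisper": 2022, "kokoro": 8880, "voicemode": 8765}


def port_open(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    for name, port in SERVICE_PORTS.items():
        state = "listening" if port_open(port) else "not reachable"
        print(f"{name} (port {port}): {state}")
```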

Configuration

voicemode config list                           # Show all settings
voicemode config set VOICEMODE_TTS_VOICE nova   # Set default voice
voicemode config edit                           # Edit config file

Config file: ~/.voicemode/voicemode.env

See Configuration Guide for all options.
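
If a script needs to read settings without going through the CLI, the config file can be parsed directly. The sketch below assumes the file uses plain KEY=value lines with `#` comment lines, which is the usual .env convention but is not confirmed by this document. `parse_env` is a hypothetical helper, and it does not handle inline comments or multi-line values.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=value lines, skipping blank lines and # comment lines."""
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and one layer of matching quotes.
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        settings[key.strip()] = value
    return settings


# Example content using variable names that appear in this document.
SAMPLE = """
# Voice settings
VOICEMODE_TTS_VOICE=nova
VOICEMODE_SAVE_AUDIO="true"
"""
```

Calling `parse_env(SAMPLE)` gives a dictionary keyed by variable name; for the real file, read ~/.voicemode/voicemode.env and pass its text in.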

DJ Mode

Background music during VoiceMode sessions with track-level control.

Core playback

voicemode dj play /path/to/music.mp3   # Play a file or URL
voicemode dj status                    # What's playing
voicemode dj pause                     # Pause playback
voicemode dj resume                    # Resume playback
voicemode dj stop                      # Stop playback

Navigation and volume

voicemode dj next        # Skip to next chapter
voicemode dj prev        # Go to previous chapter
voicemode dj volume 30   # Set volume to 30%

Music For Programming

voicemode dj mfp list      # List available episodes
voicemode dj mfp play 49   # Play episode 49
voicemode dj mfp sync      # Convert CUE files to chapters

Music library

voicemode dj find "daft punk"   # Search library
voicemode dj library scan       # Index ~/Audio/music
voicemode dj library stats      # Show library info

Play history and favorites

voicemode dj history    # Show recent plays
voicemode dj favorite   # Toggle favorite on current track

Configuration: Set VOICEMODE_DJ_VOLUME in ~/.voicemode/voicemode.env to customize startup volume (default: 50%).

CLI Cheat Sheet

Service management

voicemode service status          # All services
voicemode service start whisper   # Start a service
voicemode service logs kokoro     # View logs

Diagnostics

voicemode deps           # Check dependencies
voicemode diag info      # System info
voicemode diag devices   # Audio devices

DJ Mode

voicemode dj play <file|url>   # Start playback
voicemode dj status            # What's playing
voicemode dj next/prev         # Navigate chapters
voicemode dj stop              # Stop playback
voicemode dj mfp play 49       # Music For Programming

Voice Handoff Between Agents

Transfer voice conversations between Claude Code agents for multi-agent workflows.

Use cases:

Personal assistant routing to project-specific foremen
Foremen delegating to workers for focused tasks
Returning control when work is complete

Quick Reference

1. Announce the transfer

voicemode:converse("Transferring you to a project agent.", wait_for_response=False)

2. Spawn with voice instructions (mechanism depends on your setup)

spawn_agent(path="/path", prompt="Load voicemode skill, use converse to greet user")

3. Go quiet - let new agent take over

Hand-back:

voicemode:converse("Transferring you back to the assistant.", wait_for_response=False)

Stop conversing, exit or go idle

Key Principles

Announce transfers: Always tell the user before transferring
One speaker: Only one agent should use converse at a time
Distinct voices: Different voices make handoffs audible
Provide context: Tell receiving agent why user is being transferred

Detailed Documentation

See Call Routing for comprehensive guides:

Handoff Pattern - Complete hand-off and hand-back process
Voice Proxy - Relay pattern for agents without voice
Call Routing Overview - All routing patterns

Sharing Voice Services Over Tailscale

Expose local Whisper (STT) and Kokoro (TTS) to other devices on your Tailnet via HTTPS.

Why

Browsers require HTTPS for microphone access (e.g., VoiceMode Connect web app)
Tailscale serve provides automatic HTTPS with valid Let's Encrypt certificates for *.ts.net domains
Enables using your powerful local machine's GPU from any device on your Tailnet

Setup

Expose TTS (Kokoro on port 8880)

tailscale serve --bg --set-path /v1/audio/speech http://localhost:8880/v1/audio/speech

Expose STT (Whisper on port 2022)

tailscale serve --bg --set-path /v1/audio/transcriptions http://localhost:2022/v1/audio/transcriptions

Verify configuration

tailscale serve status

Reset all serve config

tailscale serve reset

Endpoints

After setup, endpoints are available at:

TTS: https://<hostname>.<tailnet>.ts.net/v1/audio/speech
STT: https://<hostname>.<tailnet>.ts.net/v1/audio/transcriptions

Important Notes

Path mapping: Tailscale strips the incoming path before forwarding, so you MUST include the full path in the target URL
Same-machine testing: Traffic doesn't route through Tailscale locally — test from another Tailnet device
Multiple paths: You can configure different paths to different backends on the same or different machines
CORS: Kokoro has CORS configured to allow https://app.voicemode.dev origins
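
The path-mapping note is easier to see with a toy model. The function below is not Tailscale code; it only models the behaviour described in that note (the mounted path is stripped and whatever remains is appended to the target URL). `forwarded_url` is a hypothetical name.

```python
def forwarded_url(mount_path: str, target: str, request_path: str) -> str:
    """Model of the note above: strip the mount path, then append the rest to target."""
    if not request_path.startswith(mount_path):
        raise ValueError(f"{request_path!r} is not under {mount_path!r}")
    remainder = request_path[len(mount_path):]
    return target.rstrip("/") + remainder


MOUNT = "/v1/audio/speech"

# Target includes the full path, as in the setup commands above.
good = forwarded_url(MOUNT, "http://localhost:8880/v1/audio/speech", MOUNT)

# Target omits the path: under this model the request would reach the
# backend root instead of the speech endpoint.
bad = forwarded_url(MOUNT, "http://localhost:8880", MOUNT)
```

Under this model, `good` is the Kokoro speech endpoint and `bad` is just the server root, which is why the setup commands repeat the path in the target URL.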

Use with VoiceMode Connect

In the VoiceMode Connect web app settings (app.voicemode.dev/settings), set:

TTS Endpoint: https://<hostname>.<tailnet>.ts.net
STT Endpoint: https://<hostname>.<tailnet>.ts.net

Soundfonts

Audio feedback tones that play during Claude Code tool use. Toggle with voicemode soundfonts on/off. See Soundfonts Guide.

Documentation Index

| Topic | Link |
|---|---|
| Converse Parameters | All Parameters |
| Installation | Getting Started |
| Configuration | Configuration Guide |
| Claude Code Plugin | Plugin Guide |
| Whisper STT | Whisper Setup |
| Kokoro TTS | Kokoro Setup |
| Pronunciation | Pronunciation Guide |
| Troubleshooting | Troubleshooting |
| Soundfonts | Soundfonts Guide |
| CLI Reference | CLI Docs |
| DJ Mode | Background Music |

Related Skills

VoiceMode Connect - Remote voice via mobile/web clients (no local STT/TTS needed)
