First-Time Setup
If VoiceMode isn't working or MCP fails to connect, run:
/voicemode:install
After install, reconnect MCP: /mcp → select voicemode → "Reconnect" (or restart Claude Code).
VoiceMode
Natural voice conversations with Claude Code using speech-to-text (STT) and text-to-speech (TTS).
Note: The Python package is voice-mode (hyphen), but the CLI command is voicemode (no hyphen).
When to Use MCP vs CLI
| Task | Use | Why |
| --- | --- | --- |
| Voice conversations | MCP voicemode:converse | Faster - server already running |
| Service start/stop | MCP voicemode:service | Works within Claude Code |
| Installation | CLI voice-mode-install | One-time setup |
| Configuration | CLI voicemode config | Edit settings directly |
| Diagnostics | CLI voicemode diag | Administrative tasks |
Usage
Use the converse MCP tool to speak to users and hear their responses:
Speak and listen for response (most common usage)
voicemode:converse("Hello! What would you like to work on?")
Speak without waiting (for narration while working)
voicemode:converse("Searching the codebase now...", wait_for_response=False)
For most conversations, just pass your message and let the defaults handle everything else. Use the default converse tool parameters unless there is a good reason not to. The timing parameters (listen_duration_max, listen_duration_min) use smart defaults with silence detection, so don't override them unless the user asks or you see a clear need. Users can change the defaults in ~/.voicemode/voicemode.env.
| Parameter | Default | Description |
| --- | --- | --- |
| message | required | Text to speak |
| wait_for_response | true | Listen after speaking |
| voice | auto | TTS voice |
For all parameters, see Converse Parameters.
Best Practices
- Narrate without waiting - Use wait_for_response=False when announcing actions
- One question at a time - Don't bundle multiple questions in voice mode
- Check status first - Verify services are running before starting conversations
- Let VoiceMode auto-select - Don't hardcode providers unless the user has a preference
- First run is slow - Models download on first start (2-5 min); later starts skip the download
Parallel Tool Calls (Zero Dead Air)
When performing actions during a voice conversation, use parallel tool calls to eliminate dead air. Send the voice message and the action in the same turn so they execute concurrently.
Pattern: Speak + Act in Parallel
FAST: One turn — voice and action fire simultaneously
Turn 1: speak (fire-and-forget) + do the work (all parallel)
voicemode:converse("Checking that now.", wait_for_response=False)
bash("git status")
Agent(prompt="Research X", run_in_background=True)
Turn 2: speak the results (with listening)
voicemode:converse("Here's what I found: ...", wait_for_response=True)
SLOW: Speech and work in separate turns, adding an unnecessary sequential delay
Turn 1: speak
voicemode:converse("Checking that now.", wait_for_response=False)
Turn 2: do the work
bash("git status")
Turn 3: speak results
voicemode:converse("Here's what I found: ...", wait_for_response=True)
When to Use Parallel vs Sequential
| Scenario | Approach | Why |
| --- | --- | --- |
| Announce + do work | Parallel | No dependency between speech and action |
| Announce + spawn agent | Parallel | Agent runs in background anyway |
| Check result then report | Sequential | Need result before speaking |
| Listen for response | Sequential | wait_for_response=True blocks until user finishes |
Key Rules
- All tool types can be parallel: MCP, Bash, Agent, Read - mix freely in one turn
- Wall-clock time = longest call, not the sum of all calls
- Use wait_for_response=False for the speak call when combining with other tools
- Great for demos: Audience hears continuous speech with no awkward silences
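The wall-clock rule above can be illustrated with a small Python sketch. It does not call VoiceMode at all: `speak` and `run_action` are hypothetical stand-ins that only sleep, and the durations are invented for the illustration.

```python
import asyncio
import time


async def speak(seconds: float) -> str:
    # Hypothetical stand-in for a fire-and-forget converse call.
    await asyncio.sleep(seconds)
    return "spoke"


async def run_action(seconds: float) -> str:
    # Hypothetical stand-in for a tool call such as a shell command.
    await asyncio.sleep(seconds)
    return "acted"


async def parallel_turn() -> float:
    # Both calls start in the same turn, so the turn lasts roughly as long
    # as the slowest call.
    start = time.perf_counter()
    await asyncio.gather(speak(0.2), run_action(0.3))
    return time.perf_counter() - start


async def sequential_turns() -> float:
    # Each call waits for the previous one, so the durations add up.
    start = time.perf_counter()
    await speak(0.2)
    await run_action(0.3)
    return time.perf_counter() - start


parallel_elapsed = asyncio.run(parallel_turn())
sequential_elapsed = asyncio.run(sequential_turns())
print(f"parallel: {parallel_elapsed:.2f}s, sequential: {sequential_elapsed:.2f}s")
```

The parallel turn should take about as long as the slower of the two calls, while the sequential version takes about the sum of both.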
Handling Pauses and Wait Requests
When the user asks you to wait or give them time:
Short pauses (up to 60 seconds): If the user says something ending with "wait" (e.g., "hang on", "give me a sec", "wait"), VoiceMode automatically pauses for 60 seconds then resumes listening. This is built-in.
Longer pauses (2+ minutes): Run sleep N in Bash, where N is the number of seconds to wait. For example, if the user says "give me 5 minutes":
sleep 300 # Wait 5 minutes
Then call converse again when the wait is over:
voicemode:converse("Five minutes is up. Ready when you are.")
Configuration: The short pause duration is configurable via VOICEMODE_WAIT_DURATION (default: 60 seconds).
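As a rough sketch of the decision described above, the following Python helper picks between the built-in wait and an explicit sleep. The function names and the regular expression are hypothetical and not part of VoiceMode; the sketch only handles durations written with digits (for example "5 minutes", not "five minutes") and assumes the default 60-second wait.

```python
import re
from typing import Optional

# Default of VOICEMODE_WAIT_DURATION, as stated in the text above.
BUILT_IN_WAIT_SECONDS = 60

_UNIT_SECONDS = {"second": 1, "sec": 1, "minute": 60, "min": 60}


def requested_pause_seconds(utterance: str) -> Optional[int]:
    """Return the spoken pause length in seconds, or None if no duration is found."""
    match = re.search(r"(\d+)\s*(second|sec|minute|min)s?\b", utterance.lower())
    if match is None:
        return None
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def pause_plan(utterance: str) -> str:
    """Choose between the built-in wait and an explicit shell sleep."""
    seconds = requested_pause_seconds(utterance)
    if seconds is None or seconds <= BUILT_IN_WAIT_SECONDS:
        return "built-in wait"
    return f"sleep {seconds}"


print(pause_plan("give me 5 minutes"))
print(pause_plan("hang on"))
```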
STT Recovery - Manual Transcription
If Whisper STT fails but the audio was recorded successfully, you can manually transcribe the saved audio file:
Transcribe the most recent recording
whisper-cli ~/.voicemode/audio/latest-STT.wav
Or check if file exists first (safe for inclusion in automation)
if [ -f ~/.voicemode/audio/latest-STT.wav ]; then
  whisper-cli ~/.voicemode/audio/latest-STT.wav
fi
Requirements:
- Audio saving must be enabled via one of:
  - VOICEMODE_SAVE_AUDIO=true in ~/.voicemode/voicemode.env
  - VOICEMODE_SAVE_ALL=true (saves all audio and transcriptions)
  - VOICEMODE_DEBUG=true (enables debug mode with audio saving)
How it works:
- VoiceMode saves all STT recordings to ~/.voicemode/audio/ with timestamps
- The latest-STT.wav symlink always points to the most recent recording
- If the STT API fails, the recording is still saved for manual recovery
- This lets you recover the user's speech without asking them to repeat
When to use:
- STT service timeout or connection failure
- Transcription returned empty but user definitely spoke
- Need to verify what was actually said vs. what was transcribed
See also: Troubleshooting - No Speech Detected
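The recovery steps above could be scripted along these lines. This is a sketch, not part of VoiceMode: the helper names are hypothetical, and it assumes `whisper-cli` is on the PATH, accepts the file path as its only argument (as in the command shown earlier), and prints the transcript to standard output.

```python
import subprocess
from pathlib import Path
from typing import Optional

DEFAULT_AUDIO_DIR = Path.home() / ".voicemode" / "audio"


def latest_recording(audio_dir: Path = DEFAULT_AUDIO_DIR) -> Optional[Path]:
    """Return the file behind latest-STT.wav, or None if it is missing."""
    link = audio_dir / "latest-STT.wav"
    # exists() follows symlinks, so a dangling link also counts as missing.
    if not link.exists():
        return None
    return link.resolve()


def transcribe_latest(audio_dir: Path = DEFAULT_AUDIO_DIR) -> Optional[str]:
    """Run whisper-cli on the latest recording and return what it prints."""
    recording = latest_recording(audio_dir)
    if recording is None:
        return None
    result = subprocess.run(
        ["whisper-cli", str(recording)],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if whisper-cli fails
    )
    return result.stdout.strip()
```

Returning None when no recording exists makes the helper safe to call from automation, mirroring the file-existence check in the shell version.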
Check Status
voicemode service status # All services
voicemode service status whisper # Specific service
Shows service status including running state, ports, and health.
Installation
Install VoiceMode CLI and configure services
uvx voice-mode-install --yes
Install local services (Apple Silicon recommended)
voicemode service install whisper
voicemode service install kokoro
See Getting Started for detailed steps.
Service Management
Start/stop services
voicemode:service("whisper", "start")
voicemode:service("kokoro", "start")
View logs for troubleshooting
voicemode:service("whisper", "logs", lines=50)
| Service | Port | Purpose |
| --- | --- | --- |
| whisper | 2022 | Speech-to-text |
| kokoro | 8880 | Text-to-speech |
| voicemode | 8765 | HTTP/SSE server |
Actions: status, start, stop, restart, logs, enable, disable
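When the CLI is not at hand, a quick reachability check of the ports listed above can be sketched in Python. This only tests whether something accepts TCP connections on each port; it says nothing about service health, and the function names are hypothetical.

```python
import socket

# Ports taken from the service table above.
SERVICE_PORTS = {"whisper": 2022, "kokoro": 8880, "voicemode": 8765}


def port_is_open(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


def service_report(ports: dict = SERVICE_PORTS) -> dict:
    """Map each service name to whether its port accepts connections."""
    return {name: port_is_open(port) for name, port in ports.items()}


if __name__ == "__main__":
    for name, is_up in service_report().items():
        print(f"{name}: {'listening' if is_up else 'not reachable'}")
```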
Configuration
voicemode config list # Show all settings
voicemode config set VOICEMODE_TTS_VOICE nova # Set default voice
voicemode config edit # Edit config file
Config file: ~/.voicemode/voicemode.env
See Configuration Guide for all options.
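To inspect the config file from a script, a minimal reader might look like the sketch below. It assumes the file consists of plain KEY=VALUE lines with `#` comment lines, which is an assumption about the format rather than something stated here; inline comments after a value are not handled, and the function names are hypothetical.

```python
from pathlib import Path

DEFAULT_ENV_PATH = Path.home() / ".voicemode" / "voicemode.env"


def parse_env_text(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comment lines."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        settings[key.strip()] = value.strip().strip("\"'")
    return settings


def load_voicemode_env(path: Path = DEFAULT_ENV_PATH) -> dict:
    """Return the settings in the config file, or {} if it does not exist."""
    if not path.is_file():
        return {}
    return parse_env_text(path.read_text(encoding="utf-8"))


sample = """
# Default voice
VOICEMODE_TTS_VOICE=nova
VOICEMODE_SAVE_AUDIO=true
"""
settings = parse_env_text(sample)
print(settings)
```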
DJ Mode
Background music during VoiceMode sessions with track-level control.
Core playback
voicemode dj play /path/to/music.mp3 # Play a file or URL
voicemode dj status # What's playing
voicemode dj pause # Pause playback
voicemode dj resume # Resume playback
voicemode dj stop # Stop playback
Navigation and volume
voicemode dj next # Skip to next chapter
voicemode dj prev # Go to previous chapter
voicemode dj volume 30 # Set volume to 30%
Music For Programming
voicemode dj mfp list # List available episodes
voicemode dj mfp play 49 # Play episode 49
voicemode dj mfp sync # Convert CUE files to chapters
Music library
voicemode dj find "daft punk" # Search library
voicemode dj library scan # Index ~/Audio/music
voicemode dj library stats # Show library info
Play history and favorites
voicemode dj history # Show recent plays
voicemode dj favorite # Toggle favorite on current track
Configuration: Set VOICEMODE_DJ_VOLUME in ~/.voicemode/voicemode.env to customize startup volume (default: 50%).
CLI Cheat Sheet
Service management
voicemode service status # All services
voicemode service start whisper # Start a service
voicemode service logs kokoro # View logs
Diagnostics
voicemode deps # Check dependencies
voicemode diag info # System info
voicemode diag devices # Audio devices
DJ Mode
voicemode dj play <file|url> # Start playback
voicemode dj status # What's playing
voicemode dj next/prev # Navigate chapters
voicemode dj stop # Stop playback
voicemode dj mfp play 49 # Music For Programming
Voice Handoff Between Agents
Transfer voice conversations between Claude Code agents for multi-agent workflows.
Use cases:
- Personal assistant routing to project-specific foremen
- Foremen delegating to workers for focused tasks
- Returning control when work is complete
Quick Reference
1. Announce the transfer
voicemode:converse("Transferring you to a project agent.", wait_for_response=False)
2. Spawn with voice instructions (mechanism depends on your setup)
spawn_agent(path="/path", prompt="Load voicemode skill, use converse to greet user")
3. Go quiet - let new agent take over
Hand-back:
voicemode:converse("Transferring you back to the assistant.", wait_for_response=False)
Stop conversing, exit or go idle
Key Principles
- Announce transfers: Always tell the user before transferring
- One speaker: Only one agent should use converse at a time
- Distinct voices: Different voices make handoffs audible
- Provide context: Tell the receiving agent why the user is being transferred
Detailed Documentation
See Call Routing for comprehensive guides:
- Handoff Pattern - Complete hand-off and hand-back process
- Voice Proxy - Relay pattern for agents without voice
- Call Routing Overview - All routing patterns
Sharing Voice Services Over Tailscale
Expose local Whisper (STT) and Kokoro (TTS) to other devices on your Tailnet via HTTPS.
Why
- Browsers require HTTPS for microphone access (e.g., VoiceMode Connect web app)
- Tailscale serve provides automatic HTTPS with valid Let's Encrypt certificates for *.ts.net domains
- Lets any device on your Tailnet use the GPU of your more powerful local machine
Setup
Expose TTS (Kokoro on port 8880)
tailscale serve --bg --set-path /v1/audio/speech http://localhost:8880/v1/audio/speech
Expose STT (Whisper on port 2022)
tailscale serve --bg --set-path /v1/audio/transcriptions http://localhost:2022/v1/audio/transcriptions
Verify configuration
tailscale serve status
Reset all serve config
tailscale serve reset
Endpoints
After setup, endpoints are available at:
- TTS: https://<hostname>.<tailnet>.ts.net/v1/audio/speech
- STT: https://<hostname>.<tailnet>.ts.net/v1/audio/transcriptions
Important Notes
- Path mapping: Tailscale strips the incoming path before forwarding, so you MUST include the full path in the target URL
- Same-machine testing: Traffic doesn't route through Tailscale locally; test from another Tailnet device
- Multiple paths: You can configure different paths to different backends on the same or different machines
- CORS: Kokoro has CORS configured to allow https://app.voicemode.dev origins
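The path-mapping note can be made concrete with a simplified model in Python. This is only an illustration of the behavior described above, not Tailscale's actual implementation, and `forwarded_url` is a hypothetical helper.

```python
def forwarded_url(mount_path: str, target: str, request_path: str) -> str:
    """Model the note above: strip the mount path, append the rest to the target."""
    if not request_path.startswith(mount_path):
        raise ValueError(f"{request_path} is not under {mount_path}")
    remainder = request_path[len(mount_path):]
    return target + remainder


mount = "/v1/audio/speech"

# Target includes the full path, as in the setup commands: the backend
# receives the request at the route it serves.
full_target = forwarded_url(mount, "http://localhost:8880/v1/audio/speech", mount)

# Target without the path: the stripped prefix is lost and the backend
# receives the request at its root.
bare_target = forwarded_url(mount, "http://localhost:8880", mount)

print(full_target)
print(bare_target)
```

Under this model, only the first form reaches the /v1/audio/speech route on the backend, which is why the setup commands repeat the path in the target URL.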
Use with VoiceMode Connect
In the VoiceMode Connect web app settings (app.voicemode.dev/settings), set:
- TTS Endpoint: https://<hostname>.<tailnet>.ts.net
- STT Endpoint: https://<hostname>.<tailnet>.ts.net
Soundfonts
Audio feedback tones that play during Claude Code tool use. Toggle with voicemode soundfonts on/off. See Soundfonts Guide.
Documentation Index
| Topic | Link |
| --- | --- |
| Converse Parameters | All Parameters |
| Installation | Getting Started |
| Configuration | Configuration Guide |
| Claude Code Plugin | Plugin Guide |
| Whisper STT | Whisper Setup |
| Kokoro TTS | Kokoro Setup |
| Pronunciation | Pronunciation Guide |
| Troubleshooting | Troubleshooting |
| Soundfonts | Soundfonts Guide |
| CLI Reference | CLI Docs |
| DJ Mode | Background Music |
Related Skills
- VoiceMode Connect - Remote voice via mobile/web clients (no local STT/TTS needed)