Pocket TTS
Local text-to-speech via pocket-tts server. Streams audio for low latency. macOS only (uses afplay as fallback).
Prerequisites: pip install pocket-tts and brew install ffmpeg
Quick Reference
# Ensure server is running (do this first)
curl -s http://localhost:8321/health > /dev/null 2>&1 || {
  pocket-tts serve --voice ~/.config/pocket-tts/default-voice.wav --port 8321 > /dev/null 2>&1 &
  sleep 4
}
# Speak with streaming playback (audio starts immediately)
curl -s -X POST http://localhost:8321/tts -F "text=Hello world" -o - | ffplay -nodisp -autoexit -loglevel quiet -
# Or with temp file (if ffplay unavailable)
curl -s -X POST http://localhost:8321/tts -F "text=Hello world" -o /tmp/speak.wav && afplay /tmp/speak.wav; rm -f /tmp/speak.wav
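The streaming command above can be wrapped in a small shell function for repeated use. This is a minimal sketch, not part of pocket-tts: the function name speak and the POCKET_TTS_URL variable are hypothetical. It uses curl's --form-string instead of -F so that text beginning with @ or < is sent literally rather than being read as a file reference.

```shell
# Hypothetical wrapper around the streaming command from Quick Reference.
# POCKET_TTS_URL is an assumed override; the default matches port 8321 above.
speak() {
  local text="$1"
  local base="${POCKET_TTS_URL:-http://localhost:8321}"
  if [ -z "$text" ]; then
    echo "usage: speak TEXT" >&2
    return 2
  fi
  # --form-string sends the value literally (no @file or <file expansion)
  curl -s -X POST "$base/tts" --form-string "text=$text" -o - |
    ffplay -nodisp -autoexit -loglevel quiet -
}
```

With the server running, speak "Hello world" should behave like the one-line streaming command.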
Architecture
Always use the server — it keeps the model and voice embedding warm in memory.
- Port: 8321
- Default voice: ~/.config/pocket-tts/default-voice.wav (loaded once at server start)
- Streaming: /tts returns chunked WAV. Pipe to ffplay for immediate playback during generation.
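Because the model and voice are loaded at server start, a freshly launched server is not usable immediately. One alternative to a fixed sleep is to poll the /health endpoint until it answers. The sketch below assumes that a successful /health response means the server is ready to serve /tts; the function name wait_for_server and its retry parameters are illustrative, not part of pocket-tts.

```shell
# Hypothetical helper: poll /health until the server answers or tries run out.
# Usage: wait_for_server [url] [max_tries] [delay_seconds]
wait_for_server() {
  local url="${1:-http://localhost:8321/health}"
  local tries="${2:-20}"
  local delay="${3:-0.5}"
  local i=0
  while [ "$i" -lt "$tries" ]; do
    # -f makes curl exit non-zero on HTTP error statuses as well
    if curl -sf "$url" > /dev/null 2>&1; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}
```

After starting pocket-tts serve in the background, calling wait_for_server returns as soon as the health check succeeds, and returns 1 if the server never comes up within the retry budget (about 10 seconds with the defaults).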
Changing Voices
Per request (the server keeps the default voice warm but can generate with other voices):
curl -s -X POST http://localhost:8321/tts -F "text=Hello" -F "voice_url=jean" -o - | ffplay -nodisp -autoexit -loglevel quiet -
Built-in voices: alba, marius, javert, jean, fantine, cosette, eponine, azelma
Custom: any http://, https://, or hf:// URL, passed as voice_url
To change the default, restart the server with a different --voice.
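A script that accepts a voice from the user may want to check it before sending a request. The sketch below only mirrors the built-in names and URL schemes listed above; the function name is_valid_voice is hypothetical, and the server itself may accept or reject values differently.

```shell
# Hypothetical check: succeed if the argument is a built-in voice name
# or a URL with one of the schemes listed above.
is_valid_voice() {
  case "$1" in
    alba|marius|javert|jean|fantine|cosette|eponine|azelma) return 0 ;;
    http://?*|https://?*|hf://?*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, is_valid_voice jean succeeds, while is_valid_voice bogus returns a non-zero status.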
Creating Custom Voices
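# Extract a 30 s clip starting at START_SECONDS (pocket-tts truncates to 30 s anyway)
ffmpeg -y -ss START_SECONDS -t 30 -i input.mp3 -ar 24000 -ac 1 ~/.config/pocket-tts/default-voice.wav
# Restart the server afterwards: the default voice is loaded once at server start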
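To keep several voice clips without overwriting the default each time, the same ffmpeg invocation can be parameterized. This is a sketch under assumptions: the function name make_voice_clip and its argument order are hypothetical, and the start offset is restricted to plain seconds as in the command above.

```shell
# Hypothetical helper: cut a 30 s mono 24 kHz clip, same ffmpeg flags as above.
# Usage: make_voice_clip INPUT START_SECONDS OUTPUT
make_voice_clip() {
  local input="$1" start="$2" output="$3"
  if [ "$#" -ne 3 ]; then
    echo "usage: make_voice_clip INPUT START_SECONDS OUTPUT" >&2
    return 2
  fi
  # Accept only non-negative numbers such as 12 or 12.5 for the start offset
  case "$start" in
    ''|.|*[!0-9.]*|*.*.*) echo "invalid start: $start" >&2; return 2 ;;
  esac
  ffmpeg -y -ss "$start" -t 30 -i "$input" -ar 24000 -ac 1 "$output"
}
```

The resulting file can then be passed to the server with --voice at startup.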
Troubleshooting
Server not responding: Check whether the process died, then restart it with the pocket-tts serve command from Quick Reference
Slow first response: The server needs about 4 s to load the model after it starts
No audio: Ensure ffplay (from ffmpeg) or afplay (macOS built-in) is available
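The "No audio" check can be scripted. The following sketch prefers ffplay and falls back to afplay, matching the order used in Quick Reference; the function name pick_player is hypothetical.

```shell
# Hypothetical check: print the first available player, or fail if none is found.
pick_player() {
  if command -v ffplay > /dev/null 2>&1; then
    echo ffplay
  elif command -v afplay > /dev/null 2>&1; then
    echo afplay
  else
    echo "no audio player found: install ffmpeg (ffplay) or use macOS afplay" >&2
    return 1
  fi
}
```

A non-zero return here points to a missing player rather than a server problem.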