macOS Local Voice
Fully local speech-to-text (STT) and text-to-speech (TTS) on macOS. No API keys, no network, no cloud. All processing happens on-device.
Requirements
- macOS (Apple Silicon recommended, Intel works too)
yapCLI in PATH — install viabrew install finnvoor/tools/yapffmpegin PATH (optional, needed for ogg/opus output) —brew install ffmpegsayandosascriptare macOS built-in
Speech-to-Text (STT)
Transcribe an audio file to text using Apple's on-device speech recognition.
node {baseDir}/scripts/stt.mjs <audio_file> [locale]
audio_file: path to audio (ogg, m4a, mp3, wav, etc.)locale: optional, e.g.zh_CN,en_US,ja_JP. If omitted, uses system default.- Outputs transcribed text to stdout.
Supported STT locales
Use node {baseDir}/scripts/stt.mjs --locales to list all supported locales.
Key locales: en_US, en_GB, zh_CN, zh_TW, zh_HK, ja_JP, ko_KR, fr_FR, de_DE, es_ES, pt_BR, ru_RU, vi_VN, th_TH.
Language detection tips
- If the user's recent messages are in Chinese → use
zh_CN - If in English → use
en_US - If mixed or unclear → try without locale (system default)
Text-to-Speech (TTS)
Convert text to an audio file using macOS native TTS.
node {baseDir}/scripts/tts.mjs "<text>" [voice_name] [output_path]
text: the text to speakvoice_name: optional, e.g.Yue (Premium),Tingting,Ava (Premium). If omitted, auto-selects the best available voice based on text language.output_path: optional, defaults to a timestamped file in~/.openclaw/media/outbound/- Outputs the generated audio file path to stdout.
- If
ffmpegis available, output is ogg/opus (ideal for messaging platforms). Otherwise aiff.
Sending as voice note
After generating the audio file, send it using the message tool:
message action=send media=<path_from_tts.sh> asVoice=true
Voice Management
List available voices, check readiness, or find the best voice for a language:
node {baseDir}/scripts/voices.mjs list [locale] # List voices, optionally filter by locale
node {baseDir}/scripts/voices.mjs check "<name>" # Check if a specific voice is downloaded and ready
node {baseDir}/scripts/voices.mjs best <locale> # Get the highest quality voice for a locale
Quality levels
- 1 = compact (low quality, always available)
- 2 = enhanced (mid quality, may need download)
- 3 = premium (highest quality, needs download from System Settings)
If a voice is not available
Tell the user: "Voice X is not downloaded. Go to System Settings → Accessibility → Spoken Content → System Voice → Manage Voices to download it."
Notes
- The
saycommand silently falls back to a default voice if the requested voice is not available (exit code 0, no error). Always usevoices.mjs checkbefore callingtts.mjswith a specific voice name. - Premium voices (e.g.
Yue (Premium),Ava (Premium)) sound significantly better but must be manually downloaded by the user. - Siri voices are not accessible via the speech synthesis API.