voiceclaw

Local voice I/O for OpenClaw agents. Transcribe inbound audio/voice messages using local Whisper (whisper.cpp) and generate voice replies using local Piper TTS. Requires whisper, piper, and ffmpeg pre-installed on the system. All inference runs on-device — no network calls, no cloud APIs, no API keys. Use when an agent receives a voice/audio message and should respond in both voice and text, or when any text response should be synthesized and sent as audio. Triggers on: voice messages, audio attachments, respond in voice, send as audio, speak this, voiceclaw.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "voiceclaw" with this command: npx skills add asif2bd/voiceclaw

VoiceClaw

Local-only voice I/O for OpenClaw agents.

  • STT: transcribe.sh — converts audio to text via local Whisper binary
  • TTS: speak.sh — converts text to speech via local Piper binary
  • Network calls: none — both scripts run fully offline
  • No cloud APIs, no API keys required

Prerequisites

The following must be installed on the system before using this skill:

| Requirement | Purpose |
|---|---|
| whisper binary | Speech-to-text inference |
| ggml-base.en.bin model file | Whisper STT model |
| piper binary | Text-to-speech synthesis |
| *.onnx voice model files | Piper TTS voices |
| ffmpeg | Audio format conversion |

See README.md for installation and setup instructions.


Environment Variables

| Variable | Default | Purpose |
|---|---|---|
| WHISPER_BIN | auto-detected via which | Path to whisper binary |
| WHISPER_MODEL | ~/.cache/whisper/ggml-base.en.bin | Path to Whisper model file |
| PIPER_BIN | auto-detected via which | Path to piper binary |
| VOICECLAW_VOICES_DIR | ~/.local/share/piper/voices | Directory containing .onnx voice model files |
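
The defaults in the table can be mirrored in a wrapper script. The snippet below is a minimal sketch of that resolution, assuming the scripts behave as the table describes; how transcribe.sh and speak.sh actually resolve these variables is not shown on this page. `resolve_voiceclaw_env` is a hypothetical helper name, and `command -v` stands in for `which`.

```shell
# Sketch: resolve VoiceClaw settings using the documented defaults.
# Assumption: this mirrors the table above; the real scripts may differ.
resolve_voiceclaw_env() {
  # Keep the caller's value if set, otherwise look the binary up on PATH.
  WHISPER_BIN="${WHISPER_BIN:-$(command -v whisper || true)}"
  PIPER_BIN="${PIPER_BIN:-$(command -v piper || true)}"
  # Model and voice locations fall back to the documented default paths.
  WHISPER_MODEL="${WHISPER_MODEL:-$HOME/.cache/whisper/ggml-base.en.bin}"
  VOICECLAW_VOICES_DIR="${VOICECLAW_VOICES_DIR:-$HOME/.local/share/piper/voices}"
  export WHISPER_BIN WHISPER_MODEL PIPER_BIN VOICECLAW_VOICES_DIR
}
```

Calling `resolve_voiceclaw_env` before the scripts makes the effective paths visible in one place, which can help when debugging a "not found" error.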

Verify Setup

which whisper && echo "STT binary: OK"
which piper   && echo "TTS binary: OK"
which ffmpeg  && echo "ffmpeg: OK"
ls "${WHISPER_MODEL:-$HOME/.cache/whisper/ggml-base.en.bin}" && echo "STT model: OK"
ls "${VOICECLAW_VOICES_DIR:-$HOME/.local/share/piper/voices}"/*.onnx 2>/dev/null | head -1 | grep . && echo "TTS voices: OK"

Inbound Voice: Transcribe

# Transcribe audio → text (supports ogg, mp3, m4a, wav, flac)
TRANSCRIPT=$(bash scripts/transcribe.sh /path/to/audio.ogg)

Override model path:

WHISPER_MODEL=/path/to/ggml-base.en.bin bash scripts/transcribe.sh audio.ogg
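
When calling the whisper binary directly, or when debugging a failed transcription, it can help to normalize the audio first. whisper.cpp has traditionally expected 16 kHz, 16-bit PCM WAV input; transcribe.sh presumably handles this for the formats listed above, but that is an assumption, since its contents are not shown here. `to_whisper_wav` is a hypothetical helper, not part of VoiceClaw.

```shell
# Sketch: convert any ffmpeg-readable input to 16 kHz mono 16-bit PCM WAV.
# Assumption: to_whisper_wav is illustrative and not shipped with the skill.
to_whisper_wav() {
  local in="$1" out="$2"
  # -ar 16000: resample to 16 kHz; -ac 1: downmix to mono;
  # -c:a pcm_s16le: signed 16-bit little-endian PCM.
  ffmpeg -y -loglevel error -i "$in" -ar 16000 -ac 1 -c:a pcm_s16le "$out"
}

# Usage: to_whisper_wav /path/to/audio.ogg /tmp/audio_16k.wav
```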

Outbound Voice: Speak

# Step 1: Generate WAV (local Piper — no network)
WAV=$(bash scripts/speak.sh "Your response here." /tmp/reply.wav en_US-lessac-medium)

# Step 2: Convert to OGG Opus (Telegram voice requirement)
ffmpeg -i "$WAV" -c:a libopus -b:a 32k /tmp/reply.ogg -y -loglevel error

# Step 3: Send via message tool (filePath=/tmp/reply.ogg)

Override voice directory:

VOICECLAW_VOICES_DIR=/path/to/voices bash scripts/speak.sh "Hello." /tmp/reply.wav

Available Voices

| Voice | Style |
|---|---|
| en_US-lessac-medium | Neutral American (default) |
| en_US-amy-medium | Warm American female |
| en_US-joe-medium | American male |
| en_US-kusal-medium | Expressive American male |
| en_US-danny-low | Deep American male (fast) |
| en_GB-alba-medium | British female |
| en_GB-northern_english_male-medium | Northern British male |
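
Not every voice in the table is necessarily installed on a given machine. The following is a minimal sketch of checking for a voice before synthesis and falling back to the default. It assumes each voice is stored as `<voice-name>.onnx` directly inside the voices directory, which this page does not confirm; `resolve_voice` is a hypothetical helper.

```shell
# Sketch: print the path of a voice model, falling back to the default voice.
# Assumption: models are stored as <voice-name>.onnx in the voices directory.
resolve_voice() {
  local voice="${1:-en_US-lessac-medium}"
  local dir="${VOICECLAW_VOICES_DIR:-$HOME/.local/share/piper/voices}"
  if [ -f "$dir/$voice.onnx" ]; then
    printf '%s\n' "$dir/$voice.onnx"
  elif [ -f "$dir/en_US-lessac-medium.onnx" ]; then
    # Requested voice is missing: warn on stderr and use the default.
    echo "voice '$voice' not found, using en_US-lessac-medium" >&2
    printf '%s\n' "$dir/en_US-lessac-medium.onnx"
  else
    echo "no voice models found in $dir" >&2
    return 1
  fi
}

# Usage: MODEL=$(resolve_voice en_GB-alba-medium) || exit 1
```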

Agent Behavior Rules

  1. Voice in → Voice + Text out. Always respond with both a voice reply and a text reply when a voice message is received.
  2. Include the transcript. Show "🎙️ I heard: [transcript]" at the top of every text reply to a voice message.
  3. Keep voice responses concise. Piper TTS works best under ~200 words — summarize for audio, include full detail in text.
  4. Local only. Never use a cloud TTS/STT API. Only the local whisper and piper binaries.
  5. Send voice before text. Send the audio file first, then follow with the text reply.
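
Rules 2 and 3 can be sketched as two small shell helpers. Both function names are hypothetical and not part of VoiceClaw, and the 200-word cap is the guideline from rule 3 rather than a hard Piper limit.

```shell
# Sketch of rules 2 and 3. Both helpers are illustrative assumptions.

# Rule 2: put the transcript line at the top of the text reply.
format_text_reply() {
  printf '🎙️ I heard: %s\n\n%s\n' "$1" "$2"
}

# Rule 3: succeed only if the text has at most N words (default 200).
fits_voice_reply() {
  local words
  # Arithmetic expansion strips the padding some wc builds add.
  words=$(( $(printf '%s' "$1" | wc -w) ))
  [ "$words" -le "${2:-200}" ]
}

# Usage:
#   fits_voice_reply "$RESPONSE" || RESPONSE_AUDIO="Short summary for audio."
#   format_text_reply "$TRANSCRIPT" "$RESPONSE"
```

When the check fails, the agent would write a shorter summary for the audio and keep the full detail in the text reply, as rule 3 describes.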

Full Example

# 1. Transcribe inbound voice message
TRANSCRIPT=$(bash path/to/voiceclaw/scripts/transcribe.sh /path/to/voice.ogg)

# 2. Compose reply and generate audio
RESPONSE="Deployment complete. All checks passed."
WAV=$(bash path/to/voiceclaw/scripts/speak.sh "$RESPONSE" /tmp/reply_$$.wav)
ffmpeg -i "$WAV" -c:a libopus -b:a 32k /tmp/reply_$$.ogg -y -loglevel error

# 3. Send voice + text
# message(action=send, filePath=/tmp/reply_$$.ogg, ...)
# reply: "🎙️ I heard: $TRANSCRIPT\n\n$RESPONSE"
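
The example above writes to `/tmp/reply_$$.wav` and `/tmp/reply_$$.ogg` and does not show any cleanup. One possible refinement, offered as a sketch rather than as part of the skill, is to keep both files in a private scratch directory and remove it once the message has been sent. `make_reply_paths`, `cleanup_reply_paths`, and the `REPLY_*` variables are hypothetical names.

```shell
# Sketch: scratch directory for reply audio, removed after sending.
# Assumption: these helpers are illustrative and not shipped with VoiceClaw.
make_reply_paths() {
  REPLY_DIR=$(mktemp -d "${TMPDIR:-/tmp}/voiceclaw.XXXXXX") || return 1
  REPLY_WAV="$REPLY_DIR/reply.wav"
  REPLY_OGG="$REPLY_DIR/reply.ogg"
}

cleanup_reply_paths() {
  # Only remove the directory if make_reply_paths actually created one.
  if [ -n "${REPLY_DIR:-}" ]; then
    rm -rf -- "$REPLY_DIR"
  fi
}

# Usage:
#   make_reply_paths && trap cleanup_reply_paths EXIT
#   bash path/to/voiceclaw/scripts/speak.sh "$RESPONSE" "$REPLY_WAV"
#   ffmpeg -i "$REPLY_WAV" -c:a libopus -b:a 32k "$REPLY_OGG" -y -loglevel error
```

The trap fires when the shell exits, so the message tool must finish sending `$REPLY_OGG` before the script ends.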

Troubleshooting

| Issue | Fix |
|---|---|
| whisper: command not found | Ensure whisper binary is installed and in PATH |
| Whisper model not found | Set WHISPER_MODEL=/path/to/ggml-base.en.bin |
| piper: command not found | Ensure piper binary is installed and in PATH |
| Voice model missing | Set VOICECLAW_VOICES_DIR=/path/to/voices/ |
| OGG won't play on Telegram | Ensure -c:a libopus flag in ffmpeg command |
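
For the last row, one way to confirm that a generated file really contains Opus audio is to inspect it with ffprobe. This is a sketch that assumes ffprobe is installed alongside ffmpeg, which is common but not guaranteed; `audio_codec` and `is_opus` are hypothetical helper names.

```shell
# Sketch: report the codec of the first audio stream in a file.
# Assumption: ffprobe is on PATH; these helpers are not part of VoiceClaw.
audio_codec() {
  ffprobe -v error -select_streams a:0 \
    -show_entries stream=codec_name \
    -of default=noprint_wrappers=1:nokey=1 "$1"
}

# Succeeds only when the first audio stream is Opus.
is_opus() {
  [ "$(audio_codec "$1")" = "opus" ]
}

# Usage: is_opus /tmp/reply.ogg || echo "re-encode with -c:a libopus" >&2
```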

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
