voiceclaw

Local voice I/O for OpenClaw agents. Transcribe inbound audio/voice messages using local Whisper (whisper.cpp) and generate voice replies using local Piper TTS. Requires whisper, piper, and ffmpeg pre-installed on the system. All inference runs on-device — no network calls, no cloud APIs, no API keys. Use when an agent receives a voice/audio message and should respond in both voice and text, or when any text response should be synthesized and sent as audio. Triggers on: voice messages, audio attachments, respond in voice, send as audio, speak this, voiceclaw.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "voiceclaw" with this command: npx skills add Asif2BD/voiceclaw

VoiceClaw

Local-only voice I/O for OpenClaw agents.

  • STT: transcribe.sh — converts audio to text via local Whisper binary
  • TTS: speak.sh — converts text to speech via local Piper binary
  • Network calls: none — both scripts run fully offline
  • No cloud APIs, no API keys required

Prerequisites

The following must be installed on the system before using this skill:

Requirement                 | Purpose
----------------------------|--------------------------
whisper binary              | Speech-to-text inference
ggml-base.en.bin model file | Whisper STT model
piper binary                | Text-to-speech synthesis
*.onnx voice model files    | Piper TTS voices
ffmpeg                      | Audio format conversion

See README.md for installation and setup instructions.


Environment Variables

Variable             | Default                           | Purpose
---------------------|-----------------------------------|--------------------------------------------
WHISPER_BIN          | auto-detected via which           | Path to whisper binary
WHISPER_MODEL        | ~/.cache/whisper/ggml-base.en.bin | Path to Whisper model file
PIPER_BIN            | auto-detected via which           | Path to piper binary
VOICECLAW_VOICES_DIR | ~/.local/share/piper/voices       | Directory containing .onnx voice model files
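
The defaults in the table can be reproduced in a calling script with shell parameter expansion. The sketch below is an assumption about how such resolution might look, not the actual contents of transcribe.sh or speak.sh; resolve_voiceclaw_env is a hypothetical helper name, and command -v stands in for which.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fill in each variable from its documented default
# only when the caller has not already set it.
resolve_voiceclaw_env() {
  # command -v prints the path of a command on PATH, or nothing if absent.
  : "${WHISPER_BIN:=$(command -v whisper || true)}"
  : "${PIPER_BIN:=$(command -v piper || true)}"
  : "${WHISPER_MODEL:=$HOME/.cache/whisper/ggml-base.en.bin}"
  : "${VOICECLAW_VOICES_DIR:=$HOME/.local/share/piper/voices}"
  export WHISPER_BIN PIPER_BIN WHISPER_MODEL VOICECLAW_VOICES_DIR
}
```

Values set by the caller win, so an override such as PIPER_BIN=/opt/piper/piper survives the call unchanged.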

Verify Setup

which whisper && echo "STT binary: OK"
which piper   && echo "TTS binary: OK"
which ffmpeg  && echo "ffmpeg: OK"
ls "${WHISPER_MODEL:-$HOME/.cache/whisper/ggml-base.en.bin}" && echo "STT model: OK"
# grep (not head) sets the exit status here, so "OK" prints only when a voice file exists
ls "${VOICECLAW_VOICES_DIR:-$HOME/.local/share/piper/voices}"/*.onnx 2>/dev/null | grep -m1 . && echo "TTS voices: OK"
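
For use inside an agent script, the same check can be made to fail loudly instead of only printing OK lines. This is a minimal sketch; missing_commands is a hypothetical helper, not part of the skill.

```shell
#!/usr/bin/env bash
# Print the name of every command in "$@" that is not on PATH.
# Returns 0 when all are present, 1 otherwise.
missing_commands() {
  local cmd status=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      printf '%s\n' "$cmd"
      status=1
    fi
  done
  return "$status"
}

# Example call (not executed here):
#   missing_commands whisper piper ffmpeg || echo "Install the tools listed above." >&2
```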

Inbound Voice: Transcribe

# Transcribe audio → text (supports ogg, mp3, m4a, wav, flac)
TRANSCRIPT=$(bash scripts/transcribe.sh /path/to/audio.ogg)

Override model path:

WHISPER_MODEL=/path/to/ggml-base.en.bin bash scripts/transcribe.sh audio.ogg
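
An agent may want to reject unsupported attachments before invoking the script. The sketch below checks only the file extension against the formats named in the comment above; is_supported_audio is a hypothetical helper, and transcribe.sh itself may accept more or fewer formats.

```shell
#!/usr/bin/env bash
# Return 0 if the file name ends in one of: ogg, mp3, m4a, wav, flac
# (case-insensitive). The file contents are not inspected.
is_supported_audio() {
  local name
  name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$name" in
    *.ogg|*.mp3|*.m4a|*.wav|*.flac) return 0 ;;
    *) return 1 ;;
  esac
}

# Example call (not executed here):
#   is_supported_audio "$FILE" && TRANSCRIPT=$(bash scripts/transcribe.sh "$FILE")
```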

Outbound Voice: Speak

# Step 1: Generate WAV (local Piper — no network)
WAV=$(bash scripts/speak.sh "Your response here." /tmp/reply.wav en_US-lessac-medium)

# Step 2: Convert to OGG Opus (Telegram voice requirement)
ffmpeg -i "$WAV" -c:a libopus -b:a 32k /tmp/reply.ogg -y -loglevel error

# Step 3: Send via message tool (filePath=/tmp/reply.ogg)

Override voice directory:

VOICECLAW_VOICES_DIR=/path/to/voices bash scripts/speak.sh "Hello." /tmp/reply.wav

Available Voices

Voice                              | Style
-----------------------------------|----------------------------
en_US-lessac-medium                | Neutral American (default)
en_US-amy-medium                   | Warm American female
en_US-joe-medium                   | American male
en_US-kusal-medium                 | Expressive American male
en_US-danny-low                    | Deep American male (fast)
en_GB-alba-medium                  | British female
en_GB-northern_english_male-medium | Northern British male
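
Not every voice in the table is necessarily installed. The sketch below falls back to the default voice when the requested one is absent. It assumes each voice is stored as <voice>.onnx directly inside the voices directory, which the source does not state explicitly; pick_voice is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Print the voice to use: the requested one if its model file exists,
# otherwise the default voice from the table (en_US-lessac-medium).
pick_voice() {
  local default_voice="en_US-lessac-medium"
  local requested="${1:-$default_voice}"
  local dir="${VOICECLAW_VOICES_DIR:-$HOME/.local/share/piper/voices}"
  # Assumption: model files are named "<voice>.onnx".
  if [ -f "$dir/$requested.onnx" ]; then
    printf '%s\n' "$requested"
  else
    printf '%s\n' "$default_voice"
  fi
}

# Example call (not executed here):
#   bash scripts/speak.sh "Hello." /tmp/reply.wav "$(pick_voice en_GB-alba-medium)"
```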

Agent Behavior Rules

  1. Voice in → Voice + Text out. Always respond with both a voice reply and a text reply when a voice message is received.
  2. Include the transcript. Show "🎙️ I heard: [transcript]" at the top of every text reply to a voice message.
  3. Keep voice responses concise. Piper TTS works best under ~200 words — summarize for audio, include full detail in text.
  4. Local only. Never use a cloud TTS/STT API. Only the local whisper and piper binaries.
  5. Send voice before text. Send the audio file first, then follow with the text reply.
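
Rules 2 and 3 can be sketched as two small helpers. Both names are hypothetical. Note that first_words truncates rather than summarizes, so it is only a crude stand-in for the summarizing that rule 3 asks for.

```shell
#!/usr/bin/env bash
# Print at most N whitespace-separated words of the text (default 200),
# joined by single spaces.
first_words() {
  printf '%s\n' "$1" | tr '\n' ' ' |
    awk -v n="${2:-200}" '{
      out = ""
      for (i = 1; i <= NF && i <= n; i++) out = out (i > 1 ? " " : "") $i
      print out
    }'
}

# Build the text reply: transcript line first, then the full response.
format_text_reply() {
  printf '🎙️ I heard: %s\n\n%s\n' "$1" "$2"
}
```

The spoken text would then be first_words "$RESPONSE", while the text message carries format_text_reply "$TRANSCRIPT" "$RESPONSE" in full.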

Full Example

# 1. Transcribe inbound voice message
TRANSCRIPT=$(bash path/to/voiceclaw/scripts/transcribe.sh /path/to/voice.ogg)

# 2. Compose reply and generate audio
RESPONSE="Deployment complete. All checks passed."
WAV=$(bash path/to/voiceclaw/scripts/speak.sh "$RESPONSE" /tmp/reply_$$.wav)
ffmpeg -i "$WAV" -c:a libopus -b:a 32k /tmp/reply_$$.ogg -y -loglevel error

# 3. Send voice + text
# message(action=send, filePath=/tmp/reply_$$.ogg, ...)
# reply: "🎙️ I heard: $TRANSCRIPT\n\n$RESPONSE"
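
The example writes to predictable /tmp paths and does not stop if a step fails. A variant with a private temporary directory and basic error handling might look like the sketch below. VOICECLAW_DIR and make_voice_reply are hypothetical names, and speak.sh is assumed to print the WAV path on stdout, as the WAV=$(...) capture in the example suggests.

```shell
#!/usr/bin/env bash
# Synthesize a reply and convert it to OGG Opus inside a fresh temp directory.
# Prints the path of the .ogg file; the caller deletes its directory after sending.
make_voice_reply() {
  local text="$1" voice="${2:-en_US-lessac-medium}"
  # Hypothetical variable pointing at the skill checkout.
  local skill_dir="${VOICECLAW_DIR:-path/to/voiceclaw}"
  local tmp wav
  tmp=$(mktemp -d) || return 1
  wav=$(bash "$skill_dir/scripts/speak.sh" "$text" "$tmp/reply.wav" "$voice") ||
    { rm -rf "$tmp"; return 1; }
  ffmpeg -i "$wav" -c:a libopus -b:a 32k "$tmp/reply.ogg" -y -loglevel error ||
    { rm -rf "$tmp"; return 1; }
  rm -f "$wav"   # the intermediate WAV is no longer needed
  printf '%s\n' "$tmp/reply.ogg"
}

# Example call (not executed here):
#   OGG=$(make_voice_reply "$RESPONSE") && echo "send $OGG" && rm -rf "$(dirname "$OGG")"
```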

Troubleshooting

Issue                      | Fix
---------------------------|-----------------------------------------------
whisper: command not found | Ensure whisper binary is installed and in PATH
Whisper model not found    | Set WHISPER_MODEL=/path/to/ggml-base.en.bin
piper: command not found   | Ensure piper binary is installed and in PATH
Voice model missing        | Set VOICECLAW_VOICES_DIR=/path/to/voices/
OGG won't play on Telegram | Ensure -c:a libopus flag in ffmpeg command
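
For the last row, a quick way to tell whether a file really is Opus-in-Ogg is to look at its first bytes. An Ogg stream starts with "OggS", and an Opus stream's first packet starts with "OpusHead" (RFC 7845). The helper below is a hypothetical sketch of that check and does not validate the rest of the file.

```shell
#!/usr/bin/env bash
# Return 0 if the file starts with an Ogg page whose first packet is an
# Opus identification header, 1 otherwise.
is_ogg_opus() {
  [ "$(head -c 4 "$1" 2>/dev/null)" = "OggS" ] || return 1
  # -a treats the binary header as text; the marker sits within the first 64 bytes.
  head -c 64 "$1" | LC_ALL=C grep -aq 'OpusHead'
}

# Example call (not executed here):
#   is_ogg_opus /tmp/reply.ogg || echo "Re-encode with -c:a libopus" >&2
```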

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
