use-local-whisper

Switches voice transcription from OpenAI's Whisper API to local whisper.cpp. Runs entirely on-device — no API key, no network, no cost.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "use-local-whisper" with this command: npx skills add qwibitai/nanoclaw/qwibitai-nanoclaw-use-local-whisper


Channel support: Currently WhatsApp only. The transcription module (src/transcription.ts) uses Baileys types for audio download. Other channels (Telegram, Discord, etc.) would need their own audio-download logic before this skill can serve them.

Note: The Homebrew package is whisper-cpp, but the CLI binary it installs is whisper-cli.

Prerequisites

  • voice-transcription skill must be applied first (WhatsApp channel)

  • macOS with Apple Silicon (M1+) recommended

  • whisper-cpp installed: brew install whisper-cpp (provides the whisper-cli binary)

  • ffmpeg installed: brew install ffmpeg

  • A GGML model file downloaded to data/models/

Phase 1: Pre-flight

Check if already applied

Check if src/transcription.ts already uses whisper-cli:

grep 'whisper-cli' src/transcription.ts && echo "Already applied" || echo "Not applied"

If already applied, skip to Phase 3 (Verify).

Check dependencies are installed

whisper-cli --help >/dev/null 2>&1 && echo "WHISPER_OK" || echo "WHISPER_MISSING"
ffmpeg -version >/dev/null 2>&1 && echo "FFMPEG_OK" || echo "FFMPEG_MISSING"

If missing, install via Homebrew:

brew install whisper-cpp ffmpeg

Check for model file

ls data/models/ggml-*.bin 2>/dev/null || echo "NO_MODEL"

If no model exists, download the base model (148MB, good balance of speed and accuracy):

mkdir -p data/models
curl -L -o data/models/ggml-base.bin "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin"

For better accuracy at the cost of speed, use ggml-small.bin (466MB) or ggml-medium.bin (1.5GB).
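The larger models follow the same download pattern as the base-model command above. A small sketch (the URL pattern for small/medium is assumed to match the base model's; both files exist in the same upstream ggerganov/whisper.cpp repository):

```shell
# Pick a model size; the URL pattern is generalized from the
# base-model download command above.
model_size="small"   # base | small | medium
model_url="https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-${model_size}.bin"
echo "$model_url"
# mkdir -p data/models && curl -L -o "data/models/ggml-${model_size}.bin" "$model_url"
```

If you download a non-default model, point WHISPER_MODEL at it (see Configuration below).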

Phase 2: Apply Code Changes

Ensure WhatsApp fork remote

git remote -v

If whatsapp is missing, add it:

git remote add whatsapp https://github.com/qwibitai/nanoclaw-whatsapp.git

Merge the skill branch

git fetch whatsapp skill/local-whisper
git merge whatsapp/skill/local-whisper || {
  git checkout --theirs package-lock.json
  git add package-lock.json
  git merge --continue
}

This modifies src/transcription.ts to use the whisper-cli binary instead of the OpenAI API.

Validate

npm run build

Phase 3: Verify

Ensure launchd PATH includes Homebrew

The NanoClaw launchd service runs with a restricted PATH. whisper-cli and ffmpeg are in /opt/homebrew/bin/ (Apple Silicon) or /usr/local/bin/ (Intel), which may not be in the plist's PATH.

Check the current PATH:

grep -A1 'PATH' ~/Library/LaunchAgents/com.nanoclaw.plist

If /opt/homebrew/bin is missing, add it to the <string> value inside the PATH key in the plist. Then reload:
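As an illustration, the entry might look like the following (the EnvironmentVariables/PATH keys are standard launchd plist structure; the exact contents of your com.nanoclaw.plist will differ):

```xml
<key>EnvironmentVariables</key>
<dict>
    <key>PATH</key>
    <string>/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin</string>
</dict>
```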

launchctl unload ~/Library/LaunchAgents/com.nanoclaw.plist
launchctl load ~/Library/LaunchAgents/com.nanoclaw.plist

Build and restart

npm run build
launchctl kickstart -k gui/$(id -u)/com.nanoclaw

Test

Send a voice note in any registered group. The agent should receive it as [Voice: <transcript>].

Check logs

tail -f logs/nanoclaw.log | grep -i -E "voice|transcri|whisper"

Look for:

  • Transcribed voice message — successful transcription

  • whisper.cpp transcription failed — check model path, ffmpeg, or PATH

Configuration

Environment variables (optional, set in .env):

Variable        Default                     Description
WHISPER_BIN     whisper-cli                 Path to whisper.cpp binary
WHISPER_MODEL   data/models/ggml-base.bin   Path to GGML model file
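For example, a .env that overrides both defaults (the paths here are illustrative, not required values):

```
WHISPER_BIN=/opt/homebrew/bin/whisper-cli
WHISPER_MODEL=data/models/ggml-small.bin
```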

Troubleshooting

"whisper.cpp transcription failed": Ensure both whisper-cli and ffmpeg are in PATH. The launchd service uses a restricted PATH — see Phase 3 above. Test manually:

ffmpeg -f lavfi -i anullsrc=r=16000:cl=mono -t 1 -f wav -y /tmp/test.wav
whisper-cli -m data/models/ggml-base.bin -f /tmp/test.wav --no-timestamps

Transcription works in dev but not as service: The launchd plist PATH likely doesn't include /opt/homebrew/bin. See "Ensure launchd PATH includes Homebrew" in Phase 3.

Slow transcription: The base model processes ~30s of audio in <1s on M1+. If slower, check CPU usage — another process may be competing.

Wrong language: whisper.cpp auto-detects language. To force a language, you can set WHISPER_LANG and modify src/transcription.ts to pass -l $WHISPER_LANG.
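A minimal sketch of that argument handling, in shell for illustration (WHISPER_LANG is a hypothetical variable not read by the current code; the real change belongs in src/transcription.ts, but the -l/--language flag itself is a standard whisper-cli option):

```shell
# Append -l <lang> to the whisper-cli arguments only when WHISPER_LANG is set,
# so the default auto-detection behavior is preserved otherwise.
WHISPER_LANG="de"
args="-m data/models/ggml-base.bin -f /tmp/voice.wav --no-timestamps"
if [ -n "$WHISPER_LANG" ]; then
  args="$args -l $WHISPER_LANG"
fi
echo "$args"
```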
