Her Voice

Give your agent a voice. Use when the user wants the agent to speak, read aloud, or have voice responses.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "Her Voice" with this command: npx skills add matusvojtek/her-voice

Her Voice 🎙️

Give your agent a voice. Audio responses powered by Kokoro TTS — a compact, naturally expressive model running entirely on-device.

✨ Features

Audio streams as it is generated, so playback starts before synthesis finishes. 100% free, no API keys required. Inspired by Samantha and Sky.

  • ⚡ On-the-fly Streaming — Audio plays as it generates, very low latency
  • 👄 The Voice of an angel — Cutting-edge local text-to-speech model Kokoro TTS
  • 🧠 TTS Daemon — Keep the model warm in RAM for instant responses (can be disabled to save RAM)
  • 🖥️ Persist Mode — Drag & drop audio, paste text, use as a voice station
  • 🔧 Fully Configurable — Voice, speed, visualizer, notification sounds
  • 🍎 MLX + PyTorch — Native Metal acceleration on Apple Silicon, PyTorch fallback everywhere else
  • 🎨 Real-time Visualizer — Floating 60fps LED bars that react to speech (macOS only)

First-Run Setup

python3 SKILL_DIR/scripts/setup.py

Note: SKILL_DIR is the root directory of this skill — the agent resolves it automatically when running commands.

The setup wizard will:

  1. Detect platform and select TTS engine (MLX on Apple Silicon, PyTorch elsewhere)
  2. Find or install the appropriate TTS backend (mlx-audio or kokoro)
  3. Install espeak-ng (Homebrew on macOS, apt on Linux)
  4. Patch espeak loader if needed (macOS compatibility)
  5. Compile the native visualizer binary (macOS only)
  6. Download the Kokoro model
  7. Create config at ~/.her-voice/config.json

Check status anytime:

python3 SKILL_DIR/scripts/setup.py status

Post-Setup: Names & Pronunciation

After setup, configure the agent and user names:

python3 SKILL_DIR/scripts/config.py set agent_name "Jackie"
python3 SKILL_DIR/scripts/config.py set user_name "Matúš"
python3 SKILL_DIR/scripts/config.py set user_name_tts "Mah-toosh"

TTS pronunciation tip: If the user's name is non-English, figure out a phonetic English spelling that Kokoro will pronounce correctly. Store it in user_name_tts and use that spelling whenever speaking the name aloud. The real name stays in user_name for display purposes.
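
The split between the display name and the spoken name can be sketched as below. This is a minimal illustration, not code from the skill: `spoken_name` and `greeting` are hypothetical helpers, and the config is shown as a plain dict with the three keys set above.

```python
def spoken_name(config: dict) -> str:
    """Name to hand to TTS: the phonetic spelling if set, else the real name."""
    # Hypothetical helper; an empty user_name_tts falls back to user_name.
    return config.get("user_name_tts") or config.get("user_name", "")


def greeting(config: dict) -> tuple[str, str]:
    """Return (display_text, tts_text) for the same greeting."""
    display = f"Hello, {config.get('user_name', '')}!"
    spoken = f"Hello, {spoken_name(config)}!"
    return display, spoken


config = {"agent_name": "Jackie", "user_name": "Matúš", "user_name_tts": "Mah-toosh"}
display_text, tts_text = greeting(config)
print(display_text)  # text shown to the user keeps the real spelling
print(tts_text)      # text passed to speak.py uses the phonetic spelling
```

Keeping both strings lets the written reply show the correct name while the audio uses the spelling Kokoro pronounces well.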

Speaking Text

# Basic usage
python3 SKILL_DIR/scripts/speak.py "Hello, world!"

# Skip visualizer for this call
python3 SKILL_DIR/scripts/speak.py --no-viz "Quick note"

# Save to file instead of playing
python3 SKILL_DIR/scripts/speak.py --save /tmp/output.wav "Save this"

# Override voice or speed
python3 SKILL_DIR/scripts/speak.py --voice af_bella --speed 1.2 "Faster!"

# Pipe text from stdin
echo "Piped text" | python3 SKILL_DIR/scripts/speak.py

Options

| Flag | Description |
| --- | --- |
| --no-viz | Skip the visualizer for this call |
| --persist | Keep visualizer open after playback ends |
| --save PATH | Save audio to WAV file instead of playing |
| --voice NAME | Override the configured voice |
| --speed N | Override the configured speed multiplier |
| --mode MODE | Override visualizer mode (v2 or classic) |
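
When calling speak.py from Python rather than a shell, the flags above can be assembled into an argument list. The following is a sketch under assumptions: `speak_argv` is a hypothetical helper, `skill_dir` stands in for the resolved SKILL_DIR, and only the flags listed in the table are used.

```python
import shlex


def speak_argv(skill_dir, text, *, no_viz=False, persist=False,
               save=None, voice=None, speed=None, mode=None):
    """Build the argv for speak.py from the documented flags (hypothetical helper)."""
    argv = ["python3", f"{skill_dir}/scripts/speak.py"]
    if no_viz:
        argv.append("--no-viz")
    if persist:
        argv.append("--persist")
    if save is not None:
        argv += ["--save", str(save)]
    if voice is not None:
        argv += ["--voice", voice]
    if speed is not None:
        argv += ["--speed", str(speed)]
    if mode is not None:
        argv += ["--mode", mode]
    argv.append(text)  # the text to speak goes last, as in the examples above
    return argv


argv = speak_argv("SKILL_DIR", "Faster!", voice="af_bella", speed=1.2)
command_line = shlex.join(argv)  # shell-quoted form, handy for logging
print(command_line)
```

Passing the list to `subprocess.run(argv)` avoids shell quoting problems when the text contains quotes or other special characters.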

Agent Workflow

When the user wants voice responses:

  1. Check voice mode — is voice enabled or did the user ask for it?
  2. Play the notification sound for instant feedback while TTS generates (macOS only, since afplay and the system sounds ship with macOS):
    afplay /System/Library/Sounds/Blow.aiff &
    
  3. Speak the response:
    python3 SKILL_DIR/scripts/speak.py "Response text here"
    
  4. Always provide text alongside voice — accessibility matters.
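
The four steps above can be sketched in Python as follows. This is an illustration only: `plan_voice_response` and `respond` are hypothetical names, `skill_dir` stands in for the resolved SKILL_DIR, and how the agent tracks whether voice is enabled is an assumption.

```python
import subprocess


def plan_voice_response(text, skill_dir, voice_enabled, user_asked=False, sound="Blow"):
    """Return the commands for steps 2 and 3, or [] when voice is not wanted (step 1)."""
    if not (voice_enabled or user_asked):
        return []
    return [
        ["afplay", f"/System/Library/Sounds/{sound}.aiff"],  # step 2: notification
        ["python3", f"{skill_dir}/scripts/speak.py", text],  # step 3: speak
    ]


def respond(text, skill_dir, voice_enabled, user_asked=False):
    """Run the voice plan if there is one, and always return the text (step 4)."""
    plan = plan_voice_response(text, skill_dir, voice_enabled, user_asked)
    if plan:
        notify, speak = plan
        subprocess.Popen(notify)            # do not wait, like the trailing & above
        subprocess.run(speak, check=False)  # wait until speech has finished
    return text
```

Returning the text in every branch keeps the written reply available whether or not audio played.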

Notification Sound

The notification sound plays instantly (~0.1s) while TTS generates (~0.3-3s). This gives the user immediate feedback that the agent is responding.

Configure in ~/.her-voice/config.json:

{
  "notification_sound": {
    "enabled": true,
    "sound": "Blow"
  }
}

Available macOS sounds: Blow, Bottle, Frog, Funk, Glass, Hero, Morse, Ping, Pop, Purr, Sosumi, Submarine, Tink. The sound files are located in /System/Library/Sounds/.
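
As a rough sketch of how the notification_sound block maps to a file, the snippet below resolves the configured sound name to a path. `notification_sound_path` is a hypothetical helper; the `.aiff` extension follows the afplay example in the workflow section, and the defaults mirror the Key Settings table.

```python
import json
from pathlib import PurePosixPath

SOUNDS_DIR = PurePosixPath("/System/Library/Sounds")


def notification_sound_path(config):
    """Return the sound file path, or None when the notification is disabled."""
    section = config.get("notification_sound", {})
    if not section.get("enabled", True):      # default: enabled
        return None
    name = section.get("sound", "Blow")       # default: Blow
    return SOUNDS_DIR / f"{name}.aiff"


config = json.loads('{"notification_sound": {"enabled": true, "sound": "Blow"}}')
path = notification_sound_path(config)
print(path)
```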

TTS Daemon

The daemon keeps the Kokoro model warm in RAM, eliminating ~1.1s of startup overhead per call.

The daemon auto-resolves the mlx-audio venv — no need to find the venv Python manually.

# Start (persists in background)
nohup python3 SKILL_DIR/scripts/daemon.py start > /tmp/her-voice-daemon.log 2>&1 & disown

# Status
python3 SKILL_DIR/scripts/daemon.py status

# Stop
python3 SKILL_DIR/scripts/daemon.py stop

# Restart
python3 SKILL_DIR/scripts/daemon.py restart

speak.py auto-detects the daemon: uses it if available, falls back to direct model loading.

The daemon is optional. Without it, speech still works — just ~1s slower per call as the model loads each time. Skip the daemon to save ~2.3GB RAM.

Note: The daemon writes its PID file and socket after the model is fully loaded and ready to accept connections. They live in ~/.her-voice/ with restricted permissions (owner-only access). The daemon won't survive a reboot — start it again after restart if needed.
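
Because the socket only appears once the model is loaded, a caller can probe it to decide between the daemon and direct loading. The sketch below is an assumption about how such a check might look, not the logic speak.py actually uses: `daemon_available` is a hypothetical helper, and the daemon's wire protocol is not documented here, so the probe only connects and sends nothing.

```python
import os
import socket


def daemon_available(socket_path="~/.her-voice/tts.sock", timeout=0.5):
    """Return True if something accepts connections on the Unix socket."""
    path = os.path.expanduser(socket_path)
    if not os.path.exists(path):
        return False  # daemon not started, or still loading the model
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.settimeout(timeout)
        try:
            client.connect(path)
        except OSError:
            return False  # stale socket file, refused connection, or timeout
    return True


if __name__ == "__main__":
    print("daemon" if daemon_available() else "direct model loading")
```

A stale socket file left behind after a crash is treated as unavailable, which matches the fallback behaviour described above.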

Visualizer

A floating overlay with three animated LED bars that react to speech in real-time. 60fps, native macOS (Cocoa + AVFoundation). macOS only — on other platforms, audio plays without the visualizer.

Modes

  • v2 (default) — Three-tier pure red, center raw amplitude, sides with lag
  • classic — Original smooth gradient look

Controls

| Key | Action |
| --- | --- |
| ESC | Quit |
| Space | Pause/Resume (file mode) |
| ← → | Seek ±5s (file mode) |
| ⌘V | Paste text to speak (persist mode) |

Persist Mode

Keep the visualizer on screen between playbacks. Use as a standalone voice station:

# Launch in persist mode (stays open, idle breathing animation)
~/.her-voice/bin/her-voice-viz --persist

# Stream mode + persist (stays open after speech ends)
python3 SKILL_DIR/scripts/speak.py --persist "Hello!"

In persist mode:

  • Drag & drop audio files (.wav, .mp3, .aiff, .m4a) onto the visualizer to play them
  • ⌘V pastes clipboard text → streams directly from TTS daemon with full visualizer animation
  • Idle breathing — subtle center bar pulse when waiting for input

Standalone Usage

# Play a file with visualizer
~/.her-voice/bin/her-voice-viz --audio /path/to/file.wav

# Demo mode (simulated audio)
~/.her-voice/bin/her-voice-viz --demo

# Stream raw PCM
cat audio.raw | ~/.her-voice/bin/her-voice-viz --stream --sample-rate 24000

Disable Visualizer

python3 SKILL_DIR/scripts/config.py set visualizer.enabled false

Configuration

Config file: ~/.her-voice/config.json

# View all settings
python3 SKILL_DIR/scripts/config.py status

# Get a value
python3 SKILL_DIR/scripts/config.py get voice

# Set a value (dot notation for nested keys)
python3 SKILL_DIR/scripts/config.py set speed 1.1
python3 SKILL_DIR/scripts/config.py set visualizer.mode classic
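
Dot notation addresses nested keys in the JSON config. A minimal sketch of that lookup on a plain dict is shown below; `get_by_path` and `set_by_path` are hypothetical helpers and may differ from what config.py does internally.

```python
def get_by_path(config, dotted_key, default=None):
    """Read a nested value such as "visualizer.mode"."""
    node = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node


def set_by_path(config, dotted_key, value):
    """Write a nested value, creating intermediate dicts as needed."""
    *parents, leaf = dotted_key.split(".")
    node = config
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = value


config = {"speed": 1.05, "visualizer": {"enabled": True, "mode": "v2"}}
set_by_path(config, "speed", 1.1)
set_by_path(config, "visualizer.mode", "classic")
print(get_by_path(config, "visualizer.mode"))
```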

Key Settings

| Key | Default | Description |
| --- | --- | --- |
| agent_name | "" | Agent's name (e.g. "Jackie") |
| user_name | "" | User's real name |
| user_name_tts | "" | Phonetic spelling for TTS (e.g. "Mah-toosh" for Matúš) |
| voice | af_heart | Base voice name |
| voice_blend | {af_heart: 0.6, af_sky: 0.4} | Voice blend weights |
| speed | 1.05 | Speech speed multiplier |
| language | en | Language code |
| tts_engine | auto | TTS engine: auto, mlx, or pytorch |
| model | mlx-community/Kokoro-82M-bf16 | Model identifier (MLX) |
| visualizer.enabled | true | Show visualizer overlay |
| visualizer.mode | v2 | Animation mode (v2/classic) |
| visualizer.remember_position | true | Save window position between sessions |
| notification_sound.enabled | true | Play sound before speaking |
| notification_sound.sound | Blow | macOS system sound name |
| daemon.auto_start | true | Advisory flag only; the daemon never self-starts. When true, the agent should start it on first voice use (saves ~1s/call, costs ~2.3GB RAM) |
| daemon.socket_path | ~/.her-voice/tts.sock | Unix socket path |

Voice Selection

Voice Blending

Mix multiple voices for a unique sound. Configure voice_blend in config:

{
  "voice_blend": {"af_heart": 0.6, "af_sky": 0.4}
}

The blended voice is stored as a .safetensors file in the model's voices directory (e.g., af_heart_60_af_sky_40.safetensors). Run TTS once with the blend configured to create the file; after that, speak.py looks for the pre-blended file automatically.
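
The example filename suggests that each voice name is followed by its weight as a percentage. The sketch below reproduces that pattern; it is inferred from the single example above, so treat the naming rule and the `blend_filename` helper as assumptions rather than the skill's actual implementation.

```python
def blend_filename(voice_blend):
    """Derive a blend filename like af_heart_60_af_sky_40.safetensors (assumed scheme)."""
    if not voice_blend:
        raise ValueError("voice_blend must contain at least one voice")
    parts = []
    for voice, weight in voice_blend.items():  # dicts keep insertion order
        percent = round(weight * 100)          # 0.6 -> 60
        parts.append(f"{voice}_{percent}")
    return "_".join(parts) + ".safetensors"


name = blend_filename({"af_heart": 0.6, "af_sky": 0.4})
print(name)
```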

Error Handling

| Error | Cause | Fix |
| --- | --- | --- |
| "mlx-audio not found" | Venv missing or broken | Run setup.py |
| "espeak-ng not found" | Phonemizer missing | brew install espeak-ng |
| Compilation failed | Xcode tools missing | xcode-select --install |
| "Model not found" | First run, no download | Run setup.py or speak once |
| Daemon "not running" | Crashed or rebooted | Start daemon again |
| No sound output | macOS audio permissions | Check System Settings → Sound → Output |
| Visualizer not showing | Binary not compiled | Run setup.py |
| "kokoro not found" | PyTorch venv missing | Run setup.py |
| PyTorch CUDA error | GPU driver mismatch | pip install torch --force-reinstall in kokoro venv |
| "soundfile not found" | Missing dependency | pip install soundfile in kokoro venv |

Requirements

  • macOS + Apple Silicon recommended for best experience (MLX engine + visualizer + notification sounds)
  • Linux/Intel Mac supported via PyTorch Kokoro engine (no visualizer)
  • Windows is not supported
  • Xcode Command Line Tools for visualizer on macOS (xcode-select --install)
  • espeak-ng for phonemization (brew install espeak-ng on macOS, apt install espeak-ng on Linux)
  • ~500MB disk (model + venv)
  • ~2.3GB RAM when daemon is running

Uninstall

Remove all Her Voice data (config, venvs, compiled binary, daemon state):

python3 SKILL_DIR/scripts/daemon.py stop
rm -rf ~/.her-voice

How It Works

  1. Kokoro 82M — A compact neural TTS model with two backends: MLX (Apple's framework for native Metal GPU acceleration on Apple Silicon) and PyTorch (works everywhere). The engine is auto-detected based on platform, or can be forced via the tts_engine config option (auto, mlx, or pytorch)
  2. Streaming — Audio generates and plays simultaneously. First sound in ~0.3s (with daemon) vs ~3s batch
  3. Visualizer — Native macOS app (Swift/Cocoa) reads raw PCM from stdin, plays via AVAudioEngine with real-time amplitude metering
  4. Daemon — Unix socket server holding the model in RAM. Eliminates Python import + model load overhead on every call
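
The engine choice described in step 1 can be sketched as a small function. This is an assumed reading of "auto-detected based on platform", not the skill's actual detection code: `select_engine` is a hypothetical helper, and Apple Silicon is identified here by `platform.system()` returning "Darwin" together with `platform.machine()` returning "arm64".

```python
import platform


def select_engine(tts_engine="auto", system=None, machine=None):
    """Resolve the tts_engine setting to "mlx" or "pytorch"."""
    if tts_engine in ("mlx", "pytorch"):
        return tts_engine  # explicit setting wins over detection
    if tts_engine != "auto":
        raise ValueError(f"unknown tts_engine: {tts_engine!r}")
    system = system or platform.system()
    machine = machine or platform.machine()
    # MLX targets Apple Silicon; everything else falls back to PyTorch.
    if system == "Darwin" and machine == "arm64":
        return "mlx"
    return "pytorch"


print(select_engine())  # engine that "auto" would pick on this machine
```

The `system` and `machine` parameters exist only so the decision can be exercised without running on each platform.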

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
