speakturbo-tts

Give your agent the ability to speak to you in real time. Talk to your Claude! Ultra-fast text-to-speech (TTS) voice synthesis with ~90ms latency to first audio and 8 built-in voices for instant voice responses. For voice cloning, use the speak skill.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "speakturbo-tts" with this command: npx skills add EmZod/speakturbo-tts

speakturbo - Talk to your Claude!

Give your agent the ability to speak to you in real time. Ultra-fast text-to-speech with ~90ms latency and 8 built-in voices.

Quick Start

# Play immediately - you should hear "Hello world" through your speakers
speakturbo "Hello world"
# Output: ⚡ 92ms → ▶ 93ms → ✓ 1245ms

# Verify it's working by saving to file
speakturbo "Hello world" -o test.wav
ls -lh test.wav  # Should show ~50-100KB file

Output explained: ⚡ = first audio received, ▶ = playback started, ✓ = done
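
If an agent needs the timings as numbers (for logging or latency checks), the status line can be split with standard tools. This is a minimal sketch that assumes the line has exactly the shape shown in the Quick Start example; `parse_status` is a hypothetical helper, not part of speakturbo, and the source does not say whether the status line goes to stdout or stderr.

```shell
# Extract the three timings (in ms) from a speakturbo status line.
# Assumes the line looks like the example: "⚡ 92ms → ▶ 93ms → ✓ 1245ms".
parse_status() {
    # Keep only the "<digits>ms" tokens, strip the unit, join with spaces.
    printf '%s\n' "$1" | grep -oE '[0-9]+ms' | sed 's/ms$//' | paste -sd ' ' -
}

# Usage with the sample line from Quick Start:
set -- $(parse_status "⚡ 92ms → ▶ 93ms → ✓ 1245ms")
echo "first audio: ${1}ms, playback start: ${2}ms, done: ${3}ms"
```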

First Run

The first execution takes 2-5 seconds while the daemon starts and loads the model into memory. Once the daemon is warm, subsequent calls reach first sound in ~90ms.

# First run (slow - daemon starting)
speakturbo "Starting up"  # ~2-5 seconds

# Second run (fast - daemon already running)
speakturbo "Now I'm fast"  # ~90ms
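
To keep the startup delay out of the first spoken reply, one option is to warm the daemon at the start of a session by synthesizing a throwaway phrase to a file, which produces no playback. The sketch below is an assumption about how you might do that; `warm_up` is a hypothetical helper, and it relies only on the documented `-o` flag and the fact that temp directories are on the output allowlist.

```shell
# Warm the daemon by synthesizing a throwaway phrase to a temp file.
# Writing with -o means nothing is played through the speakers.
warm_up() {
    # Temp directories are on the default -o allowlist.
    warm_file="${TMPDIR:-/tmp}/speakturbo-warmup-$$.wav"
    speakturbo "warming up" -o "$warm_file"
    warm_status=$?
    rm -f "$warm_file"          # discard the throwaway audio
    return "$warm_status"       # propagate speakturbo's exit code
}
```

Calling `warm_up` once pays the 2-5 second cost up front; later `speakturbo` calls should then hit the warm daemon until it idles out.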

Usage

# Basic - plays immediately (default voice: alba)
speakturbo "Hello world"

# Save to file (no audio playback)
speakturbo "Hello" -o output.wav

# Save to specific file
speakturbo "Goodbye" -o goodbye.wav

# Quiet mode (suppress status messages, still plays audio)
speakturbo "Hello" -q

# List available voices
speakturbo --list-voices

Available Voices

| Voice | Type |
|---|---|
| alba | Female (default) |
| marius | Male |
| javert | Male |
| jean | Male |
| fantine | Female |
| cosette | Female |
| eponine | Female |
| azelma | Female |
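
A script that accepts a voice name from elsewhere may want to check it against the built-in set before deciding which skill to use. The following is a small sketch with the eight names hard-coded from the table above; `is_builtin_voice` is a hypothetical helper, and `speakturbo --list-voices` remains the authoritative list if the set ever changes.

```shell
# Return 0 if the name is one of the eight built-in voices listed above.
is_builtin_voice() {
    case "$1" in
        alba|marius|javert|jean|fantine|cosette|eponine|azelma) return 0 ;;
        *) return 1 ;;
    esac
}

is_builtin_voice "cosette" && echo "cosette is built in"
is_builtin_voice "morgan_freeman" || echo "morgan_freeman needs the speak skill"
```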

Performance

| Metric | Value |
|---|---|
| Time to first sound | ~90ms (daemon warm) |
| First run | 2-5s (daemon startup) |
| Real-time factor | ~4x faster than real time |
| Sample rate | 24kHz mono |
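
The sample rate gives a quick way to sanity-check saved files. As a rough worked example, assuming 16-bit PCM samples and a 44-byte WAV header (the source states only 24kHz mono, so both are assumptions), one second of audio is 24000 × 2 = 48000 bytes. `wav_bytes` below is a hypothetical helper for that arithmetic.

```shell
# Estimate WAV size in bytes for a clip of the given duration in milliseconds.
# Assumptions: 16-bit PCM samples and a 44-byte header; only 24kHz mono is documented.
wav_bytes() {
    duration_ms=$1
    # 24000 samples/s * 2 bytes/sample * 1 channel = 48000 bytes/s = 48 bytes/ms
    echo $(( duration_ms * 48 + 44 ))
}

wav_bytes 1000   # prints 48044 (one second of audio)
wav_bytes 2000   # prints 96044 (two seconds of audio)
```

Under these assumptions, a 1-2 second clip lands at roughly 48-96KB, which lines up with the ~50-100KB expected for "Hello world" in Quick Start.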

Architecture

speakturbo (Rust CLI, 2.2MB)
    │
    │ HTTP streaming (port 7125)
    ▼
speakturbo-daemon (Python + pocket-tts)
    │
    │ Model in memory, auto-shutdown after 1hr idle
    ▼
Audio playback (rodio)

Text Input

  • Encoding: UTF-8
  • Quotes in text: Use escaping: speakturbo "She said \"hello\""
  • Long text: Supported, streams as it generates
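
For text that already lives in a file, shell command substitution avoids manual escaping altogether. This is a sketch under the assumption that the whole text is passed as a single argument, as in the examples above; `speak_file` is a hypothetical wrapper, and the source does not document reading from stdin.

```shell
# Speak the contents of a text file as a single argument.
# Command substitution inside double quotes preserves embedded quotes and
# spaces, so the file needs no escaping. Trailing newlines are dropped.
speak_file() {
    [ -r "$1" ] || { echo "cannot read: $1" >&2; return 1; }
    speakturbo "$(cat "$1")"
}

# Example call (notes.txt is a placeholder file name):
#   speak_file notes.txt
```

Very large files can exceed the operating system's argument length limit, so this approach suits paragraphs rather than whole books. For short literals, single quotes are another way around escaping: `speakturbo 'She said "hello"'`.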

Output Path Security

The -o flag only writes to directories that are on the allowlist. By default, these are:

  • /tmp and system temp directories
  • Your current working directory
  • ~/.speakturbo/

If you need to write elsewhere, use --allow-dir:

speakturbo "Hello" -o /custom/path/audio.wav --allow-dir /custom/path

To permanently allow a directory, add it to ~/.speakturbo/config:

mkdir -p ~/.speakturbo && echo "/custom/path" >> ~/.speakturbo/config

The config file is one directory per line. Lines starting with # are comments.
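
Because the one-liner above appends unconditionally, running it repeatedly leaves duplicate entries. A slightly more careful variant might look like the sketch below; `allow_dir` and its optional second argument are hypothetical conveniences, and the only documented facts it relies on are the config path and the one-directory-per-line format.

```shell
# Append a directory to the allowlist file unless it is already listed.
# The second argument (config path) is a hypothetical override for testing;
# it defaults to the documented ~/.speakturbo/config.
allow_dir() {
    dir=$1
    config=${2:-"$HOME/.speakturbo/config"}
    mkdir -p "$(dirname "$config")"
    touch "$config"
    # -x matches whole lines only, -F treats the path as a literal string.
    grep -qxF -- "$dir" "$config" || printf '%s\n' "$dir" >> "$config"
}

# Example call:
#   allow_dir /custom/path
```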

Exit Codes

| Code | Meaning |
|---|---|
| 0 | Success (audio played/saved) |
| 1 | Error (daemon connection failed, invalid args) |
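
Since there are only two exit codes, a caller can branch on success or failure and degrade gracefully when audio is unavailable. The wrapper below is one possible sketch; `say_or_print` is a hypothetical name, and the fallback of printing the text to stderr is an illustrative choice rather than anything speakturbo does itself.

```shell
# Speak the text; if speakturbo exits non-zero, fall back to printing it.
say_or_print() {
    if speakturbo "$1"; then
        return 0
    fi
    # Exit code 1 covers daemon connection failures and invalid arguments.
    printf 'speakturbo failed, text was: %s\n' "$1" >&2
    return 1
}

# Example call:
#   say_or_print "Build finished"
```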

When to Use

Use speakturbo when:

  • You need instant audio feedback (~90ms)
  • Speed matters more than voice variety
  • Built-in voices are sufficient

Use speak instead when:

  • You need custom voice cloning (Morgan Freeman, etc.) → speak "text" --voice ~/.chatter/voices/morgan_freeman.wav
  • You need emotion tags like [laugh], [sigh]
  • Quality/variety matters more than speed

See the speak skill documentation for full usage.
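
The criteria above can be condensed into a small routing function. This is only a sketch: `pick_tts` is a hypothetical helper, it checks just the two emotion tags named above, and it treats any supplied voice sample path as a request for cloning.

```shell
# Print which CLI fits a request: "speak" for cloned voices or emotion tags,
# "speakturbo" otherwise. The second argument is an optional voice sample path.
pick_tts() {
    text=$1
    voice_wav=${2:-}
    if [ -n "$voice_wav" ]; then
        echo "speak"            # custom voice requested
        return
    fi
    case "$text" in
        *"[laugh]"*|*"[sigh]"*) echo "speak" ;;       # emotion tags present
        *)                      echo "speakturbo" ;;  # plain text: take the fast path
    esac
}

pick_tts "Hello world"
pick_tts "That was funny [laugh]"
```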

Troubleshooting

No audio plays:

# Check daemon is running
curl http://127.0.0.1:7125/health
# Expected: {"status":"ready","voices":["alba","marius",...]}

# Verify by saving to file and playing manually
speakturbo "test" -o /tmp/test.wav
afplay /tmp/test.wav  # macOS
aplay /tmp/test.wav   # Linux

Daemon won't start:

# Check port availability
lsof -i :7125

# Manually kill and restart
pkill -f "daemon_streaming"
speakturbo "test"  # Auto-restarts daemon

First run is slow: This is expected. The daemon needs to load the ~100MB model into memory. Subsequent calls will be fast (~90ms).

Daemon Management

The daemon auto-starts on first use and auto-shuts down after 1 hour idle.

# Check status
curl http://127.0.0.1:7125/health

# Manual stop
pkill -f "daemon_streaming"

# View logs
cat /tmp/speakturbo.log
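
For scripts that want a single word instead of raw JSON, the health check can be wrapped as below. This sketch assumes the response contains `"status":"ready"` as in the expected output shown under Troubleshooting; `daemon_state` is a hypothetical helper, and "down" here also covers a daemon that is still loading or answering with a different status.

```shell
# Print "ready" if the health endpoint reports readiness, else "down".
# The JSON shape is taken from the expected response shown in Troubleshooting.
daemon_state() {
    if curl -fsS --max-time 2 http://127.0.0.1:7125/health 2>/dev/null \
        | grep -Eq '"status"[[:space:]]*:[[:space:]]*"ready"'; then
        echo "ready"
    else
        echo "down"
    fi
}

# Example call:
#   [ "$(daemon_state)" = "ready" ] || echo "daemon will start on next speakturbo call"
```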

Comparison with speak

| Feature | speakturbo | speak |
|---|---|---|
| Time to first sound | ~90ms | ~4-8s |
| Voice cloning | No | Yes |
| Emotion tags | No | Yes |
| Voices | 8 built-in | Custom wav files |
| Engine | pocket-tts | Chatterbox |

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
