vibevoice

Local Spanish TTS using Microsoft VibeVoice. Generate natural voice audio from text, optimized for WhatsApp voice messages.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "vibevoice" with this command: npx skills add estudiosdurero/vibevoice

VibeVoice TTS

Local text-to-speech using Microsoft's VibeVoice model. Generates natural Spanish voice audio, perfect for WhatsApp voice messages.

Quick Start

# Basic usage
{baseDir}/scripts/vv.sh "Hola, esto es una prueba" -o /tmp/audio.ogg

# From file
{baseDir}/scripts/vv.sh -f texto.txt -o /tmp/audio.ogg

# Different voice
{baseDir}/scripts/vv.sh "Texto" -v en-Wayne -o /tmp/audio.ogg

# Adjust speed (0.5-2.0)
{baseDir}/scripts/vv.sh "Texto" -s 1.2 -o /tmp/audio.ogg
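
Because the speed flag is documented as accepting 0.5-2.0, a caller may want to validate the value before invoking the script. The helper below is a minimal sketch: valid_speed is a hypothetical name and not part of vv.sh, and how vv.sh itself handles out-of-range values is not stated in this listing.

```shell
# Hypothetical helper (not part of vv.sh): succeeds only when the argument
# is a plain decimal number inside the documented 0.5-2.0 range.
valid_speed() {
  awk -v s="$1" 'BEGIN {
    ok = (s ~ /^[0-9]+(\.[0-9]+)?$/ && s + 0 >= 0.5 && s + 0 <= 2.0)
    exit !ok
  }'
}

# Example guard before calling the script:
#   valid_speed "$speed" || { echo "speed must be within 0.5-2.0" >&2; exit 1; }
```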

Configuration

Setting   Default       Description
Voice     sp-Spk1_man   Spanish male voice (slight Mexican accent)
Speed     1.15          15% faster than normal
Format    .ogg          Opus codec for WhatsApp

Available Voices

Spanish:

  • sp-Spk1_man - Male, slight Mexican accent (default)

English:

  • en-Wayne - Male
  • en-Denise - Female
  • Other voices in ~/VibeVoice/demo/voices/streaming_model/

Output Formats

  • .ogg - Opus codec (WhatsApp compatible, recommended)
  • .mp3 - MP3 format
  • .wav - Uncompressed WAV
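
If a .wav or .mp3 file already exists and needs to be turned into the recommended format, ffmpeg can re-encode it by hand. This is a sketch only: the function names are hypothetical, the 32k bitrate and mono downmix are assumptions rather than values taken from vv.sh, and the script presumably performs its own conversion when given a .ogg output path.

```shell
# Hypothetical helper: derive the .ogg path for a given input file.
ogg_name() {
  printf '%s.ogg\n' "${1%.*}"
}

# Hypothetical helper: re-encode any audio file to Opus in an OGG container.
# -y overwrites an existing output file; 32k mono is an assumed setting.
to_opus_ogg() {
  ffmpeg -y -loglevel error -i "$1" -c:a libopus -b:a 32k -ac 1 "$(ogg_name "$1")"
}

# Example: to_opus_ogg /tmp/audio.wav   (writes /tmp/audio.ogg)
```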

For WhatsApp

Always use .ogg format with asVoice=true in the message tool:

# Generate
{baseDir}/scripts/vv.sh "Tu mensaje aquí" -o /tmp/mensaje.ogg

# Send via message tool
message action=send channel=whatsapp to="+34XXXXXXXXX" filePath=/tmp/mensaje.ogg asVoice=true
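
Before handing the file to the message tool, it can be worth confirming that it really contains an Opus stream. The check below is a sketch that relies on ffprobe (shipped with ffmpeg); is_opus is a hypothetical helper name, not something provided by the skill.

```shell
# Hypothetical helper: succeeds only if the first audio stream is Opus.
# Any ffprobe failure (missing file, missing binary) counts as "not Opus".
is_opus() {
  codec=$(ffprobe -v error -select_streams a:0 \
    -show_entries stream=codec_name \
    -of default=noprint_wrappers=1:nokey=1 "$1" 2>/dev/null)
  [ "$codec" = "opus" ]
}

# Example guard:
#   is_opus /tmp/mensaje.ogg || echo "not an Opus file, do not send as voice" >&2
```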

Requirements

  • GPU: NVIDIA with ~2GB VRAM
  • VibeVoice: Installed at ~/VibeVoice
  • ffmpeg: For audio conversion
  • Python 3.10+: With torch, torchaudio
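
The requirements above can be checked with a short preflight function. This is a sketch under stated assumptions: check_requirements is a hypothetical name, the Python interpreter is assumed to be on PATH as python3, and the presence of nvidia-smi is used only as a rough signal that an NVIDIA driver is installed (it does not verify the ~2GB of free VRAM).

```shell
# Hypothetical preflight: prints one line per missing prerequisite and
# returns non-zero if anything is missing.
check_requirements() {
  missing=0
  for cmd in ffmpeg python3 nvidia-smi; do
    command -v "$cmd" >/dev/null 2>&1 || { echo "missing command: $cmd"; missing=1; }
  done
  [ -d "$HOME/VibeVoice" ] || { echo "missing directory: $HOME/VibeVoice"; missing=1; }
  # sys.exit(True) yields exit status 1, so this fails on Python older than 3.10.
  python3 -c 'import sys; sys.exit(sys.version_info < (3, 10))' 2>/dev/null \
    || { echo "Python 3.10+ not found"; missing=1; }
  python3 -c 'import torch, torchaudio' 2>/dev/null \
    || { echo "missing Python packages: torch and/or torchaudio"; missing=1; }
  return "$missing"
}

# Example: check_requirements || echo "fix the items above before running vv.sh" >&2
```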

Performance

  • RTF (real-time factor): ~0.24, so generation takes roughly a quarter of the audio's duration
  • 1 minute of audio ≈ 15 seconds to generate
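
The relationship between the two figures is simple arithmetic: generation time ≈ audio duration × RTF, so 60 s × 0.24 ≈ 14.4 s, which the listing rounds to 15 seconds. A small sketch of that estimate follows; estimate_gen_seconds is a hypothetical helper, and real timings will vary with hardware and the model load time on the first run.

```shell
# Hypothetical helper: estimated generation time in seconds for a given
# audio duration (seconds), using the listed RTF of 0.24 unless overridden.
estimate_gen_seconds() {
  awk -v d="$1" -v rtf="${2:-0.24}" 'BEGIN { printf "%.1f\n", d * rtf }'
}

estimate_gen_seconds 60    # prints 14.4
```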

Notes

  • The first run loads the model (~10s); subsequent runs are faster
  • Audio rule: only send a voice message if the user asks for one or is communicating by audio themselves
  • Keep text under 1500 characters for best quality
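
The length guideline can be enforced before calling the script. The sketch below counts characters with wc -m, which depends on the current locale (in a UTF-8 locale accented Spanish characters count as one each); text_ok is a hypothetical helper, and the limit is read here as "at most 1500", which is an interpretation of the note above.

```shell
# Hypothetical helper: succeeds when the text has at most LIMIT characters
# (default 1500). The arithmetic expansion strips any padding wc may add.
text_ok() {
  n=$(( $(printf '%s' "$1" | wc -m) ))
  [ "$n" -le "${2:-1500}" ]
}

# Example guard:
#   text_ok "$texto" || echo "text is over 1500 characters, consider splitting it" >&2
```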

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
