vibevoice

Local Spanish TTS using Microsoft VibeVoice. Generate natural voice audio from text, optimized for WhatsApp voice messages.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "vibevoice" with this command: npx skills add estudiosdurero/vibevoice

VibeVoice TTS

Local text-to-speech using Microsoft's VibeVoice model. Generates natural Spanish voice audio, perfect for WhatsApp voice messages.

Quick Start

# Basic usage
{baseDir}/scripts/vv.sh "Hola, esto es una prueba" -o /tmp/audio.ogg

# From file
{baseDir}/scripts/vv.sh -f texto.txt -o /tmp/audio.ogg

# Different voice
{baseDir}/scripts/vv.sh "Texto" -v en-Wayne -o /tmp/audio.ogg

# Adjust speed (0.5-2.0)
{baseDir}/scripts/vv.sh "Texto" -s 1.2 -o /tmp/audio.ogg

Configuration

Setting   Default        Description
Voice     sp-Spk1_man    Spanish male voice (slight Mexican accent)
Speed     1.15           15% faster than normal
Format    .ogg           Opus codec for WhatsApp

Available Voices

Spanish:

  • sp-Spk1_man - Male, slight Mexican accent (default)

English:

  • en-Wayne - Male
  • en-Denise - Female
  • Other voices in ~/VibeVoice/demo/voices/streaming_model/

Output Formats

  • .ogg - Opus codec (WhatsApp compatible, recommended)
  • .mp3 - MP3 format
  • .wav - Uncompressed WAV

For WhatsApp

Always use .ogg format with asVoice=true in the message tool:

# Generate
{baseDir}/scripts/vv.sh "Tu mensaje aquí" -o /tmp/mensaje.ogg

# Send via message tool
message action=send channel=whatsapp to="+34XXXXXXXXX" filePath=/tmp/mensaje.ogg asVoice=true
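
Because the voice-message path relies on the .ogg format, a small guard before sending can catch a wrong extension early. The following is a minimal sketch: the function name is_whatsapp_voice_file is hypothetical and not part of the skill, and it only inspects the file name, not the codec inside the file.

```shell
# Hypothetical helper (not part of vibevoice): succeed only if the path
# ends in .ogg, the format recommended above for WhatsApp voice messages.
# Note: this checks the file name only, not the actual audio codec.
is_whatsapp_voice_file() {
  case "$1" in
    *.ogg) return 0 ;;
    *)     return 1 ;;
  esac
}
```

One way to use it: run is_whatsapp_voice_file /tmp/mensaje.ogg and only call the message tool with asVoice=true when it succeeds.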

Requirements

  • GPU: NVIDIA with ~2GB VRAM
  • VibeVoice: Installed at ~/VibeVoice
  • ffmpeg: For audio conversion
  • Python 3.10+: With torch, torchaudio
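
The requirements above can be partly checked before the first run. The sketch below is an assumption-laden example, not part of the skill: missing_commands is a hypothetical helper, and using nvidia-smi on PATH as a stand-in for "an NVIDIA GPU is present" is an assumption. It does not check VRAM, the Python version, or whether torch and torchaudio are installed.

```shell
# Hypothetical preflight helper (not part of vibevoice).
# Prints the name of each given command that is not on PATH and
# returns non-zero if at least one is missing.
missing_commands() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "$cmd"
      missing=1
    fi
  done
  return "$missing"
}

# Example calls (commented out so sourcing this file has no side effects):
# missing_commands ffmpeg python3 nvidia-smi || echo "install the tools listed above"
# [ -d "$HOME/VibeVoice" ] || echo "VibeVoice not found at ~/VibeVoice"
```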

Performance

  • RTF (real-time factor): ~0.24x, so generation takes roughly a quarter of the audio's duration
  • 1 minute of audio ≈ 15 seconds to generate
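
The relationship between the two figures above is simply generation time = audio duration × RTF. As a rough sketch (estimate_gen_seconds is a hypothetical helper, and 0.24 is only the approximate figure quoted here, so real timings will vary with hardware):

```shell
# Hypothetical helper (not part of vibevoice): estimate generation time.
# $1 = audio duration in seconds
# $2 = RTF, optional; defaults to the ~0.24 quoted above (an approximation)
estimate_gen_seconds() {
  awk -v dur="$1" -v rtf="${2:-0.24}" 'BEGIN { printf "%.1f\n", dur * rtf }'
}
```

For example, estimate_gen_seconds 60 prints 14.4, which matches the "about 15 seconds per minute of audio" rule of thumb.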

Notes

  • The first run loads the model (~10 s); subsequent runs are faster
  • Audio rule: only send a voice message if the user asks for one or has sent audio themselves
  • Keep text under 1500 characters for best quality
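
The length guideline above can be enforced before calling the script. This is a minimal sketch with a hypothetical helper name (text_within_limit); it treats "under 1500" as strictly fewer than 1500 characters, which is an interpretation rather than something the skill specifies.

```shell
# Hypothetical helper (not part of vibevoice): succeed if the text is
# shorter than the limit (default 1500, the figure from the notes above).
# ${#1} counts characters in a UTF-8 locale, but some shells or the C
# locale count bytes, so accented Spanish text may measure longer there.
text_within_limit() {
  limit="${2:-1500}"
  [ "${#1}" -lt "$limit" ]
}
```

A caller could run text_within_limit "$texto" first and, if it fails, split the text into shorter pieces before generating audio.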
