vibevoice

Local Spanish TTS using Microsoft VibeVoice. Generate natural voice audio from text, optimized for WhatsApp voice messages.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "vibevoice" with this command: npx skills add estudiosdurero/vibevoice

VibeVoice TTS

Local text-to-speech using Microsoft's VibeVoice model. Generates natural Spanish voice audio, perfect for WhatsApp voice messages.

Quick Start

# Basic usage
{baseDir}/scripts/vv.sh "Hola, esto es una prueba" -o /tmp/audio.ogg

# From file
{baseDir}/scripts/vv.sh -f texto.txt -o /tmp/audio.ogg

# Different voice
{baseDir}/scripts/vv.sh "Texto" -v en-Wayne -o /tmp/audio.ogg

# Adjust speed (0.5-2.0)
{baseDir}/scripts/vv.sh "Texto" -s 1.2 -o /tmp/audio.ogg

Configuration

Setting	Default	Description
Voice	`sp-Spk1_man`	Spanish male voice (slight Mexican accent)
Speed	`1.15`	15% faster than normal
Format	`.ogg`	Opus codec for WhatsApp
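
The default speed and the documented 0.5-2.0 range can be checked before calling the script. The following is a minimal sketch, not part of the skill: the `resolve_speed` helper is a hypothetical name, and it assumes `vv.sh` does not already validate `-s` on its own.

```shell
# Hypothetical helper (not shipped with the skill).
# Prints the speed to use: the first argument if given, else the documented
# default of 1.15. Fails when the value is not a plain number in 0.5-2.0.
resolve_speed() {
    speed="${1:-1.15}"
    # awk handles the floating-point comparison; exit status 0 means "valid".
    if awk -v s="$speed" 'BEGIN { exit !(s ~ /^[0-9]+(\.[0-9]+)?$/ && s + 0 >= 0.5 && s + 0 <= 2.0) }'; then
        printf '%s\n' "$speed"
    else
        printf 'speed %s is outside 0.5-2.0\n' "$speed" >&2
        return 1
    fi
}
```

It could then be used as `-s "$(resolve_speed 1.2)"` in any of the Quick Start commands.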

Available Voices

Spanish:

sp-Spk1_man - Male, slight Mexican accent (default)

English:

en-Wayne - Male
en-Denise - Female

Additional voices are available in ~/VibeVoice/demo/voices/streaming_model/

Output Formats

.ogg - Opus codec (WhatsApp compatible, recommended)
.mp3 - MP3 format
.wav - Uncompressed WAV
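
For illustration, the three extensions map naturally onto ffmpeg audio encoders. This sketch assumes the conversion is done with ffmpeg (listed under Requirements) and that the ffmpeg build includes `libopus` and `libmp3lame`; how `vv.sh` actually performs the conversion is not shown here, and `codec_for_output` and `/tmp/raw.wav` are hypothetical names.

```shell
# Hypothetical helper: map an output filename to an ffmpeg audio encoder
# based on its extension.
codec_for_output() {
    case "$1" in
        *.ogg) echo libopus ;;      # Opus in an Ogg container (WhatsApp voice notes)
        *.mp3) echo libmp3lame ;;   # MP3
        *.wav) echo pcm_s16le ;;    # uncompressed 16-bit PCM
        *) echo "unsupported output format: $1" >&2; return 1 ;;
    esac
}

# Example conversion of an intermediate WAV (hypothetical path):
# ffmpeg -y -i /tmp/raw.wav -c:a "$(codec_for_output /tmp/audio.ogg)" /tmp/audio.ogg
```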

For WhatsApp

Always use .ogg format with asVoice=true in the message tool:

# Generate
{baseDir}/scripts/vv.sh "Tu mensaje aquí" -o /tmp/mensaje.ogg

# Send via message tool
message action=send channel=whatsapp to="+34XXXXXXXXX" filePath=/tmp/mensaje.ogg asVoice=true

Requirements

GPU: NVIDIA with ~2GB VRAM
VibeVoice: Installed at ~/VibeVoice
ffmpeg: For audio conversion
Python 3.10+: With torch, torchaudio

Performance

RTF (real-time factor): ~0.24x, so generation takes about a quarter of the audio's duration
1 minute of audio ≈ 15 seconds to generate
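
The arithmetic behind these figures is simply audio duration multiplied by the real-time factor. A small sketch follows, with `estimate_generation_seconds` as a hypothetical helper; the 0.24 default is the approximate figure quoted above and will vary with hardware.

```shell
# Hypothetical helper: estimate generation time in seconds.
# $1 = audio duration in seconds, $2 = real-time factor (defaults to 0.24).
estimate_generation_seconds() {
    awk -v audio="$1" -v rtf="${2:-0.24}" 'BEGIN { printf "%.1f\n", audio * rtf }'
}

estimate_generation_seconds 60   # prints 14.4, in line with the ~15 s quoted above
```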

Notes

The first run loads the model (~10 s); subsequent runs are faster
Audio rule: only send voice when the user asks for it or has sent an audio message themselves
Keep text under 1500 characters for best quality
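
The length guideline can be enforced with a simple guard before generating audio. This is a sketch only: `text_within_limit` is a hypothetical helper, and depending on the shell and locale `${#text}` may count bytes rather than characters, so accented Spanish text can count slightly higher than its visible length.

```shell
# Hypothetical helper: succeed only if the text is under the suggested limit.
# $1 = text, $2 = limit in characters (defaults to 1500).
text_within_limit() {
    text="$1"
    limit="${2:-1500}"
    [ "${#text}" -lt "$limit" ]
}

# Example guard:
# text_within_limit "$mensaje" || echo "text too long, consider splitting it" >&2
```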

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
