qwen-tts

Local text-to-speech using Qwen3-TTS-12Hz-1.7B-CustomVoice. Use when generating audio from text, creating voice messages, or when TTS is requested. Supports 10 languages including Italian, 9 premium speaker voices, and instruction-based voice control (emotion, tone, style). Alternative to cloud-based TTS services like ElevenLabs. Runs entirely offline after initial model download.

Install skill "qwen-tts" with this command: npx skills add paki81/qwen-tts/paki81-qwen-tts-qwen-tts

Qwen TTS

Local text-to-speech using the Qwen3-TTS-12Hz-1.7B-CustomVoice model, downloaded from Hugging Face.

Quick Start

Generate speech from text:

scripts/tts.py "Ciao, come va?" -l Italian -o output.wav

With voice instruction (emotion/style):

scripts/tts.py "Sono felice!" -i "Parla con entusiasmo" -l Italian -o happy.wav

Different speaker:

scripts/tts.py "Hello world" -s Ryan -l English -o hello.wav

Installation

First-time setup (one-time):

cd skills/public/qwen-tts
bash scripts/setup.sh

This creates a local virtual environment and installs the qwen-tts package (~500MB).

Note: The first synthesis automatically downloads the ~1.7GB model from Hugging Face.

Usage

scripts/tts.py [options] "Text to speak"

Options

  • -o, --output PATH - Output file path (default: qwen_output.wav)
  • -s, --speaker NAME - Speaker voice (default: Vivian)
  • -l, --language LANG - Language (default: Auto)
  • -i, --instruct TEXT - Voice instruction (emotion, style, tone)
  • --list-speakers - Show available speakers
  • --model NAME - Model name (default: CustomVoice 1.7B)
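
When calling the script from other Python code, it can help to assemble the argument list programmatically rather than by string concatenation. The sketch below is only an illustration: `build_tts_command` is a hypothetical helper, not part of the skill, and it uses only the flags listed above. It assumes `scripts/tts.py` is executable from the current directory.

```python
from typing import List, Optional


def build_tts_command(
    text: str,
    output: Optional[str] = None,
    speaker: Optional[str] = None,
    language: Optional[str] = None,
    instruct: Optional[str] = None,
    script: str = "scripts/tts.py",  # assumed path, as used in this document
) -> List[str]:
    """Build an argv list for the TTS script (hypothetical helper)."""
    cmd = [script]
    # Add a flag only when a value is given, so the script's
    # own defaults (Vivian, Auto, qwen_output.wav) still apply.
    if output is not None:
        cmd += ["-o", output]
    if speaker is not None:
        cmd += ["-s", speaker]
    if language is not None:
        cmd += ["-l", language]
    if instruct is not None:
        cmd += ["-i", instruct]
    # The text to speak is positional and goes last, as in the usage line.
    cmd.append(text)
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Passing a list instead of a shell string avoids quoting problems with text that contains apostrophes or quotation marks.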

Examples

Basic Italian speech:

scripts/tts.py "Benvenuto nel futuro del text-to-speech" -l Italian -o welcome.wav

With emotion/instruction:

scripts/tts.py "Sono molto felice di vederti!" -i "Parla con entusiasmo e gioia" -l Italian -o happy.wav

Different speaker:

scripts/tts.py "Hello, nice to meet you" -s Ryan -l English -o ryan.wav

List available speakers:

scripts/tts.py --list-speakers

Available Speakers

The CustomVoice model includes 9 premium voices:

Speaker     Language            Description
Vivian      Chinese             Bright, slightly edgy young female
Serena      Chinese             Warm, gentle young female
Uncle_Fu    Chinese             Seasoned male, low mellow timbre
Dylan       Chinese (Beijing)   Youthful Beijing male, clear
Eric        Chinese (Sichuan)   Lively Chengdu male, husky
Ryan        English             Dynamic male, rhythmic
Aiden       English             Sunny American male
Ono_Anna    Japanese            Playful female, light nimble
Sohee       Korean              Warm female, rich emotion

Recommendation: Use each speaker's native language for best quality, though all speakers support all 10 languages (Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, Italian).
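
The recommendation above can be turned into a small lookup. This is a minimal sketch, not something the skill ships: `NATIVE_LANGUAGE` and `pick_speaker` are hypothetical names, the mapping is copied from the speaker list, and falling back to Vivian for languages without a native speaker is an assumption based on the documented default.

```python
# Speaker -> native language, taken from the speaker list above.
# Dylan (Beijing) and Eric (Sichuan) are regional Chinese voices.
NATIVE_LANGUAGE = {
    "Vivian": "Chinese",
    "Serena": "Chinese",
    "Uncle_Fu": "Chinese",
    "Dylan": "Chinese",
    "Eric": "Chinese",
    "Ryan": "English",
    "Aiden": "English",
    "Ono_Anna": "Japanese",
    "Sohee": "Korean",
}

DEFAULT_SPEAKER = "Vivian"  # the script's documented default speaker


def pick_speaker(language: str) -> str:
    """Return the first speaker whose native language matches (hypothetical helper)."""
    wanted = language.strip().lower()
    for speaker, native in NATIVE_LANGUAGE.items():
        if native.lower() == wanted:
            return speaker
    # No native voice for this language (e.g. Italian): use the default.
    return DEFAULT_SPEAKER
```

For Italian, German, French, Russian, Portuguese, and Spanish there is no native voice in the list, so comparing a few speakers by ear is probably still worthwhile.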

Voice Instructions

Use -i, --instruct to control emotion, tone, and style:

Italian examples:

  • "Parla con entusiasmo"
  • "Tono serio e professionale"
  • "Voce calma e rilassante"
  • "Leggi come un narratore"

English examples:

  • "Speak with excitement"
  • "Very happy and energetic"
  • "Calm and soothing voice"
  • "Read like a narrator"

Integration with OpenClaw

The script outputs the audio file path to stdout (last line), making it compatible with OpenClaw's TTS workflow:

# OpenClaw captures the output path
cd skills/public/qwen-tts
OUTPUT=$(scripts/tts.py "Ciao" -s Vivian -l Italian -o /tmp/audio.wav 2>/dev/null)
# OUTPUT = /tmp/audio.wav
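
The same capture can be sketched in Python. The helper names below (`last_stdout_line`, `synthesize`) are hypothetical; the only behavior taken from this document is that the audio path is the last line of stdout. Any earlier lines the script may print are assumed to be progress messages and are ignored.

```python
import subprocess
from typing import List


def last_stdout_line(stdout: str) -> str:
    """Return the last non-empty line of captured stdout."""
    lines = [line.strip() for line in stdout.splitlines() if line.strip()]
    if not lines:
        raise ValueError("no output path found in stdout")
    return lines[-1]


def synthesize(argv: List[str]) -> str:
    """Run the TTS command and return the reported audio path."""
    result = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,  # mirrors 2>/dev/null in the shell example
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return last_stdout_line(result.stdout)
```

A call such as `synthesize(["scripts/tts.py", "Ciao", "-l", "Italian", "-o", "/tmp/audio.wav"])` would then be expected to return `/tmp/audio.wav`, assuming the script behaves as described above.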

Performance

  • GPU (CUDA): ~1-3 seconds for short phrases
  • CPU: ~10-30 seconds for short phrases
  • Model size: ~1.7GB (auto-downloads on first run)
  • Venv size: ~500MB (installed dependencies)

Troubleshooting

Setup fails:

# Ensure Python 3.10-3.12 is available
python3.12 --version

# Re-run setup
cd skills/public/qwen-tts
rm -rf venv
bash scripts/setup.sh
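
To check the interpreter before re-running setup, the Python 3.10-3.12 range mentioned above can be tested directly. This is a small sketch; `is_supported_python` is a hypothetical helper, and the range is taken from the comment in the commands above rather than from the setup script itself.

```python
import sys
from typing import Optional, Tuple


def is_supported_python(version: Optional[Tuple[int, ...]] = None) -> bool:
    """Return True if the version falls in the 3.10-3.12 range."""
    # Default to the running interpreter when no version is passed in.
    major, minor = (version or sys.version_info)[:2]
    return major == 3 and 10 <= minor <= 12
```

Running `is_supported_python()` with no argument checks the interpreter that executes the snippet, which may differ from the one `setup.sh` selects.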

Model download slow/fails:

# Use mirror (China mainland)
export HF_ENDPOINT=https://hf-mirror.com
scripts/tts.py "Test" -o test.wav
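
When the script is launched from Python rather than a shell, the mirror can be set for the child process only, without changing the parent environment. A minimal sketch, assuming the same `HF_ENDPOINT` variable and mirror URL as above; `mirror_env` is a hypothetical helper.

```python
import os
from typing import Dict, Mapping, Optional


def mirror_env(
    endpoint: str = "https://hf-mirror.com",
    base: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Return a copy of the environment with HF_ENDPOINT set."""
    # Copy so that os.environ itself is left untouched.
    env = dict(os.environ if base is None else base)
    env["HF_ENDPOINT"] = endpoint
    return env
```

The result can be passed as `env=mirror_env()` to `subprocess.run` when invoking `scripts/tts.py`.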

Out of memory (GPU): The model automatically falls back to CPU if GPU memory is insufficient.

Audio quality issues:

  • Try different speaker: --list-speakers
  • Add instruction: -i "Speak clearly and slowly"
  • Check language matches text: -l Italian for Italian text
