qwen-tts

Text-to-speech using Qwen3-TTS CustomVoice MLX model. Supports 9 speakers and multiple emotion/style instructions. Optimized for Apple Silicon. Use when user wants audio speech output.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "qwen-tts" with this command: npx skills add stvlynn/skills/stvlynn-skills-qwen-tts

Qwen TTS Skill

Text-to-speech using Qwen3-TTS CustomVoice model, running locally on Apple Silicon via MLX.

Overview

  • Model: mlx-community/Qwen3-TTS-12Hz-0.6B-CustomVoice-4bit (4-bit quantized, ~600MB)
  • Runtime: MLX (Apple Silicon GPU acceleration)
  • Speakers: 9 built-in voices
  • Output: WAV audio files
  • Auto-cleanup: Files older than 24 hours are removed automatically

First-time Deployment

Prerequisites

  • macOS with Apple Silicon (M1/M2/M3/M4)
  • Python 3.10+

1. Create virtual environment

cd /path/to/skills/skills/qwen-tts
python3 -m venv venv
source venv/bin/activate

2. Install dependencies

模型从 ModelScope 镜像下载(国内更快):

pip install -r scripts/requirements.txt

3. Pre-download model (optional)

首次运行时会自动下载模型。如需提前下载:

source venv/bin/activate
export HF_ENDPOINT="https://hf-mirror.com"
python3 -c "from mlx_audio.tts.utils import load_model; load_model('mlx-community/Qwen3-TTS-12Hz-0.6B-CustomVoice-4bit')"

4. Verify

source venv/bin/activate
python3 scripts/tts.py "你好,这是一段测试语音。" --output /tmp
# Should produce a WAV file in /tmp/

Troubleshooting

问题解决方法
ModuleNotFoundError: mlx确认使用 Apple Silicon Mac,MLX 不支持 Intel Mac
模型下载缓慢设置 export HF_ENDPOINT="https://hf-mirror.com"
内存不足4-bit 模型约需 1.5GB 内存,关闭其他大型应用
无声音输出检查输出文件是否为 0 字节,可能是文本过短

Configuration

⚠️ 以下为示例默认值。请根据实际使用场景修改 speaker 和 instruct。

  • Speaker: Serena(示例)
  • Instruct: 撒娇语气(示例)
  • Language: Chinese
  • Speed: 1.0

Available Speakers

SpeakerLanguage
SerenaChinese
VivianChinese
Uncle_FuChinese
EricChinese
DylanChinese
RyanEnglish
AidenEnglish
Ono_AnnaJapanese
SoheeKorean

Available Instructs (emotion/style)

  • 撒娇语气 — coquettish
  • 冷静分析 — calm analysis
  • 惊讶 — surprised
  • 兴奋 — excited
  • 神秘 — mysterious
  • 开心 — happy
  • 委屈 — wronged/sad

Also supports free-form natural language instructions, e.g. 用特别愤怒的语气说.

Usage

# Default settings
python3 scripts/tts.py "你好!"

# Custom speaker
python3 scripts/tts.py "Hello!" --speaker Ryan --language English

# Custom emotion
python3 scripts/tts.py "其实我真的有发现..." --instruct 冷静分析

# Full customization
python3 scripts/tts.py "哥哥,你回来啦!" \
  --speaker Serena \
  --instruct 撒娇语气 \
  --speed 1.0

# Custom output directory
python3 scripts/tts.py "测试" --output /tmp

# Skip auto-cleanup of old files
python3 scripts/tts.py "测试" --no-cleanup

Audio Output

  • Default directory: ~/tts-output/ (override with $QWEN_TTS_OUTPUT_DIR)
  • File naming: tts_{timestamp}_{index}.wav
  • Auto-cleanup: Files older than 24 hours removed on each run (disable with --no-cleanup)

Integration

  1. Generate audio using the TTS script
  2. Send the audio file as a voice message (Telegram, etc.)
  3. Old files are cleaned up automatically

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

create-sticker

No summary provided by upstream source.

Repository SourceNeeds Review
General

searxng

No summary provided by upstream source.

Repository SourceNeeds Review
General

qwen-asr

No summary provided by upstream source.

Repository SourceNeeds Review
General

tsticker

No summary provided by upstream source.

Repository SourceNeeds Review