qwen3-tts-local-inference

Generate speech from text using Qwen3-TTS via direct Python inference — no server required. Use when: (1) converting text to speech / synthesising audio, (2) creating voiceovers or spoken content, (3) cloning a voice from reference audio, (4) generating TTS with built-in speakers or custom voice descriptions. Supports custom-voice (9 speakers), voice-design (natural language), and voice-clone (~3 s reference). Outputs .wav files. Both 0.6B (small, default) and 1.7B (large) models available. Runs entirely offline after model download.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "qwen3-tts-local-inference" with this command: npx skills add jithinm/qwen3-tts-local-inference

Qwen3-TTS — Local Inference (No Server)

Run Qwen3-TTS directly in Python — no HTTP server, no REST API. Call a script or import the engine in your own code.

Quick reference

| Mode | What it does | Key args |
| --- | --- | --- |
| custom-voice | 9 built-in speakers, optional emotion/style | --speaker, --instruct |
| voice-design | Describe the voice in natural language | --instruct (required) |
| voice-clone | Clone from ~3 s reference audio | --ref-audio, --ref-text |

Available Speakers

The CustomVoice model includes 9 premium voices:

| Speaker | Language | Description |
| --- | --- | --- |
| Vivian | Chinese | Bright, slightly edgy young female |
| Serena | Chinese | Warm, gentle young female |
| Uncle_Fu | Chinese | Seasoned male, low mellow timbre |
| Dylan | Chinese (Beijing) | Youthful Beijing male, clear |
| Eric | Chinese (Sichuan) | Lively Chengdu male, husky |
| Ryan | English | Dynamic male, rhythmic |
| Aiden | English | Sunny American male |
| Ono_Anna | Japanese | Playful female, light nimble |
| Sohee | Korean | Warm female, rich emotion |

Languages: Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, Italian, Auto
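
The speaker table above can be mirrored as a small lookup for checking a --speaker value before invoking the script. This is only a sketch: the `SPEAKERS` dictionary and the `check_speaker` helper are hypothetical and not part of the skill, the names and languages are copied from the table, and the case-sensitive match is an assumption.

```python
# Hypothetical lookup built from the speaker table; not part of the skill's code.
SPEAKERS = {
    "Vivian": "Chinese",
    "Serena": "Chinese",
    "Uncle_Fu": "Chinese",
    "Dylan": "Chinese",    # Beijing dialect
    "Eric": "Chinese",     # Sichuan dialect
    "Ryan": "English",
    "Aiden": "English",
    "Ono_Anna": "Japanese",
    "Sohee": "Korean",
}


def check_speaker(name: str) -> str:
    """Return the speaker's language, or raise ValueError for an unknown name."""
    if name not in SPEAKERS:
        known = ", ".join(sorted(SPEAKERS))
        raise ValueError(f"Unknown speaker {name!r}; expected one of: {known}")
    return SPEAKERS[name]
```

Failing early on a misspelled name avoids waiting for a model load before the error surfaces.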


1 — Setup

Run the one-time setup from the skill directory to install dependencies:

bash scripts/setup.sh

Custom download location:

python scripts/download_models.py --model-dir /path/to/models

Models are stored under {baseDir}/models/ by default. Override with QWEN_TTS_MODEL_DIR env var or --model-dir flag.
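
The override rule can be sketched as a small resolver. The helper name `resolve_model_dir` is hypothetical, and the ordering (flag first, then environment variable, then default) is an assumption: the text above only states that both can override the default.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_model_dir(
    cli_value: Optional[str],
    base_dir: str,
    env: Mapping[str, str] = os.environ,
) -> Path:
    """Pick the model directory: --model-dir, then QWEN_TTS_MODEL_DIR, then {baseDir}/models.

    The flag-over-env precedence is assumed, not confirmed by the skill's docs.
    """
    if cli_value:
        return Path(cli_value)
    env_value = env.get("QWEN_TTS_MODEL_DIR")
    if env_value:
        return Path(env_value)
    return Path(base_dir) / "models"
```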


2 — Generate speech (CLI)

Custom Voice (default)

cd {baseDir}
python scripts/tts.py "Hello, how are you today?" --speaker Ryan --language English

With emotion/style instruction:

python scripts/tts.py "Great news everyone!" --speaker Aiden --instruct "cheerful and energetic"

Voice Design

Describe the voice in natural language:

python scripts/tts.py "Welcome to our show!" \
  --mode voice-design \
  --language English \
  --instruct "Warm, confident female voice in her 30s with a slight British accent"

Voice Clone

Clone a voice from a short (~3 s) reference audio clip:

python scripts/tts.py "This is spoken in the cloned voice." \
  --mode voice-clone \
  --language English \
  --ref-audio path/to/reference.wav \
  --ref-text "Transcript of the reference audio."
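
When calling the script from another program, the three invocations above can be assembled by one helper. The following is a sketch under stated assumptions: `build_tts_command` is a hypothetical name, it uses only the flags shown above, it omits --mode for custom-voice because that mode is described as the default, and treating --ref-text as optional is a guess.

```python
import sys
from typing import List, Optional


def build_tts_command(
    text: str,
    mode: str = "custom-voice",
    language: Optional[str] = None,
    speaker: Optional[str] = None,
    instruct: Optional[str] = None,
    ref_audio: Optional[str] = None,
    ref_text: Optional[str] = None,
) -> List[str]:
    """Build an argv list for scripts/tts.py from the documented flags."""
    if mode == "voice-design" and not instruct:
        raise ValueError("voice-design requires --instruct")
    if mode == "voice-clone" and not ref_audio:
        raise ValueError("voice-clone requires --ref-audio")

    cmd = [sys.executable, "scripts/tts.py", text]
    if mode != "custom-voice":  # custom-voice is the documented default
        cmd += ["--mode", mode]
    optional_flags = [
        ("--language", language),
        ("--speaker", speaker),
        ("--instruct", instruct),
        ("--ref-audio", ref_audio),
        ("--ref-text", ref_text),
    ]
    for flag, value in optional_flags:
        if value:  # skip flags the caller did not set
            cmd += [flag, value]
    return cmd
```

The list can be passed to `subprocess.run(cmd, cwd=base_dir, check=True)`, where `base_dir` is the skill directory. Passing a list rather than a shell string avoids quoting problems in the text argument.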

Common options

| Flag | Purpose |
| --- | --- |
| -o output.wav | Save to exact file path instead of auto-named file |
| --output-dir DIR | Override output directory (default: tts_output/) |
| --model-dir DIR | Override model directory |
| --json | Print result as JSON |
| -v | Verbose logging |
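
The --json flag makes the script easier to drive from other programs. Here is a minimal sketch of reading that output; `parse_json_result` is a hypothetical helper, and it assumes stdout holds exactly one JSON object. The schema is not documented here, so the sample keys are borrowed from the Python API section's example and may not match what the CLI prints.

```python
import json
from typing import Any, Dict


def parse_json_result(stdout: str) -> Dict[str, Any]:
    """Parse text captured from `tts.py --json`.

    Assumes the output is a single JSON object with nothing else around it.
    """
    result = json.loads(stdout.strip())
    if not isinstance(result, dict):
        raise ValueError("expected a JSON object")
    return result


# Illustrative input only; the keys are not confirmed for the CLI.
sample = '{"file": "tts_output/example.wav", "duration_s": 1.23, "inference_s": 4.56}'
parsed = parse_json_result(sample)
print(parsed["file"])
```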

3 — Python API

Use the engine directly in code:

import sys
sys.path.insert(0, "{baseDir}/scripts")

from inference import TTSInferenceEngine

engine = TTSInferenceEngine(
    model_dir="{baseDir}/models",   # optional, uses default if omitted
    output_dir="./tts_output",       # optional
)

result = engine.generate_custom_voice(
    text="Hello world!",
    language="English",
    speaker="Ryan",
    instruct="calm and professional",
)
print(result)
# {"file": "tts_output/custom_voice_20260218_...wav", "duration_s": 1.23, "inference_s": 4.56}

Available methods:

  • engine.generate_custom_voice(text, language, speaker, instruct)
  • engine.generate_voice_design(text, language, instruct)
  • engine.generate_voice_clone(text, language, ref_audio, ref_text)
  • engine.status() — returns loaded variant, device, paths
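
If calling code selects the mode at run time, the three generate methods can sit behind one dispatcher. This is a sketch, not part of the skill: `synthesize` is a hypothetical wrapper, and it assumes the methods accept the keyword names listed above and return a dict like the one in the example.

```python
from typing import Any, Dict

# CLI-style mode names mapped to the engine methods listed above.
_METHOD_FOR_MODE = {
    "custom-voice": "generate_custom_voice",
    "voice-design": "generate_voice_design",
    "voice-clone": "generate_voice_clone",
}


def synthesize(engine: Any, mode: str, text: str, language: str, **kwargs: Any) -> Dict[str, Any]:
    """Route a request to the engine method that matches the mode name."""
    try:
        method_name = _METHOD_FOR_MODE[mode]
    except KeyError:
        raise ValueError(f"Unknown mode {mode!r}") from None
    method = getattr(engine, method_name)
    # Mode-specific arguments (speaker, instruct, ref_audio, ref_text) pass through unchanged.
    return method(text=text, language=language, **kwargs)
```

For example, `synthesize(engine, "custom-voice", "Hello world!", "English", speaker="Ryan")` would forward to `engine.generate_custom_voice`.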

4 — Configuration

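Settings are read from environment variables; set them before running. The model and output directories can also be overridden per run with the --model-dir and --output-dir flags.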

| Variable | Default | Description |
| --- | --- | --- |
| QWEN_TTS_MODEL_SIZE | small | small (0.6B) or large (1.7B) |
| QWEN_TTS_MODEL_DIR | {baseDir}/models | Where model weights are stored |
| QWEN_TTS_DEVICE | auto (cuda:0 or cpu) | Inference device |
| QWEN_TTS_DTYPE | auto (bfloat16 / float32) | Model precision |
| QWEN_TTS_OUTPUT_DIR | ./tts_output | Where generated .wav files are saved |

Switch to the 1.7B model (Windows cmd syntax shown; in bash or zsh, use export QWEN_TTS_MODEL_SIZE=large instead):

set QWEN_TTS_MODEL_SIZE=large
python scripts/tts.py "Hello world"

Use a custom model directory (Windows cmd syntax shown; on Linux or macOS, use export with a POSIX path):

set QWEN_TTS_MODEL_DIR=D:\my-models\qwen-tts
python scripts/tts.py "Hello world"
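
Because the syntax for setting environment variables differs between shells, a launcher written in Python can set them for a single run in a portable way. The sketch below assumes a hypothetical helper named `make_env`; only the variable names come from the table above.

```python
import os
from typing import Dict, Mapping, Optional

# Variable names taken from the configuration table.
KNOWN_VARS = {
    "QWEN_TTS_MODEL_SIZE",
    "QWEN_TTS_MODEL_DIR",
    "QWEN_TTS_DEVICE",
    "QWEN_TTS_DTYPE",
    "QWEN_TTS_OUTPUT_DIR",
}


def make_env(
    overrides: Mapping[str, str],
    base: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Return a copy of the environment with the given QWEN_TTS_* overrides applied."""
    unknown = set(overrides) - KNOWN_VARS
    if unknown:
        raise ValueError(f"Unrecognised settings: {sorted(unknown)}")
    env = dict(os.environ if base is None else base)  # copy; the parent process is untouched
    env.update(overrides)
    return env
```

A call such as `subprocess.run([sys.executable, "scripts/tts.py", "Hello world"], env=make_env({"QWEN_TTS_MODEL_SIZE": "large"}), check=True)` then behaves the same on Windows, Linux, and macOS.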

Important notes

  • Small model (0.6B) is the default. It uses less RAM and is faster. Switch to large (1.7B) for higher quality.
  • CPU inference is slow. Expect 30-120 s per sentence for the 1.7B model. The 0.6B model is roughly 2x faster.
  • Only one model variant is loaded at a time. Switching modes (e.g. custom-voice to voice-clone) triggers a model swap.
  • Output .wav files land in tts_output/ by default.
  • Models are downloaded to {baseDir}/models/ by default. Run python scripts/download_models.py --size all to pre-download both sizes for offline use.
  • Voice Design mode has no 0.6B variant — it always uses the 1.7B model regardless of QWEN_TTS_MODEL_SIZE.
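
Two of the notes above lend themselves to a small planning step before a batch run: grouping requests by mode so that each model variant loads once, and resolving the model size that will actually be used. The helpers below are hypothetical and only illustrate the idea; how long a model swap takes is not stated in the notes.

```python
from typing import Dict, List, Tuple

Job = Tuple[str, str]  # (mode, text)


def effective_size(mode: str, requested: str) -> str:
    """voice-design has no 0.6B variant, so it always resolves to 'large'."""
    return "large" if mode == "voice-design" else requested


def group_by_mode(jobs: List[Job]) -> List[Job]:
    """Reorder jobs so each mode runs as one contiguous batch.

    Modes keep their first-seen order, and jobs keep their order within a mode.
    """
    order: Dict[str, int] = {}
    for mode, _ in jobs:
        order.setdefault(mode, len(order))
    # sorted() is stable, so jobs that share a mode stay in their original order.
    return sorted(jobs, key=lambda job: order[job[0]])
```

With three jobs alternating between custom-voice and voice-clone, grouping reduces the number of mode switches from two to one.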

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
