qwenspeak

Text-to-speech generation via Qwen3-TTS over SSH. Preset voices, voice cloning, voice design. Use when the user wants to generate speech audio, clone voices, or work with TTS.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "qwenspeak" with this command: npx skills add psyb0t/qwenspeak

qwenspeak

YAML-driven text-to-speech over SSH using Qwen3-TTS models.

For installation and deployment, see references/setup.md.

SSH Wrapper

Use scripts/qwenspeak.sh for all commands. It handles host, port, and host key acceptance via QWENSPEAK_HOST and QWENSPEAK_PORT env vars.

scripts/qwenspeak.sh <command> [args]
scripts/qwenspeak.sh <command> < input_file
scripts/qwenspeak.sh <command> > output_file

TTS Generation

Submit YAML, get a job UUID back immediately, poll for progress. Jobs run sequentially — one at a time, the rest queue up.

# Get the YAML template
scripts/qwenspeak.sh "tts print-yaml" > job.yaml

# Submit job
scripts/qwenspeak.sh "tts" < job.yaml
# {"id": "550e8400-...", "status": "queued", "total_steps": 3, "total_generations": 7}

# Check progress
scripts/qwenspeak.sh "tts get-job 550e8400"

# Follow job log
scripts/qwenspeak.sh "tts get-job-log 550e8400 -f"

# Download result
scripts/qwenspeak.sh "get hello.wav" > hello.wav

YAML Structure

Global settings + list of steps. Each step loads a model, runs all its generations, then unloads. Settings cascade: global > step > generation.

steps:
  - mode: custom-voice
    model_size: 1.7b
    speaker: Ryan
    language: English
    generate:
      - text: "Hello world"
        output: hello.wav
      - text: "I cannot believe this!"
        speaker: Vivian
        instruct: "Speak angrily"
        output: angry.wav

  - mode: voice-design
    generate:
      - text: "Welcome to our store."
        instruct: "A warm, friendly young female voice with a cheerful tone"
        output: welcome.wav

  - mode: voice-clone
    model_size: 1.7b
    ref_audio: ref.wav
    ref_text: "Transcript of reference"
    generate:
      - text: "First line in cloned voice"
        output: clone1.wav
      - text: "Second line"
        output: clone2.wav

Modes

custom-voice — Pick from 9 preset speakers. 1.7B supports emotion/style via instruct.

voice-design — Describe the voice in natural language via instruct. 1.7B only.

voice-clone — Clone from reference audio. Set ref_audio and ref_text at step level to reuse across generations. x_vector_only: true skips transcript.

Emotion trick for cloned voices

Upload references with different emotions, use separate steps:

scripts/qwenspeak.sh "create-dir refs"
scripts/qwenspeak.sh "put refs/happy.wav" < me_happy.wav
scripts/qwenspeak.sh "put refs/angry.wav" < me_angry.wav
steps:
  - mode: voice-clone
    ref_audio: refs/happy.wav
    ref_text: "transcript of happy ref"
    generate:
      - text: "Great news everyone!"
        output: happy1.wav

  - mode: voice-clone
    ref_audio: refs/angry.wav
    ref_text: "transcript of angry ref"
    generate:
      - text: "This is unacceptable"
        output: angry1.wav

Job Management

scripts/qwenspeak.sh "tts list-jobs"              # list all
scripts/qwenspeak.sh "tts list-jobs --json"        # JSON output
scripts/qwenspeak.sh "tts get-job <id>"            # job details
scripts/qwenspeak.sh "tts get-job-log <id>"        # view log
scripts/qwenspeak.sh "tts get-job-log <id> -f"     # follow log
scripts/qwenspeak.sh "tts cancel-job <id>"         # cancel

Statuses: queuedrunningcompleted | failed | cancelled

Completed jobs auto-cleaned after 1 day, all jobs after 1 week. UUID prefixes work (e.g. first 8 chars).

File Operations

All paths relative to the work directory. Traversal blocked.

CommandDescription
put <path>Upload file from stdin
get <path>Download file to stdout
list-files [--json]List directory
remove-file <path>Delete a file
create-dir <path>Create directory
remove-dir <path>Remove empty directory
move-file <src> <dst>Move or rename
copy-file <src> <dst>Copy a file
file-exists <path>Check if file exists (true/false)
search-files <glob>Glob search (** recursive)

Speakers

SpeakerGenderLanguageDescription
VivianFemaleChineseBright, slightly edgy young voice
SerenaFemaleChineseWarm, gentle young voice
Uncle_FuMaleChineseSeasoned, low mellow timbre
DylanMaleChineseYouthful Beijing dialect, clear natural timbre
EricMaleChineseLively Chengdu/Sichuan dialect, slightly husky
RyanMaleEnglishDynamic with strong rhythmic drive
AidenMaleEnglishSunny American, clear midrange
Ono_AnnaFemaleJapanesePlayful, light nimble timbre
SoheeFemaleKoreanWarm with rich emotion

YAML Options

All settings cascade: global > step > generation.

FieldDefaultDescription
dtypefloat32float32, float16, bfloat16 (float16/bfloat16 GPU only)
flash_attnautoFlashAttention-2: auto-detects, auto-switches float32→bfloat16
temperature0.9Sampling temperature
top_k50Top-k sampling
top_p1.0Top-p / nucleus sampling
repetition_penalty1.05Repetition penalty
max_new_tokens2048Max codec tokens to generate
no_samplefalseGreedy decoding
streamingfalseStreaming mode (lower latency)
moderequiredStep only: custom-voice, voice-design, or voice-clone
model_size1.7bStep only: 1.7b or 0.6b
textrequiredText to synthesize
outputrequiredOutput file path
speakerViviancustom-voice: speaker name
languageAutoLanguage for synthesis
instruct-custom-voice: emotion/style; voice-design: voice description
ref_audio-voice-clone: reference audio file path
ref_text-voice-clone: transcript of reference audio
x_vector_onlyfalsevoice-clone: use speaker embedding only

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

Cclaw

Open-source comedy AI + video editing + poster generation. Create standup/sketch/manzai/scripts, edit videos via FFmpeg, and generate comedy posters via canv...

Registry SourceRecently Updated
General

Bird Recognition Tool | 鸟类识别工具

Identifies bird species in images/videos of target areas. Supports recognition of no less than 500 common bird species, supports customized model training, s...

Registry SourceRecently Updated
General

Image Amazon Product Image Suite

A professional product image generation skill purpose-built for the Amazon e-commerce platform. Outputs comply with Amazon's image guidelines while optimizin...

Registry SourceRecently Updated
General

SearchOnlineAssets

Online asset search tool: queries public stock libraries (Pixabay) for high-quality photos, illustrations, vectors and videos, returning result metadata and...

Registry SourceRecently Updated