zhipu-tts

Text-to-speech conversion using the Zhipu AI (BigModel) GLM-TTS model. Use it when you need to convert text to audio files. Supports Chinese text synthesis with multiple voice personas, speed control, and a choice of output formats.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "zhipu-tts" with this command: npx skills add franklu0819-lang/zhipu-tts

Zhipu AI Text-to-Speech

Convert Chinese text to natural-sounding speech using Zhipu AI's GLM-TTS model.

Setup

1. Get your API key from the Zhipu AI Console.

2. Set it in your environment:

export ZHIPU_API_KEY="your-key-here"
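
Before calling the script, it can help to confirm that the key is actually visible to the shell. The following is a minimal sketch; `require_api_key` is a hypothetical helper name and is not part of the skill's scripts.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the skill): succeeds only when
# ZHIPU_API_KEY is set and non-empty in the current environment.
require_api_key() {
  if [ -z "${ZHIPU_API_KEY:-}" ]; then
    echo "ZHIPU_API_KEY is not set; export it before running the script." >&2
    return 1
  fi
}

# Example: require_api_key && bash scripts/text_to_speech.sh "你好"
```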

Available Voices

System Voices (Pre-built)

  • tongtong (彤彤) - Default voice, balanced tone
  • chuichui (锤锤) - Male voice, deeper tone
  • xiaochen (小陈) - Young professional voice
  • jam - Jam character voice from 动动动物圈
  • kazi - Kazi character voice from 动动动物圈
  • douji - Douji character voice from 动动动物圈
  • luodo - Luodo character voice from 动动动物圈

Usage

Basic Text-to-Speech

Convert text to speech with default settings (tongtong voice, normal speed, WAV format):

bash scripts/text_to_speech.sh "你好,今天天气怎么样"

Advanced Options

Specify voice, speed, format, and output filename:

bash scripts/text_to_speech.sh "欢迎使用智能语音服务" xiaochen 1.2 wav greeting.wav

Parameters:

  • text (required): Chinese text to convert (max 1024 characters)
  • voice (optional): tongtong (default), chuichui, xiaochen, jam, kazi, douji, luodo
  • speed (optional): Speech speed from 0.5 to 2.0 (default: 1.0)
  • output_format (optional): wav (default), pcm
  • output_file (optional): Output filename (default: output.{format})
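
The documented limits can be checked locally before a request is sent. The sketch below mirrors the parameter list above; `validate_tts_args` is a hypothetical function, and whether `scripts/text_to_speech.sh` performs similar checks itself is not known from this page.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check that mirrors the documented parameter limits.
# Arguments: text [voice] [speed] [output_format]
validate_tts_args() {
  local text=$1 voice=${2:-tongtong} speed=${3:-1.0} format=${4:-wav}

  if [ -z "$text" ]; then
    echo "text is required" >&2; return 1
  fi
  # ${#text} counts characters only under a UTF-8 locale; under the C locale
  # it counts bytes, which makes this check stricter for Chinese text.
  if [ "${#text}" -gt 1024 ]; then
    echo "text exceeds 1024 characters" >&2; return 1
  fi
  case "$voice" in
    tongtong|chuichui|xiaochen|jam|kazi|douji|luodo) ;;
    *) echo "unknown voice: $voice" >&2; return 1 ;;
  esac
  # awk does the decimal comparison, which shell integer arithmetic cannot.
  if ! awk -v s="$speed" 'BEGIN { exit !(s ~ /^[0-9]+(\.[0-9]+)?$/ && s >= 0.5 && s <= 2.0) }'; then
    echo "speed must be between 0.5 and 2.0" >&2; return 1
  fi
  case "$format" in
    wav|pcm) ;;
    *) echo "unsupported format: $format" >&2; return 1 ;;
  esac
}
```

A call such as `validate_tts_args "欢迎使用智能语音服务" xiaochen 1.2 wav` returns success, while an unknown voice or an out-of-range speed prints a message to stderr and returns 1.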

Voice Selection Guide

Choose tongtong (default) for:

  • General purpose narration
  • Professional presentations
  • Balanced tone requirements

Choose chuichui for:

  • Male voice needed
  • Deeper, authoritative tone
  • Documentary or formal content

Choose xiaochen for:

  • Young, energetic tone
  • Modern, casual content
  • Friendly assistant vibe

Choose jam/kazi/douji/luodo for:

  • Entertainment content
  • Character voices
  • Creative projects

Speed Control

Recommended speeds:

  • 0.8-1.0: Clear, professional narration
  • 1.0-1.2: Natural conversational pace (default: 1.0)
  • 1.2-1.5: Energetic, upbeat delivery
  • 1.5-2.0: Fast-paced summaries (may reduce clarity)

Output Formats

WAV (recommended):

  • Standard audio format
  • Widely compatible
  • Better quality preservation

PCM:

  • Raw audio format
  • No container header, so the file is only slightly smaller than the equivalent WAV
  • Requires additional processing for playback
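
One form of "additional processing" is to prepend a WAV header so that ordinary players can open the file. The sketch below writes a standard 44-byte PCM WAV header in plain bash. The 24000 Hz default comes from this document; 16-bit samples and a single channel are assumptions that this page does not confirm, so check them against the API output. `le_bytes` and `pcm_to_wav` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Print VALUE as COUNT little-endian bytes.
le_bytes() {
  local value=$1 count=$2 i=0
  while [ "$i" -lt "$count" ]; do
    # Each byte is emitted through an octal escape such as \300.
    printf "\\$(printf '%03o' $(( (value >> (8 * i)) & 255 )))"
    i=$((i + 1))
  done
}

# Wrap raw PCM in a 44-byte WAV header.
# Arguments: input.pcm output.wav [rate] [channels] [bits]
# Defaults of 1 channel and 16 bits are ASSUMPTIONS, not documented values.
pcm_to_wav() {
  local pcm=$1 wav=$2
  local rate=${3:-24000} channels=${4:-1} bits=${5:-16}
  local data_size block_align byte_rate
  data_size=$(wc -c < "$pcm")
  data_size=$((data_size))            # normalises padded output from wc
  block_align=$((channels * bits / 8))
  byte_rate=$((rate * block_align))
  {
    printf 'RIFF'; le_bytes $((36 + data_size)) 4
    printf 'WAVEfmt '; le_bytes 16 4  # size of the fmt chunk
    le_bytes 1 2                      # format tag 1 = uncompressed PCM
    le_bytes "$channels" 2
    le_bytes "$rate" 4
    le_bytes "$byte_rate" 4
    le_bytes "$block_align" 2
    le_bytes "$bits" 2
    printf 'data'; le_bytes "$data_size" 4
    cat "$pcm"
  } > "$wav"
}
```

For example, `pcm_to_wav output.pcm output.wav` produces a file that is exactly 44 bytes larger than the input. If the result plays as noise or at the wrong pitch, the assumed bit depth or channel count is probably wrong.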

Examples

Create a professional greeting:

bash scripts/text_to_speech.sh "您好,感谢致电智能客服,请按1选择中文服务" tongtong 1.0 wav greeting.wav

Generate an energetic announcement:

bash scripts/text_to_speech.sh "热烈欢迎各位嘉宾参加今天的活动!" xiaochen 1.3 wav announcement.wav

Create a calm narration:

bash scripts/text_to_speech.sh "在这个宁静的夜晚,让我们一起欣赏美丽的星空" chuichui 0.9 wav narration.wav

Character Limits

  • Maximum input: 1024 characters per request
  • For longer texts, split into multiple segments
  • Combine audio files post-generation
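
Splitting can be scripted. The following is a minimal sketch that cuts text into fixed-size chunks; `split_text` is a hypothetical helper, and it assumes a UTF-8 locale so that bash counts characters rather than bytes. It cuts at a fixed offset, so a chunk may end mid-sentence; splitting at punctuation would sound more natural but is not shown here.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print TEXT in consecutive chunks of at most MAX
# characters, one chunk per line. Assumes TEXT contains no newlines.
split_text() {
  local text=$1 max=${2:-1024} offset=0
  while [ "$offset" -lt "${#text}" ]; do
    printf '%s\n' "${text:offset:max}"
    offset=$((offset + max))
  done
}

# Example (makes real API calls, so it is left commented out):
# i=0
# split_text "$long_text" | while IFS= read -r chunk; do
#   i=$((i + 1))
#   bash scripts/text_to_speech.sh "$chunk" tongtong 1.0 wav "segment_${i}.wav"
# done
```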

Audio Quality Tips

Best practices:

  • Use punctuation for natural pauses (commas, periods)
  • Break long sentences into shorter segments
  • Use appropriate line breaks for paragraph pauses
  • Test speed settings for your specific content

Sample rate: Generated audio uses a 24000 Hz sampling rate.

Troubleshooting

Text Length Issues:

  • Split texts longer than 1024 characters
  • Process segments separately
  • Combine using audio editing tools
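
If the segments are generated as pcm rather than wav, they can be joined without an audio editor, because raw PCM has no header and can simply be appended. WAV files cannot be joined this way, since each one carries its own header. The sketch below assumes every segment was produced with the same voice pipeline and sample format; `concat_pcm` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: join raw PCM segments into one file, in argument order.
# Arguments: output.pcm segment1.pcm [segment2.pcm ...]
concat_pcm() {
  local out=$1
  shift
  if [ "$#" -eq 0 ]; then
    echo "no segments given" >&2
    return 1
  fi
  cat -- "$@" > "$out"
}

# Example: concat_pcm full.pcm segment_1.pcm segment_2.pcm segment_3.pcm
```

The combined file is still raw PCM, so it needs a WAV header or a player that accepts raw samples before it can be played.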

Audio Quality Issues:

  • Check text encoding (use UTF-8)
  • Verify punctuation placement
  • Adjust speed settings
  • Try different voices

File Playback Issues:

  • Ensure format compatibility with your player
  • WAV format works on most systems
  • PCM may require conversion

API Notes

  • Responses are returned as audio files
  • Watermarking enabled by default (can be disabled in account settings)
  • No strict rate limiting documented
  • Audio generation typically completes in 1-3 seconds
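
Because no rate limit is documented, batch jobs may still want a defensive retry around each call. The following is a generic sketch, not behaviour of the skill itself; `retry` and the `RETRY_BASE_DELAY` variable are hypothetical names, and it assumes the wrapped command exits non-zero on failure.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command up to ATTEMPTS times, doubling the pause
# between tries. RETRY_BASE_DELAY (seconds, default 1) is an invented knob.
# Arguments: attempts command [args ...]
retry() {
  local attempts=$1 delay=${RETRY_BASE_DELAY:-1} try=1
  shift
  while true; do
    "$@" && return 0
    [ "$try" -ge "$attempts" ] && return 1
    sleep "$delay"
    delay=$((delay * 2))
    try=$((try + 1))
  done
}

# Example: retry 3 bash scripts/text_to_speech.sh "你好" tongtong 1.0 wav hello.wav
```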
