Coze Voice Generation
Text-to-Speech (TTS) and Automatic Speech Recognition (ASR) using coze-coding-dev-sdk.
Text-to-Speech (TTS)
Single Audio
npx ts-node {baseDir}/scripts/tts.ts --text "Hello, welcome to our service!"
With Different Voice
npx ts-node {baseDir}/scripts/tts.ts \
--text "This is a male voice" \
--speaker zh_male_m191_uranus_bigtts
Batch Generation
npx ts-node {baseDir}/scripts/tts.ts \
--texts "Chapter 1: Introduction" "Chapter 2: Getting Started" "Chapter 3: Advanced Topics" \
--speaker zh_female_xueayi_saturn_bigtts
With Custom Parameters
npx ts-node {baseDir}/scripts/tts.ts \
--text "Fast and loud announcement!" \
--speech-rate 30 \
--loudness-rate 20 \
--format mp3 \
--sample-rate 48000
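When the script is driven from other TypeScript code, the flags shown in the examples above can be assembled programmatically. The following is a minimal sketch, not part of coze-coding-dev-sdk: the `TtsRequest` interface and `buildTtsArgs` function are hypothetical names introduced for illustration, and only the flag names come from this document.

```typescript
// Hypothetical helper: builds the argument list for scripts/tts.ts.
// Only the flag names (--text, --texts, --speaker, ...) come from this document.
interface TtsRequest {
  texts: string[]; // one entry maps to --text, several map to --texts
  speaker?: string;
  speechRate?: number;
  loudnessRate?: number;
  format?: "mp3" | "pcm" | "ogg_opus";
  sampleRate?: number;
}

function buildTtsArgs(req: TtsRequest): string[] {
  if (req.texts.length === 0) {
    throw new Error("at least one text is required");
  }
  // A single text uses --text; a batch uses --texts followed by every entry.
  const args: string[] =
    req.texts.length === 1 ? ["--text", req.texts[0]] : ["--texts", ...req.texts];

  // Optional flags are appended only when the caller set them,
  // so the script's own defaults apply otherwise.
  if (req.speaker !== undefined) args.push("--speaker", req.speaker);
  if (req.speechRate !== undefined) args.push("--speech-rate", String(req.speechRate));
  if (req.loudnessRate !== undefined) args.push("--loudness-rate", String(req.loudnessRate));
  if (req.format !== undefined) args.push("--format", req.format);
  if (req.sampleRate !== undefined) args.push("--sample-rate", String(req.sampleRate));
  return args;
}

const exampleArgs = buildTtsArgs({
  texts: ["Chapter 1: Introduction", "Chapter 2: Getting Started"],
  speaker: "zh_female_xueayi_saturn_bigtts",
});
console.log(JSON.stringify(exampleArgs));
```

Returning an array rather than a single command string means the result can be handed to a process launcher such as Node's `child_process.spawn` without any shell quoting of the texts.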
TTS Options
| Option | Description |
|---|---|
| --text <text> | Single text to synthesize |
| --texts <texts...> | Multiple texts for batch generation |
| --speaker <id> | Voice ID (default: zh_female_xiaohe_uranus_bigtts) |
| --format <fmt> | Output format: mp3, pcm, or ogg_opus (default: mp3) |
| --sample-rate <hz> | Sample rate in Hz, 8000-48000 (default: 24000) |
| --speech-rate <n> | Speech rate, -50 to 100 (default: 0) |
| --loudness-rate <n> | Loudness, -50 to 100 (default: 0) |
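The ranges in the table can be checked before the script is invoked, so that an out-of-range value fails fast with a readable message. This is a sketch under stated assumptions: `TtsOptions`, `TTS_DEFAULTS`, and `validateTtsOptions` are hypothetical names, the ranges and defaults are copied from the table, and whether the script itself rejects or clamps out-of-range values is not stated in this document.

```typescript
// Hypothetical pre-flight validation; ranges and defaults are copied from the options table.
const TTS_FORMATS = ["mp3", "pcm", "ogg_opus"];

interface TtsOptions {
  format: string;
  sampleRate: number;
  speechRate: number;
  loudnessRate: number;
}

const TTS_DEFAULTS: TtsOptions = {
  format: "mp3",
  sampleRate: 24000,
  speechRate: 0,
  loudnessRate: 0,
};

// Returns a list of human-readable problems; an empty list means the options look valid.
function validateTtsOptions(partial: Partial<TtsOptions>): string[] {
  const opts: TtsOptions = { ...TTS_DEFAULTS, ...partial };
  const problems: string[] = [];

  if (TTS_FORMATS.indexOf(opts.format) === -1) {
    problems.push("format must be one of " + TTS_FORMATS.join(", "));
  }
  if (!(opts.sampleRate >= 8000 && opts.sampleRate <= 48000)) {
    problems.push("sample rate must be between 8000 and 48000 Hz");
  }
  if (!(opts.speechRate >= -50 && opts.speechRate <= 100)) {
    problems.push("speech rate must be between -50 and 100");
  }
  if (!(opts.loudnessRate >= -50 && opts.loudnessRate <= 100)) {
    problems.push("loudness rate must be between -50 and 100");
  }
  return problems;
}

console.log(validateTtsOptions({ speechRate: 30, loudnessRate: 20, sampleRate: 48000 }));
console.log(validateTtsOptions({ speechRate: 150 }));
```

The comparisons are written as `!(a >= lo && a <= hi)` so that `NaN` (for example from a failed number parse) is also reported as a problem.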
TTS Output
The script outputs audio URLs directly to stdout:
[1/1] Hello, welcome to our service!
https://example.com/generated-audio.mp3
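Because the output is plain text, a caller that captures stdout can recover the URLs with a small parser. The sketch below assumes that each result is a `[i/n] text` line followed by a URL line, as in the sample above, and that batch runs repeat this pattern once per text; the document only shows the single-item case. `TtsResult` and `parseTtsOutput` are hypothetical names.

```typescript
// Hypothetical parser for the stdout format shown above.
// Assumption: batch output repeats the "[i/n] text" + URL pair for each item.
interface TtsResult {
  index: number; // position of the item, 1-based
  total: number; // number of items in the run
  text: string;  // the synthesized text echoed by the script
  url: string;   // audio URL printed on the following line
}

function parseTtsOutput(stdout: string): TtsResult[] {
  const results: TtsResult[] = [];
  let pending: { index: number; total: number; text: string } | null = null;

  for (const rawLine of stdout.split(/\r?\n/)) {
    const line = rawLine.trim();
    const header = /^\[(\d+)\/(\d+)\]\s*(.*)$/.exec(line);
    if (header) {
      // Remember the header until its URL line arrives.
      pending = { index: Number(header[1]), total: Number(header[2]), text: header[3] };
      continue;
    }
    if (pending !== null && /^https?:\/\//.test(line)) {
      results.push({ ...pending, url: line });
      pending = null;
    }
  }
  return results;
}

const sampleTtsOutput = [
  "[1/1] Hello, welcome to our service!",
  "https://example.com/generated-audio.mp3",
].join("\n");
console.log(parseTtsOutput(sampleTtsOutput));
```

Lines that match neither pattern are ignored, so incidental log output from the script does not break the parse.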
Available Voices
General Purpose:
- zh_female_xiaohe_uranus_bigtts - Xiaohe (default)
- zh_female_vv_uranus_bigtts - Vivi (Chinese & English)
- zh_male_m191_uranus_bigtts - Yunzhou (male)
- zh_male_taocheng_uranus_bigtts - Xiaotian (male)
Audiobook:
- zh_female_xueayi_saturn_bigtts - Children's audiobook
Video Dubbing:
- zh_male_dayi_saturn_bigtts - Dayi (male)
- zh_female_mizai_saturn_bigtts - Mizai (female)
- zh_female_jitangnv_saturn_bigtts - Motivational female
Role Playing:
- saturn_zh_female_keainvsheng_tob - Cute girl
- saturn_zh_male_shuanglangshaonian_tob - Cheerful boy
Speech-to-Text (ASR)
From URL
npx ts-node {baseDir}/scripts/asr.ts --url "https://example.com/audio.mp3"
From Local File
npx ts-node {baseDir}/scripts/asr.ts --file ./recording.mp3
ASR Options
| Option | Description |
|---|---|
| --url <url> | Audio file URL |
| --file <path> | Local audio file path |
ASR Output
Transcription is printed directly to stdout:
============================================================
TRANSCRIPTION
============================================================
Hello, this is the transcribed text from the audio file...
============================================================
Duration: 1m 30s
Segments: 5
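A caller that captures this output can split it into the transcript and its metadata. The sketch below assumes the layout shown above stays fixed: a banner between two rules of `=` characters, the transcript up to a third rule, then `Duration:` and `Segments:` lines. The `1m 30s` duration style is taken from the sample; support for an hours component such as `1h 2m 3s` is an assumption. `AsrResult`, `parseDurationSeconds`, and `parseAsrOutput` are hypothetical names.

```typescript
// Hypothetical parser for the ASR stdout layout shown above.
interface AsrResult {
  text: string;
  durationSeconds: number | null;
  segments: number | null;
}

// Converts "1m 30s" style strings to seconds. The "h" component is an assumption.
function parseDurationSeconds(value: string): number | null {
  const m = /^(?:(\d+)h)?\s*(?:(\d+)m)?\s*(?:(\d+)s)?$/.exec(value.trim());
  if (!m || (m[1] === undefined && m[2] === undefined && m[3] === undefined)) {
    return null;
  }
  return Number(m[1] || 0) * 3600 + Number(m[2] || 0) * 60 + Number(m[3] || 0);
}

function parseAsrOutput(stdout: string): AsrResult {
  const lines = stdout.split(/\r?\n/).map((l) => l.trim());

  // Indices of the "=====" rule lines: banner open, banner close, transcript end.
  const rules: number[] = [];
  lines.forEach((l, i) => {
    if (/^={3,}$/.test(l)) rules.push(i);
  });

  const hasBody = rules.length >= 3;
  const text = hasBody ? lines.slice(rules[1] + 1, rules[2]).join("\n").trim() : "";
  // Only look for metadata after the transcript, so transcript text cannot be mistaken for it.
  const tail = hasBody ? lines.slice(rules[2] + 1) : lines;

  let durationSeconds: number | null = null;
  let segments: number | null = null;
  for (const l of tail) {
    const d = /^Duration:\s*(.+)$/.exec(l);
    if (d) durationSeconds = parseDurationSeconds(d[1]);
    const s = /^Segments:\s*(\d+)$/.exec(l);
    if (s) segments = Number(s[1]);
  }
  return { text, durationSeconds, segments };
}

const rule = "============================================================";
const sampleAsrOutput = [
  rule,
  "TRANSCRIPTION",
  rule,
  "Hello, this is the transcribed text from the audio file...",
  rule,
  "Duration: 1m 30s",
  "Segments: 5",
].join("\n");
console.log(parseAsrOutput(sampleAsrOutput));
```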
ASR Requirements
- Duration: ≤ 2 hours
- File size: ≤ 100MB
- Formats: WAV, MP3, OGG OPUS, M4A
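These limits can be checked locally before an upload is attempted. The sketch below is illustrative only: `checkAsrInput` is a hypothetical helper, "100MB" is interpreted as 100 × 1024 × 1024 bytes (the document does not say whether it means that or 100,000,000 bytes), and the mapping from the listed formats to file extensions (`.ogg` and `.opus` for OGG OPUS) is an assumption.

```typescript
// Hypothetical pre-flight check against the ASR requirements listed above.
const MAX_DURATION_SECONDS = 2 * 60 * 60;      // 2 hours
const MAX_FILE_BYTES = 100 * 1024 * 1024;      // assumption: 1 MB = 1024 * 1024 bytes
const ALLOWED_EXTENSIONS = ["wav", "mp3", "ogg", "opus", "m4a"]; // assumption: extension mapping

// Returns a list of problems; an empty list means the file passes the documented limits.
// durationSeconds is optional because it may not be known without decoding the audio.
function checkAsrInput(fileName: string, sizeBytes: number, durationSeconds?: number): string[] {
  const problems: string[] = [];

  const dot = fileName.lastIndexOf(".");
  const ext = dot === -1 ? "" : fileName.slice(dot + 1).toLowerCase();
  if (ALLOWED_EXTENSIONS.indexOf(ext) === -1) {
    problems.push("unsupported format: expected WAV, MP3, OGG OPUS, or M4A");
  }
  if (sizeBytes > MAX_FILE_BYTES) {
    problems.push("file is larger than 100MB");
  }
  if (durationSeconds !== undefined && durationSeconds > MAX_DURATION_SECONDS) {
    problems.push("audio is longer than 2 hours");
  }
  return problems;
}

console.log(checkAsrInput("recording.mp3", 5 * 1024 * 1024, 90));
console.log(checkAsrInput("lecture.flac", 200 * 1024 * 1024, 3 * 60 * 60));
```

For a local file, the size argument can come from Node's `fs.statSync(path).size`. An extension check is only a heuristic, since it does not inspect the actual audio encoding.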
Notes
- Generated audio URLs expire after a limited time, so use or download them promptly instead of storing them for later
- Speech rate: negative = slower, positive = faster
- Loudness rate: negative = quieter, positive = louder