Coze Text-to-Speech (TTS)

Convert text to natural-sounding speech using the Coze API.

Setup

1. Get your API key: create one on the Coze platform

2. Set it in your environment:

export COZE_API_KEY="your-key-here"
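Before running the script, it can help to confirm that the variable is actually visible in the current shell. The following is a minimal sketch; `check_api_key` is a hypothetical helper, not part of this skill.

```shell
# Hypothetical helper: succeeds only if COZE_API_KEY is set and non-empty.
# It prints the key length rather than the key itself to avoid leaking it.
check_api_key() {
    if [ -z "${COZE_API_KEY:-}" ]; then
        echo "COZE_API_KEY is not set" >&2
        return 1
    fi
    echo "COZE_API_KEY is set (${#COZE_API_KEY} characters)"
}
```

Call `check_api_key || exit 1` at the top of a wrapper script to stop early when the key is missing.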

Supported Output Formats

MP3 - Default format, widely compatible
OGG_OPUS - Optimized for streaming and messaging
WAV - Uncompressed audio
PCM - Raw audio data
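When choosing an output file name, the format value needs to be mapped to a file extension. A possible mapping is sketched below. `format_extension` is a hypothetical helper, and the `.ogg` extension for `ogg_opus` is an assumption based on the `message.ogg` example in this document.

```shell
# Hypothetical helper: map a -f format value to a file extension.
# Returns non-zero for values outside the four documented formats.
format_extension() {
    case "$1" in
        mp3)      echo "mp3" ;;
        ogg_opus) echo "ogg" ;;
        wav)      echo "wav" ;;
        pcm)      echo "pcm" ;;
        *)        echo "unsupported format: $1" >&2; return 1 ;;
    esac
}
```

For example, `format_extension ogg_opus` prints `ogg`, while an unknown value prints an error to stderr and returns a non-zero status.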

Usage

Basic TTS

Convert text to speech with default settings:

bash scripts/text_to_speech.sh "你好，这是测试语音"

Save to Specific File

bash scripts/text_to_speech.sh "你好世界" -o output.mp3

Use Different Voice

bash scripts/text_to_speech.sh "你好" -v 2

Change Output Format

bash scripts/text_to_speech.sh "你好" -f ogg_opus

Full Options

bash scripts/text_to_speech.sh "要转换的文本" -o output.mp3 -v 1 -f mp3

Parameters:

text (required): Text to convert to speech
-o, --output (optional): Output file path (default: auto-generated)
-v, --voice (optional): Voice ID (default: 1)
-f, --format (optional): Output format - mp3/ogg_opus/wav/pcm (default: mp3)

Output

The script saves the audio file and outputs:

File path
File size
Audio duration (if ffprobe is available)

Example output:

✓ Audio saved: coze_tts_20260324_235030_a1b2c3d4.mp3
  Size: 25.3 KB
  Duration: ~3 seconds
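The size and duration lines can be reproduced for any audio file with standard tools. This is a sketch of one way to do it, not necessarily how the script computes them. `describe_audio` is a hypothetical helper, and the duration line is printed only when `ffprobe` is installed and can read the file.

```shell
# Hypothetical helper: print the size in KB and, if possible, the duration.
describe_audio() {
    file=$1
    # wc -c may pad its output with spaces; arithmetic expansion strips them.
    bytes=$(( $(wc -c < "$file") ))
    awk -v b="$bytes" 'BEGIN { printf "Size: %.1f KB\n", b / 1024 }'

    # ffprobe is optional, so skip the duration when it is missing or fails.
    if command -v ffprobe >/dev/null 2>&1; then
        duration=$(ffprobe -v error -show_entries format=duration \
            -of default=noprint_wrappers=1:nokey=1 "$file" 2>/dev/null) || duration=""
        if [ -n "$duration" ]; then
            echo "Duration: ${duration} seconds"
        fi
    fi
}
```

Usage: `describe_audio output.mp3`.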

Workflow Examples

Generate Notification Audio

bash scripts/text_to_speech.sh "您有一条新消息" -o notification.mp3

Create Voice Greeting

bash scripts/text_to_speech.sh "欢迎使用 Coze 语音服务" -v 2 -o greeting.mp3

Generate OGG for Messaging

bash scripts/text_to_speech.sh "你好" -f ogg_opus -o message.ogg

Batch Generate

for text in "你好" "谢谢" "再见"; do
    bash scripts/text_to_speech.sh "$text" -o "${text}.mp3"
done
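The loop above uses the text itself as the file name, which can fail when a phrase contains a slash or other characters that are awkward in paths. A variant that numbers the files instead might look like this. `batch_tts`, the `TTS_SCRIPT` variable, and the `phrase_NN.mp3` naming are illustrative assumptions.

```shell
# Path to the TTS script; override TTS_SCRIPT to point somewhere else.
TTS_SCRIPT="${TTS_SCRIPT:-scripts/text_to_speech.sh}"

# Hypothetical helper: convert each argument to phrase_01.mp3, phrase_02.mp3, ...
# A failed conversion is reported on stderr and the loop continues.
batch_tts() {
    index=1
    for text in "$@"; do
        outfile=$(printf 'phrase_%02d.mp3' "$index")
        if ! bash "$TTS_SCRIPT" "$text" -o "$outfile"; then
            echo "failed: $text" >&2
        fi
        index=$((index + 1))
    done
}
```

Usage: `batch_tts "你好" "谢谢" "再见"`.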

Integration with Other Skills

Combine with coze-asr for voice conversation:

# 1. User speaks -> ASR converts to text
bash coze-asr/scripts/speech_to_text.sh input.ogg

# 2. Process text with AI...

# 3. AI response -> TTS converts to speech
bash coze-tts/scripts/text_to_speech.sh "AI的回复" -o response.mp3

Troubleshooting

Authentication Error:

Check that COZE_API_KEY is set correctly
Verify that the API key has TTS permissions

Invalid Voice ID:

The voice ID must be a number (int64 format)
If unsure, try the default voice ID, 1 (-v 1)
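A quick local check can catch a malformed voice ID before any request is made. The sketch below only verifies that the value consists of digits. It does not check the int64 range or whether the voice exists. `is_valid_voice_id` is a hypothetical helper.

```shell
# Hypothetical helper: succeed only if the argument is a non-empty digit string.
is_valid_voice_id() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;  # empty, or contains a non-digit character
        *)           return 0 ;;
    esac
}
```

Usage: `is_valid_voice_id "$voice" || echo "voice ID must be numeric" >&2`.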

File Not Created:

Check write permissions in the output directory
Ensure there is sufficient disk space
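Both checks can be run from the shell before retrying. This is a minimal sketch; `check_output_dir` is a hypothetical helper that takes the intended output file path.

```shell
# Hypothetical helper: verify that the directory of the given output path
# exists and is writable, then show the free space on that filesystem.
check_output_dir() {
    dir=$(dirname "$1")
    if [ ! -d "$dir" ]; then
        echo "directory does not exist: $dir" >&2
        return 1
    fi
    if [ ! -w "$dir" ]; then
        echo "directory is not writable: $dir" >&2
        return 1
    fi
    df -h "$dir"
}
```

Usage: `check_output_dir ./audio/output.mp3`.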

Limitations

Text length limits apply (check the Coze documentation)
Rate limits may apply based on your plan
Some voices may not support all output formats

API Reference

Endpoint: POST https://api.coze.cn/v1/audio/speech
Authentication: Bearer token (COZE_API_KEY)
Content-Type: application/json
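For reference, a direct call to the endpoint could look roughly like the sketch below. The endpoint, bearer authentication, and content type come from the lines above. The JSON field names (`input`, `voice_id`, `response_format`), the use of `curl`, and the assumption that the response body is the raw audio are not confirmed by this document, so check them against the Coze API documentation. Both function names are hypothetical.

```shell
# Build the JSON request body with jq so the text is escaped safely.
# Field names are assumptions; verify them against the Coze API docs.
build_tts_payload() {
    jq -n --arg input "$1" --arg voice_id "$2" --arg format "$3" \
        '{input: $input, voice_id: $voice_id, response_format: $format}'
}

# Send the request and write the response body to the given file.
# Arguments: text, voice ID, format, output path.
request_tts() {
    build_tts_payload "$1" "$2" "$3" |
        curl --fail --silent --show-error \
            -X POST "https://api.coze.cn/v1/audio/speech" \
            -H "Authorization: Bearer ${COZE_API_KEY}" \
            -H "Content-Type: application/json" \
            --data-binary @- \
            --output "$4"
}
```

Usage: `request_tts "你好" 1 mp3 output.mp3`. With `--fail`, curl returns a non-zero status on an HTTP error, though the output file may still be created and left empty, so check the exit status rather than relying on the file's absence.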

Required Environment Variables

| Variable | Description | Required |
|---|---|---|
| `COZE_API_KEY` | Coze API authentication key | Yes |

Required Tools

| Tool | Purpose | Required |
|---|---|---|
| `jq` | JSON processing | Yes |
| `ffprobe` | Audio duration detection | Optional |
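Whether these tools are installed can be checked with `command -v`. The sketch below uses a hypothetical helper, `check_tools`, that reports every missing tool and returns a non-zero status if any are absent.

```shell
# Hypothetical helper: report each tool that is not on PATH.
# Returns 0 when all tools are found and 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Usage: `check_tools jq || exit 1` for the required tool, and `check_tools ffprobe || echo "duration will not be shown"` for the optional one.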

License

MIT
