Category: provider
Model Studio Qwen TTS
Validation
mkdir -p output/alicloud-ai-audio-tts
python -m py_compile skills/ai/audio/alicloud-ai-audio-tts/scripts/generate_tts.py && echo "py_compile_ok" > output/alicloud-ai-audio-tts/validate.txt
Pass criteria: command exits 0 and output/alicloud-ai-audio-tts/validate.txt is generated.
Output And Evidence
- Save generated audio links, sample audio files, and request payloads to output/alicloud-ai-audio-tts/.
- Keep one validation log per execution.
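The evidence rule above could be sketched as a small helper that writes one JSON record per execution. The function name `save_evidence`, the `run_id` parameter, and the `run-<id>.json` file naming are hypothetical choices for illustration and are not part of the skill's scripts; the payload passed in should not contain the API key.

```python
import json
import time
from pathlib import Path


def save_evidence(base_dir, payload, audio_url, run_id=None):
    """Write one JSON evidence file for this execution and return its path."""
    # run_id is a hypothetical per-execution identifier; defaults to a timestamp.
    run_id = run_id or time.strftime("%Y%m%dT%H%M%S")
    out_dir = Path(base_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    record = {"run_id": run_id, "request": payload, "audio_url": audio_url}
    path = out_dir / f"run-{run_id}.json"
    path.write_text(json.dumps(record, ensure_ascii=False, indent=2), encoding="utf-8")
    return path
```

Calling `save_evidence("output/alicloud-ai-audio-tts", {"text": "...", "voice": "Cherry"}, audio_url)` after each request would leave one record per run in the evidence directory.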
Critical model names
Use one of the recommended models:
- qwen3-tts-flash
- qwen3-tts-instruct-flash
- qwen3-tts-instruct-flash-2026-01-26
Prerequisites
- Install SDK (recommended in a venv to avoid PEP 668 limits):
python3 -m venv .venv
. .venv/bin/activate
python -m pip install dashscope
- Set DASHSCOPE_API_KEY in your environment, or add dashscope_api_key to ~/.alibabacloud/credentials (env takes precedence).
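The precedence rule (environment first, credentials file second) might look like the following minimal sketch. It assumes the credentials file is INI-formatted with the key stored under a `[default]` section; `resolve_api_key` is a hypothetical helper name, not a DashScope SDK function.

```python
import configparser
import os
from pathlib import Path


def resolve_api_key(environ=None, credentials_path=None):
    """Return the API key, preferring DASHSCOPE_API_KEY over the credentials file."""
    environ = os.environ if environ is None else environ
    key = environ.get("DASHSCOPE_API_KEY")
    if key:
        return key
    path = Path(credentials_path or Path.home() / ".alibabacloud" / "credentials")
    parser = configparser.ConfigParser()
    parser.read(path, encoding="utf-8")  # a missing file is silently skipped
    # Assumption: INI layout with dashscope_api_key under [default].
    return parser.get("default", "dashscope_api_key", fallback=None)
```

The function returns `None` when neither source provides a key, so the caller can fail early with a clear message instead of sending an unauthenticated request.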
Normalized interface (tts.generate)
Request
- text (string, required)
- voice (string, required)
- language_type (string, optional; default Auto)
- instruction (string, optional; recommended for instruct models)
- stream (bool, optional; default false)
Response
- audio_url (string, when stream=false)
- audio_base64_pcm (string, when stream=true)
- sample_rate (int, 24000)
- format (string, wav or pcm depending on mode)
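As a rough illustration, the normalized request and response could be modeled with dataclasses. The class names `TtsRequest` and `TtsResponse` and the helper `build_response` are hypothetical; the mapping of `wav` to non-streaming and `pcm` to streaming is an assumption inferred from the field descriptions above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TtsRequest:
    text: str
    voice: str
    language_type: str = "Auto"
    instruction: Optional[str] = None
    stream: bool = False

    def __post_init__(self):
        # text and voice are the only required fields.
        if not self.text or not self.voice:
            raise ValueError("text and voice are required")


@dataclass
class TtsResponse:
    audio_url: Optional[str] = None
    audio_base64_pcm: Optional[str] = None
    sample_rate: int = 24000
    format: str = "wav"


def build_response(request, audio_url=None, audio_base64_pcm=None):
    """Populate the response fields that match the request mode."""
    if request.stream:
        # Assumption: streaming mode yields Base64 PCM and no URL.
        return TtsResponse(audio_base64_pcm=audio_base64_pcm, format="pcm")
    return TtsResponse(audio_url=audio_url, format="wav")
```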
Quick start (Python + DashScope SDK)
import os
import dashscope

# Prefer env var for auth: export DASHSCOPE_API_KEY=...
# Or use ~/.alibabacloud/credentials with dashscope_api_key under [default].
# Beijing region; for Singapore use: https://dashscope-intl.aliyuncs.com/api/v1
dashscope.base_http_api_url = "https://dashscope.aliyuncs.com/api/v1"

text = "Hello, this is a short voice line."
response = dashscope.MultiModalConversation.call(
    model="qwen3-tts-instruct-flash",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    text=text,
    voice="Cherry",
    language_type="English",
    instruction="Warm and calm tone, slightly slower pace.",
    stream=False,
)

audio_url = response.output.audio.url
print(audio_url)
Streaming notes
- stream=True returns Base64-encoded PCM chunks at 24 kHz.
- Decode chunks and play or concatenate to a pcm buffer.
- The response contains finish_reason == "stop" when the stream ends.
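The decode-and-concatenate step could be sketched as follows, using only the standard library. The input is assumed to be a list of Base64 strings already collected from the stream (how they are extracted from each SDK event is covered in references/api_reference.md). The 16-bit mono sample layout is an assumption not stated in this document; only the 24 kHz rate comes from the notes above. `pcm_chunks_to_wav` is a hypothetical helper name.

```python
import base64
import io
import wave


def pcm_chunks_to_wav(chunks_b64, sample_rate=24000, channels=1, sample_width=2):
    """Decode Base64 PCM chunks, concatenate them, and wrap the result in a WAV container."""
    # Empty chunks (for example, a final event with no audio) are skipped.
    pcm = b"".join(base64.b64decode(chunk) for chunk in chunks_b64 if chunk)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)      # assumption: mono
        wav.setsampwidth(sample_width)  # assumption: 16-bit samples (2 bytes)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()
```

Writing the returned bytes to a `.wav` file makes the streamed audio playable in ordinary players, which raw PCM is not.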
Operational guidance
- Keep requests concise; split long text into multiple calls if you hit size or timeout errors.
- Use language_type consistent with the text to improve pronunciation.
- Use instruction only when you need explicit style/tone control.
- Cache by (text, voice, language_type) to avoid repeat costs.
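The splitting and caching advice above might be implemented along these lines. The `max_chars=300` default is a hypothetical limit, not a documented API constraint, and sentence detection only handles Western punctuation followed by whitespace. The cache key also accepts `model` and `instruction`, which is an addition beyond the (text, voice, language_type) tuple, on the assumption that both can change the generated audio.

```python
import hashlib
import json
import re


def split_text(text, max_chars=300):
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # Hard-split a single sentence that exceeds the limit on its own.
        while len(sentence) > max_chars:
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        current = sentence
    if current:
        chunks.append(current)
    return chunks


def cache_key(text, voice, language_type="Auto", model="", instruction=""):
    """Return a stable SHA-256 key for a synthesis request."""
    payload = json.dumps([model, text, voice, language_type, instruction], ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Each chunk returned by `split_text` can be sent as its own request, and `cache_key` can name the stored audio file so that a repeated request is served from disk.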
Output location
- Default output: output/alicloud-ai-audio-tts/audio/
- Override base dir with OUTPUT_DIR.
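One possible reading of the override rule is sketched below. It assumes that OUTPUT_DIR is an environment variable and that it replaces the whole `output/alicloud-ai-audio-tts` base, with `audio/` still appended; the actual script may interpret the variable differently. `resolve_audio_dir` is a hypothetical helper name.

```python
import os
from pathlib import Path

# Default base directory taken from the documented default output path.
DEFAULT_BASE_DIR = "output/alicloud-ai-audio-tts"


def resolve_audio_dir(environ=None):
    """Return the audio output directory, honoring an OUTPUT_DIR override."""
    environ = os.environ if environ is None else environ
    # Assumption: OUTPUT_DIR replaces the base directory, not the audio/ subfolder.
    base = environ.get("OUTPUT_DIR") or DEFAULT_BASE_DIR
    return Path(base) / "audio"
```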
Workflow
- Confirm user intent, region, identifiers, and whether the operation is read-only or mutating.
- Run one minimal read-only query first to verify connectivity and permissions.
- Execute the target operation with explicit parameters and bounded scope.
- Verify results and save output/evidence files.
References
See references/api_reference.md for parameter mapping and a streaming example.
Realtime mode is provided by skills/ai/audio/alicloud-ai-audio-tts-realtime/.
Voice cloning and voice design are provided by skills/ai/audio/alicloud-ai-audio-tts-voice-clone/ and skills/ai/audio/alicloud-ai-audio-tts-voice-design/.
Source list: references/sources.md