Voice Clone + TTS
Scope
This skill is narrowly scoped to: (1) uploading clone audio to MiniMax, (2) creating a cloned voice, (3) TTS with cloned or existing voices, and (4) updating the cloned-voice mapping block in this SKILL.md. The script only reads/writes this skill’s SKILL.md; it does not read unrelated system files, and the only environment variables it reads are the MiniMax API key(s) listed under “Required environment variables”.
When to Use
- Use this skill when you need to clone user-provided audio into a reusable voice.
- Use this skill when you need text-to-speech (TTS) with an already cloned voice.
- For long-term maintenance, cloned results are written back to this file to reduce repeated setup.
API Reference (MiniMax)
- Upload clone audio: `POST /v1/files/upload`, `purpose=voice_clone`, `multipart/form-data`.
- Create cloned voice: `POST /v1/voice_clone`.
- Speech synthesis: `POST /v1/t2a_v2`.
- Audio requirements: mp3/m4a/wav, duration 10 seconds to 5 minutes, file size <= 20 MB.
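The format and size limits above can be checked locally before uploading. The following is a minimal sketch, not the script's actual code: the function name `check_clone_audio` is hypothetical, "20 MB" is assumed to mean 20 MiB, and the duration limit is not checked because that would require decoding the audio.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".mp3", ".m4a", ".wav"}
MAX_SIZE_BYTES = 20 * 1024 * 1024  # assumption: "20 MB" is read as 20 MiB


def check_clone_audio(path):
    """Return a list of problems found; an empty list means the basic checks passed."""
    audio = Path(path)
    if not audio.is_file():
        return [f"file not found: {audio}"]
    problems = []
    # The extension is only a hint about the real container format.
    if audio.suffix.lower() not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported extension: {audio.suffix or '(none)'}")
    if audio.stat().st_size > MAX_SIZE_BYTES:
        problems.append("file is larger than 20 MB")
    return problems
```

An empty result only means the file passed these two checks; the API may still reject audio that is too short or too long.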
Install
- Runtime: Python 3.7+.
- Dependencies: The script uses the `requests` library. Install with `pip install -r requirements.txt` or `pip install requests`.
- Network: The script calls MiniMax APIs over HTTPS; it does not read unrelated system files. It only reads/writes this skill’s `SKILL.md` to update the cloned-voice mapping block.
Required environment variables (credentials)
At least one of the following must be set for MiniMax API authentication (see frontmatter requiredEnv / optionalEnv):
| Variable | Required | Notes |
|---|---|---|
| `MINIMAX_API_KEY` | preferred | Primary API key |
| `MINIMAX_KEY` | alternative | Accepted if set |
| `MINIMAX_GROUP_API_KEY` | alternative | Accepted if set |
The script will fail with a clear error if none are set.
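One way such a lookup could work is sketched below. The helper name `resolve_api_key` and the exact error text are assumptions, and the lookup order (preferred variable first, then the alternatives in table order) is inferred from the table rather than confirmed against the script.

```python
import os

# Assumed lookup order: the preferred variable first, then the alternatives.
API_KEY_VARS = ("MINIMAX_API_KEY", "MINIMAX_KEY", "MINIMAX_GROUP_API_KEY")


def resolve_api_key(environ=None):
    """Return the first non-empty MiniMax key found, or raise a descriptive error."""
    env = os.environ if environ is None else environ
    for name in API_KEY_VARS:
        value = env.get(name, "").strip()
        if value:
            return value
    raise RuntimeError(
        "No MiniMax API key found. Set one of: " + ", ".join(API_KEY_VARS)
    )
```

Passing a plain dict as `environ` makes the lookup easy to exercise without touching the real environment.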
Prerequisites
- Credentials: Set one of the env vars above (see “Required environment variables”).
- Prepare clone audio (format/duration/size limits above).
- Before cloning, confirm the voice name (`voice_name`) with the user, e.g. `liuyang_narration_v1`.
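A proposed voice name can be checked before it is sent to the API. This sketch uses only the rule stated under Common Options (letters, digits, underscores); the helper name `is_valid_voice_name` is hypothetical, and the API may enforce further constraints, for example on length, that are not checked here.

```python
import re

# Letters, digits, and underscores only, per the rule given under Common Options.
VOICE_NAME_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def is_valid_voice_name(name):
    """True if the whole name consists of allowed characters and is non-empty."""
    return bool(VOICE_NAME_PATTERN.fullmatch(name))
```

For example, `liuyang_narration_v1` passes, while a display name such as `Alpaca Demo` does not because it contains a space.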
Usage
- Go to the skill directory:
cd workspace/skills/voice-clone-tts
- Run the script (clone + synthesize):
python scripts/minimax_voice_clone_tts.py \
--audio "/absolute/path/to/voice.wav" \
--voice-name "yangtuo_demo_v1" \
--display-name "Alpaca Demo" \
--text "Hello, this is a cloned voice test." \
--output "./output/voice_test.mp3"
- To clone only (no synthesis), omit `--text`.
- To synthesize only (by display name or voice_id):
# Resolve by display name
python scripts/minimax_voice_clone_tts.py \
--voice "voice_v2" \
--text "This is TTS using an existing cloned voice." \
--output "./output/reuse_voice.mp3"
# Or specify voice_id directly
python scripts/minimax_voice_clone_tts.py \
--voice-id "yangtuo_demo_v1" \
--text "This is TTS using an existing cloned voice." \
--output "./output/reuse_voice.mp3"
Common Options
- `--audio`: Path to clone audio (required for cloning).
- `--voice-name`: Required when cloning; API voice ID (letters, digits, underscores, e.g. `yangtuo_demo_v1`).
- `--display-name`: Optional when cloning; display name written to SKILL.md (e.g. `Alpaca Demo`). Defaults to `--voice-name` if omitted.
- `--voice-id`: For synthesis, specify the API voice_id directly (skips the mapping table).
- `--voice`: For synthesis, specify a display name or voice_id; resolved from the mapping table below (e.g. `voice_v2` or `yangtuo_demo_v1`).
- `--text`: Text to synthesize (omit for clone-only).
- `--output`: Output audio path (default `./output/minimax_tts.mp3`).
- `--model`: Speech model (default `speech-2.8-turbo`).
- `--format`: Output format (mp3/pcm/flac/wav).
- `--speed`, `--vol`, `--pitch`, `--emotion`: Speech expression parameters.
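The way these options combine can be illustrated with a small `argparse` sketch. It is not the script's actual parser: `build_parser` and `plan` are hypothetical names, the expression parameters are left out for brevity, and the rule that `--audio` triggers cloning while `--text` triggers synthesis is inferred from the usage examples.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Clone a voice and/or synthesize speech")
    parser.add_argument("--audio", help="path to clone audio; presence triggers cloning")
    parser.add_argument("--voice-name", help="API voice ID to create when cloning")
    parser.add_argument("--display-name", help="display name for the mapping table")
    parser.add_argument("--voice-id", help="API voice_id to use directly")
    parser.add_argument("--voice", help="display name or voice_id to look up")
    parser.add_argument("--text", help="text to synthesize; omit for clone-only")
    parser.add_argument("--output", default="./output/minimax_tts.mp3")
    parser.add_argument("--model", default="speech-2.8-turbo")
    parser.add_argument("--format", choices=["mp3", "pcm", "flac", "wav"])
    return parser


def plan(args):
    """Return (do_clone, do_tts) after checking that the option combination makes sense."""
    do_clone = args.audio is not None
    do_tts = args.text is not None
    if not (do_clone or do_tts):
        raise SystemExit("nothing to do: pass --audio, --text, or both")
    if do_clone and not args.voice_name:
        raise SystemExit("--voice-name is required when cloning")
    if do_tts and not (do_clone or args.voice_id or args.voice):
        raise SystemExit("--text needs a voice: pass --voice, --voice-id, or clone with --audio")
    # Mirror the documented default: the display name falls back to the voice name.
    if args.display_name is None:
        args.display_name = args.voice_name
    return do_clone, do_tts
```

Under these assumptions, `--audio` plus `--voice-name` without `--text` yields a clone-only run, and `--voice` plus `--text` yields a synthesis-only run.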
Write-Back (Important)
- After a successful clone, the script writes display name → voice_id to the “Cloned Voice Mapping” section below.
- Use a display name with `--voice "display name"` so you don't need to remember the voice_id.
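The write-back step can be pictured as replacing or appending one entry inside a marker-delimited block. The sketch below is illustrative only: the marker strings, the entry format, and the function name `write_back` are all assumptions, and the real script may structure the block differently.

```python
from datetime import datetime

# Hypothetical marker strings; the real markers in SKILL.md may differ.
START = "<!-- CLONED_VOICES_START -->"
END = "<!-- CLONED_VOICES_END -->"


def write_back(skill_text, display_name, voice_id, now=None):
    """Return skill_text with display_name mapped to voice_id inside the marker block."""
    start = skill_text.find(START)
    end = skill_text.find(END, start + len(START)) if start != -1 else -1
    if start == -1 or end == -1:
        raise ValueError("write-back marker block not found")
    stamp = (now or datetime.now()).strftime("%Y-%m-%d %H:%M:%S")
    block = skill_text[start + len(START):end]
    # Keep every existing entry except one with the same display name.
    kept = [
        line.strip()
        for line in block.splitlines()
        if line.strip() and not line.strip().startswith(f"- {display_name}:")
    ]
    kept.append(f"- {display_name}: {voice_id} (updated: {stamp})")
    return skill_text[:start] + START + "\n" + "\n".join(kept) + "\n" + skill_text[end:]
```

Raising when the markers are missing matches the troubleshooting advice that the file must contain the write-back marker block.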
Cloned Voice Mapping
- Left: display name; right: API voice_id. For TTS use `--voice "display name"` or `--voice-id voice_id`.
- test_voice_1772187110: test_voice_1772187110 (updated: 2026-02-27 18:12:00)
- voice_v1: shuangyue_test (updated: 2026-02-28 16:47:01)
- voice_v2: yangtuo_demo_v1 (updated: 2026-02-27 18:19:39)
- voice_v3: dong_yuhui_voice_v1 (updated: 2026-03-02 19:51:44)
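Resolving a `--voice` value against entries like these could look roughly as follows. This is a sketch under assumptions: each entry is taken to have the form `display_name: voice_id (updated: ...)`, and the names `parse_mapping` and `resolve_voice` are hypothetical rather than taken from the script.

```python
import re

# Assumed entry format: "display_name: voice_id (updated: YYYY-MM-DD HH:MM:SS)",
# optionally preceded by a list dash.
ENTRY = re.compile(r"^\s*(?:-\s*)?(?P<name>[^:]+?)\s*:\s*(?P<voice_id>\S+?)\s*\(updated:")


def parse_mapping(block):
    """Build a {display_name: voice_id} dict from the mapping block text."""
    mapping = {}
    for line in block.splitlines():
        match = ENTRY.match(line)
        if match:
            mapping[match.group("name")] = match.group("voice_id")
    return mapping


def resolve_voice(value, mapping):
    """Map a display name to its voice_id; pass a known voice_id through unchanged."""
    if value in mapping:
        return mapping[value]
    if value in mapping.values():
        return value
    raise KeyError(f"unknown voice: {value}")
```

With the entries listed above, `voice_v2` and `yangtuo_demo_v1` would both resolve to `yangtuo_demo_v1`.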
Troubleshooting
- 401 / auth failure: Check that `MINIMAX_API_KEY` is correct.
- Parameter errors: Check `voice_name` rules, audio format/size, and text length.
- Clone succeeded but not written back: Ensure `SKILL.md` exists and contains the write-back marker block.