voice-clone-tts

Voice cloning and TTS using the MiniMax API. The user must provide a voice name when cloning; after a successful clone, the display name → voice_id mapping is written back to this skill doc for reuse.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "voice-clone-tts" with this command: npx skills add SocketNet/voiceclone

Voice Clone + TTS

Scope

This skill is narrowly scoped to: (1) uploading clone audio to MiniMax, (2) creating a cloned voice, (3) TTS with cloned or existing voices, and (4) updating the cloned-voice mapping block in this SKILL.md. The script only reads/writes this skill’s SKILL.md; it does not read unrelated system files, and the only environment variables it reads are the MiniMax API key variables listed under “Required environment variables”.

When to Use

  • Use this skill when you need to clone user-provided audio into a reusable voice.
  • Use this skill when you need text-to-speech (TTS) with an already cloned voice.
  • For long-term maintenance, cloned results are written back to this file to reduce repeated setup.

API Reference (MiniMax)

  • Upload clone audio: POST /v1/files/upload, purpose=voice_clone, multipart/form-data.
  • Create cloned voice: POST /v1/voice_clone.
  • Speech synthesis: POST /v1/t2a_v2.
  • Audio requirements: mp3/m4a/wav, duration 10 seconds–5 minutes, file size <=20MB.
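The audio requirements above can be checked locally before uploading. The following is a minimal sketch, not the skill's own code: the function name check_clone_audio is hypothetical, and duration is only checked for WAV files because the Python standard library cannot read mp3/m4a durations.

```python
import os
import wave

ALLOWED_EXTENSIONS = {".mp3", ".m4a", ".wav"}
MAX_BYTES = 20 * 1024 * 1024          # file size <= 20MB
MIN_SECONDS, MAX_SECONDS = 10, 300    # 10 seconds to 5 minutes


def check_clone_audio(path):
    """Return a list of problems; an empty list means these checks passed."""
    problems = []
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if os.path.getsize(path) > MAX_BYTES:
        problems.append("file is larger than 20MB")
    # Duration check: WAV only (assumption: other formats are left to the API).
    if ext == ".wav":
        try:
            with wave.open(path, "rb") as wav:
                rate = wav.getframerate()
                seconds = wav.getnframes() / rate if rate else 0.0
        except (wave.Error, EOFError) as exc:
            problems.append(f"unreadable WAV file: {exc}")
        else:
            if not MIN_SECONDS <= seconds <= MAX_SECONDS:
                problems.append(
                    f"duration {seconds:.1f}s is outside 10 seconds to 5 minutes"
                )
    return problems
```

Calling check_clone_audio("/absolute/path/to/voice.wav") before the upload step gives an early, local error instead of a rejected API request.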

Install

  • Runtime: Python 3.7+.
  • Dependencies: The script uses the requests library. Install with pip install -r requirements.txt, or with pip install requests.
  • Network: The script calls MiniMax APIs over HTTPS; it does not read unrelated system files. It only reads/writes this skill’s SKILL.md to update the cloned-voice mapping block.

Required environment variables (credentials)

At least one of the following must be set for MiniMax API authentication (see frontmatter requiredEnv / optionalEnv):

Variable               Required     Notes
MINIMAX_API_KEY        preferred    Primary API key
MINIMAX_KEY            alternative  Accepted if set
MINIMAX_GROUP_API_KEY  alternative  Accepted if set

The script will fail with a clear error if none are set.
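A minimal sketch of how such a lookup might work, assuming the variables are tried in the order listed above (the order between the two alternatives is an assumption, as is the function name resolve_api_key):

```python
import os

# Preferred variable first; the order of the two alternatives is assumed.
KEY_VARS = ("MINIMAX_API_KEY", "MINIMAX_KEY", "MINIMAX_GROUP_API_KEY")


def resolve_api_key(env=None):
    """Return the first non-empty key, or raise with a clear message."""
    env = os.environ if env is None else env
    for name in KEY_VARS:
        value = (env.get(name) or "").strip()
        if value:
            return value
    raise RuntimeError(
        "No MiniMax API key found; set one of: " + ", ".join(KEY_VARS)
    )
```

Passing a plain dict as env makes the lookup easy to exercise without touching the real environment.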

Prerequisites

  1. Credentials: Set one of the env vars above (see “Required environment variables”).
  2. Prepare clone audio (format/duration/size limits above).
  3. Before cloning, confirm the voice name (voice_name) with the user, e.g. liuyang_narration_v1.

Usage

  1. Go to the skill directory: cd workspace/skills/voice-clone-tts
  2. Run the script (clone + synthesize):
python scripts/minimax_voice_clone_tts.py \
  --audio "/absolute/path/to/voice.wav" \
  --voice-name "yangtuo_demo_v1" \
  --display-name "Alpaca Demo" \
  --text "Hello, this is a cloned voice test." \
  --output "./output/voice_test.mp3"
  3. To clone only (no synthesis), omit --text.
  4. To synthesize only (by display name or voice_id):
# Resolve by display name
python scripts/minimax_voice_clone_tts.py \
  --voice "voice_v2" \
  --text "This is TTS using an existing cloned voice." \
  --output "./output/reuse_voice.mp3"

# Or specify voice_id directly
python scripts/minimax_voice_clone_tts.py \
  --voice-id "yangtuo_demo_v1" \
  --text "This is TTS using an existing cloned voice." \
  --output "./output/reuse_voice.mp3"

Common Options

  • --audio: Path to clone audio (required for cloning).
  • --voice-name: Required when cloning; API voice ID (letters, digits, underscores, e.g. yangtuo_demo_v1).
  • --display-name: Optional when cloning; display name written to SKILL (e.g. Alpaca Demo). Defaults to --voice-name if omitted.
  • --voice-id: For synthesis, specify API voice_id directly (skips mapping table).
  • --voice: For synthesis, specify display name or voice_id; resolved from the mapping table below (e.g. voice_v2 or yangtuo_demo_v1).
  • --text: Text to synthesize (omit for clone-only).
  • --output: Output audio path (default ./output/minimax_tts.mp3).
  • --model: Speech model (default speech-2.8-turbo).
  • --format: Output format (mp3/pcm/flac/wav).
  • --speed --vol --pitch --emotion: Speech expression parameters.
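The options above could be declared with argparse roughly as follows. This is a sketch under stated assumptions, not the actual script: the numeric types for --speed, --vol, and --pitch, the helper names, and the mode names returned by pick_mode are all hypothetical. The voice-name check encodes only the rule stated above (letters, digits, underscores).

```python
import argparse
import re


def voice_name(value):
    """Accept only letters, digits, and underscores, as the option list states."""
    if not re.fullmatch(r"[A-Za-z0-9_]+", value):
        raise argparse.ArgumentTypeError(
            "voice name may contain only letters, digits, and underscores"
        )
    return value


def build_parser():
    p = argparse.ArgumentParser(description="Clone a voice and/or synthesize speech")
    p.add_argument("--audio")
    p.add_argument("--voice-name", type=voice_name)
    p.add_argument("--display-name")
    p.add_argument("--voice-id")
    p.add_argument("--voice")
    p.add_argument("--text")
    p.add_argument("--output", default="./output/minimax_tts.mp3")
    p.add_argument("--model", default="speech-2.8-turbo")
    p.add_argument("--format", choices=["mp3", "pcm", "flac", "wav"])
    p.add_argument("--speed", type=float)   # type is an assumption
    p.add_argument("--vol", type=float)     # type is an assumption
    p.add_argument("--pitch", type=int)     # type is an assumption
    p.add_argument("--emotion")
    return p


def pick_mode(args):
    """Map parsed options to one of the three usage patterns."""
    if args.audio:
        if not args.voice_name:
            raise ValueError("--voice-name is required when cloning")
        return "clone+tts" if args.text else "clone"
    if args.text and (args.voice or args.voice_id):
        return "tts"
    raise ValueError("nothing to do: pass --audio, or --text with --voice/--voice-id")
```

When cloning, the display name would then be taken as args.display_name or args.voice_name, matching the default described for --display-name.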

Write-Back (Important)

  • After a successful clone, the script writes display name → voice_id to the “Cloned Voice Mapping” section below.
  • Use a display name with --voice "display name" so you don't need to remember voice_id.
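The write-back step can be pictured as an insert-or-update on the lines between the two marker comments. The sketch below is illustrative only: the function name upsert_mapping is hypothetical, and the entry format simply mirrors the entries as they appear in the mapping section of this page. The marker strings are assembled from pieces so that this example does not itself contain a literal marker line that a write-back pass over SKILL.md could match.

```python
from datetime import datetime

# Built from pieces on purpose; see the note above the code block.
START = "<!-- CLONED_VOICES" + ":START -->"
END = "<!-- CLONED_VOICES" + ":END -->"


def _find(lines, marker, begin=0):
    """Index of the first line equal to marker (ignoring surrounding spaces)."""
    for i in range(begin, len(lines)):
        if lines[i].strip() == marker:
            return i
    raise ValueError("write-back marker block not found")


def upsert_mapping(doc, display_name, voice_id, now=None):
    """Return doc with the display_name entry added or refreshed in the block."""
    lines = doc.split("\n")
    start = _find(lines, START)
    end = _find(lines, END, start + 1)
    stamp = (now or datetime.now()).strftime("%Y-%m-%d %H:%M:%S")
    prefix = f"  • {display_name}: "
    # Drop any older entry for the same display name, then append the new one.
    block = [ln for ln in lines[start + 1:end] if not ln.startswith(prefix)]
    block.append(f"{prefix}{voice_id} (updated: {stamp})")
    return "\n".join(lines[:start + 1] + block + lines[end:])
```

A missing marker block raises ValueError here, which corresponds to the "clone succeeded but not written back" case in Troubleshooting.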

Cloned Voice Mapping

  • Left: display name; right: API voice_id. For TTS use --voice "display name" or --voice-id voice_id.
<!-- CLONED_VOICES:START -->
  • test_voice_1772187110: test_voice_1772187110 (updated: 2026-02-27 18:12:00)
  • voice_v1: shuangyue_test (updated: 2026-02-28 16:47:01)
  • voice_v2: yangtuo_demo_v1 (updated: 2026-02-27 18:19:39)
  • voice_v3: dong_yuhui_voice_v1 (updated: 2026-03-02 19:51:44)
<!-- CLONED_VOICES:END -->
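Resolving --voice against the mapping block can be sketched as follows. The names load_mapping and resolve_voice are hypothetical, the entry pattern is inferred from the entries shown above, and the marker strings are again assembled from pieces so the example does not add a second literal marker line to this file.

```python
import re

START = "<!-- CLONED_VOICES" + ":START -->"
END = "<!-- CLONED_VOICES" + ":END -->"

# "<bullet> display name: voice_id (updated: ...)"; the bullet style is assumed.
ENTRY = re.compile(r"^\s*[•\-*]\s*(?P<name>.+?):\s*(?P<voice_id>\S+)\s*\(updated:")


def load_mapping(doc):
    """Return {display name: voice_id} for entries inside the marker block."""
    mapping = {}
    inside = False
    for line in doc.splitlines():
        stripped = line.strip()
        if stripped == START:
            inside = True
        elif stripped == END:
            break
        elif inside:
            match = ENTRY.match(line)
            if match:
                mapping[match["name"].strip()] = match["voice_id"]
    return mapping


def resolve_voice(value, mapping):
    """Accept either a display name or a voice_id present in the mapping."""
    if value in mapping:
        return mapping[value]
    if value in mapping.values():
        return value
    raise KeyError(f"voice not found in mapping: {value}")
```

With the entries listed above, resolve_voice("voice_v2", mapping) and resolve_voice("yangtuo_demo_v1", mapping) would both yield yangtuo_demo_v1, which is why either form works with --voice.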

Troubleshooting

  • 401 / auth failure: Check that MINIMAX_API_KEY (or whichever alternative key variable you set) holds a valid key.
  • Parameter errors: Check voice_name rules, audio format/size, and text length.
  • Clone succeeded but not written back: Ensure SKILL.md exists and contains the write-back marker block.
