mimo-tts

Text-to-speech using Xiaomi MiMo TTS API. Generates WAV audio files. Triggers when user says "send voice message", "voice reply", "read to me", "use clip voice", or any TTS-related request. Supports style control and fine-grained audio tags.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "mimo-tts" with this command: npx skills add whmmy/mimo-tts

Xiaomi MiMo TTS

Quick Usage

Just say "send voice" + what you want me to say, or describe the voice style you want.

Default Config

  • Default Voice: default_zh (Chinese female)
  • Default Style: <style>夹子音</style> (cute/clip voice, used when no style specified)

Available Voices

Voice Namevoice parameter
MiMo-Defaultmimo_default
MiMo-Chinese-Femaledefault_zh
MiMo-English-Femaledefault_eh

Style Control

Overall Style (at the beginning of text)

Style TypeExamples
Speed Control变快 (faster) / 变慢 (slower)
Emotion开心 (happy) / 悲伤 (sad) / 生气 (angry)
Character孙悟空 (Wukong) / 林黛玉 (Lin Daiyu)
Style Variations悄悄话 (whisper) / 夹子音 (clip voice) / 台湾腔 (Taiwanese accent)
Dialect东北话 (Northeast) / 四川话 (Sichuan) / 河南话 (Henan) / 粤语 (Cantonese)

Format: <style>style1 style2</style>text to synthesize

Audio Tags (Fine-grained Control)

Use () to annotate emotion, speed, pauses, breathing, etc:

TagDescriptionExample
(紧张,深呼吸)Multi-emotion combo(紧张,深呼吸)呼……冷静,冷静
(语速加快)Speed change(语速加快,碎碎念)
(小声)Volume control(小声)哎呀,领带歪没歪?
(长叹一口气)Sigh(长叹一口气)
(咳嗽)Cough(咳嗽)简直能把人骨头冻透了
(沉默片刻)Pause(沉默片刻)
(苦笑)Bitter smile(苦笑)呵,没如果了
(提高音量喊话)Loud shout(提高音量喊话)大姐!这鱼新鲜着呢!
(极其疲惫,有气无力)Exhausted师傅……到地方了叫我一声……
(寒冷导致的急促呼吸)Environmental呼——呼——这、这大兴安岭的雪……

Synthesis Example:

import os
import base64
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("MIMO_API_KEY"),
    base_url="https://api.xiaomimimo.com/v1"
)

# Clip voice style
text = "<style>夹子音</style>主人~我来啦!今天有什么需要帮忙的吗~"

completion = client.chat.completions.create(
    model="mimo-v2-tts",
    messages=[
        {"role": "user", "content": "你好"},
        {"role": "assistant", "content": text}
    ],
    audio={"format": "wav", "voice": "default_zh"}
)

audio_bytes = base64.b64decode(completion.choices[0].message.audio.data)
with open("output.wav", "wb") as f:
    f.write(audio_bytes)

Notes

  • Target text must be in the assistant role message, not in user
  • <style> tag must be at the beginning of target text
  • For singing: <style>唱歌</style>target text
  • Returns base64-encoded WAV audio

Script Usage

Use scripts/mimo_tts.py for speech synthesis:

MIMO_API_KEY=your_api_key python3 scripts/mimo_tts.py "text to synthesize" --voice default_zh --style "夹子音" --output output.wav

Note: Set MIMO_API_KEY environment variable or configure in OpenClaw settings.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Coding

Phone agent

Use AI AutoGLM Phone Agent for automated mobile device control. Suitable for tasks requiring mobile phone automation, such as APP automated testing, data col...

Registry SourceRecently Updated
Coding

Hk LeETF

Analyzes Hong Kong leveraged ETFs by evaluating holdings, price deviation, liquidity risks, and provides rebalancing timing, arbitrage opportunities, and ris...

Registry SourceRecently Updated
Coding

client-onboard

Generate a complete client onboarding package from a project description. Creates project brief, tech stack, milestones, folder structure, CLAUDE.md, and a r...

Registry SourceRecently Updated
Coding

workflow-migrate

Migrate N8N/Zapier/Make workflows to production-grade Python or Node.js scripts. Given a workflow description or paste, rewrites automation logic with retry,...

Registry SourceRecently Updated