smart-speak

Multilingual Text-to-Speech (TTS) with intelligent Pinyin-to-Hanzi conversion. Use when the user asks to generate audio for text that contains a mix of Vietnamese, Chinese (Pinyin), or English. This skill ensures correct pronunciation by converting Pinyin to Hanzi and using native-quality voices for each segment.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "smart-speak" with this command: npx skills add Jaskies/smart-speak-vutran

Smart-Speak: Multilingual TTS

This skill provides a high-quality, multilingual text-to-speech workflow that handles Vietnamese, Chinese (including Pinyin), and English seamlessly.

Core Features

  1. Intelligent Pinyin Conversion: Automatically converts Pinyin to Chinese characters (Hanzi) for more natural and accurate pronunciation by the Chinese TTS engine.
  2. Language Segmentation: Splits text into language-specific blocks to use specialized voices.
  3. Native-Quality Voices:
    • 🇻🇳 Vietnamese: vi-VN-HoaiMyNeural (Hoài Mỹ)
    • 🇨🇳 Chinese/Pinyin: zh-CN-XiaoxiaoNeural (Xiaoxiao)
    • 🇺🇸 English: en-US-AvaNeural (Ava)
  4. Audio Merging: Combines all generated segments into a single, high-quality MP3 file using ffmpeg.

Workflow

1. Analyze & Pre-process

Before generating audio, the agent must:

  • Detect Pinyin: Identify Pinyin within Vietnamese text. A word is likely Pinyin if it doesn't make sense in the Vietnamese context (e.g., Nǐ hǎo, shēn tǐ).
  • Convert to Hanzi: Replace all detected Pinyin with the equivalent Chinese characters (e.g., Nǐ hǎo ma? -> 你好吗?). This lets the zh-CN-XiaoxiaoNeural voice read it with the correct tones.
  • Remove Emojis: Strip out all emojis from the text to prevent the TTS engine from reading them as descriptions.
  • Handle Punctuation: Ensure each segment ends with appropriate punctuation (commas, periods) to maintain natural pauses.
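
The pre-processing steps above could be sketched in Python roughly as follows. This is a minimal sketch, not the bundled script's code: the function names, the emoji ranges, and the tone-mark heuristic are all assumptions made for illustration. The heuristic relies on macron and caron vowels (ā, ǎ) appearing in Pinyin but not in Vietnamese orthography; it misses unmarked syllables such as ma, and it does not perform the Pinyin-to-Hanzi conversion, which stays with the agent.

```python
import re
import unicodedata

# Assumed emoji ranges for illustration; not an exhaustive list.
_EMOJI_RE = re.compile(
    "["
    "\U0001F1E6-\U0001F1FF"  # regional indicators (flag emojis)
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, extended symbols
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "\uFE0F\u200D"           # variation selector-16 and zero-width joiner
    "]+"
)

# Tone-marked vowels that occur in Pinyin but not in Vietnamese spelling.
_PINYIN_TONE_RE = re.compile("[āēīōūǖǎěǐǒǔǚǘǜ]")

# Characters treated as an acceptable segment ending (ASCII and full-width).
_TERMINAL = ".,!?;:。,!?;:…"


def strip_emojis(text):
    """Remove emojis, then collapse the whitespace they leave behind."""
    without = _EMOJI_RE.sub("", text)
    return re.sub(r"\s{2,}", " ", without).strip()


def looks_like_pinyin(word):
    """Heuristic hint only: True if the word carries a Pinyin-only tone mark."""
    normalized = unicodedata.normalize("NFC", word).lower()
    return bool(_PINYIN_TONE_RE.search(normalized))


def ensure_terminal_punctuation(segment, default="."):
    """Append `default` if the segment does not already end in punctuation."""
    stripped = segment.rstrip()
    if not stripped or stripped[-1] in _TERMINAL:
        return stripped
    return stripped + default
```

A word flagged by looks_like_pinyin is only a candidate; the surrounding syllables (for example ma in Nǐ hǎo ma?) still need to be judged from context before conversion.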

2. Segment the Text

Divide the processed text into blocks and assign the appropriate voice.

Example Input: "Chào anh Vũ, 你好吗? (Nǐ hǎo ma?) là câu chào."

Example Segments:

  1. {"text": "Chào anh Vũ, ", "voice": "vi-VN-HoaiMyNeural"}
  2. {"text": "你好吗?", "voice": "zh-CN-XiaoxiaoNeural"}
  3. {"text": " ( ", "voice": "vi-VN-HoaiMyNeural"}
  4. {"text": "你好吗?", "voice": "zh-CN-XiaoxiaoNeural"}
  5. {"text": " ) là câu chào.", "voice": "vi-VN-HoaiMyNeural"}
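
One way to produce segments like these, once Pinyin has already been replaced with Hanzi, is to split on runs of Han characters. The sketch below is an assumption about how this could be done, not the skill's actual logic: segment_by_script and the regular expression are hypothetical, everything outside a Han run is given the Vietnamese voice, and routing English text to en-US-AvaNeural is left out.

```python
import re

VI_VOICE = "vi-VN-HoaiMyNeural"
ZH_VOICE = "zh-CN-XiaoxiaoNeural"

# A run of Han characters, optionally followed by sentence punctuation
# (full-width or ASCII) so that "你好吗?" stays in one Chinese segment.
_HAN_RUN = re.compile(r"[\u4e00-\u9fff]+[\u4e00-\u9fff,。!?、,.!?]*")


def segment_by_script(text, default_voice=VI_VOICE):
    """Split text into {"text", "voice"} dicts, one per script run."""
    segments = []
    pos = 0
    for match in _HAN_RUN.finditer(text):
        if match.start() > pos:
            # Text between Han runs goes to the default (Vietnamese) voice.
            segments.append({"text": text[pos:match.start()], "voice": default_voice})
        segments.append({"text": match.group(), "voice": ZH_VOICE})
        pos = match.end()
    if pos < len(text):
        segments.append({"text": text[pos:], "voice": default_voice})
    return segments


if __name__ == "__main__":
    for seg in segment_by_script("Chào anh Vũ, 你好吗? (你好吗?) là câu chào."):
        print(seg)
```

For the example input this yields five segments alternating between the Vietnamese and Chinese voices, although whitespace around the parentheses falls slightly differently than in the hand-written list above.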

3. Execute the Synthesis

Use the bundled Python script to generate and merge the audio.

python3 skills/public/smart-speak/scripts/smart_speak.py \
  --segments-json '[{"text": "Chào anh Vũ, ", "voice": "vi-VN-HoaiMyNeural"}, ...]' \
  --output /home/jackie_chen_phong/.openclaw/workspace/output_name.mp3

4. Deliver the Audio

Send the resulting MP3 file to the user using the message tool (action=send, filePath).

Constraints

  • Absolute Paths: Always use the absolute path for the output file within the workspace: /home/jackie_chen_phong/.openclaw/workspace/.
  • JSON Encoding: Ensure the --segments-json string is properly escaped when passed to the shell.
  • TTS Location: The script assumes edge-tts is located at /home/jackie_chen_phong/.local/bin/edge-tts.
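
The JSON-encoding and absolute-path constraints can be handled by building the argument list in Python instead of hand-escaping a shell string. The following is a minimal sketch under that assumption: build_command and shell_string are hypothetical helpers, while the script path and the --segments-json and --output flags are the ones shown in the workflow above.

```python
import json
import os
import shlex
import subprocess

SCRIPT = "skills/public/smart-speak/scripts/smart_speak.py"


def build_command(segments, output_path):
    """Return an argv list for smart_speak.py; raises on a relative output path."""
    if not os.path.isabs(output_path):
        raise ValueError(f"output path must be absolute: {output_path!r}")
    # ensure_ascii=False keeps Vietnamese and Chinese text readable in the payload.
    payload = json.dumps(segments, ensure_ascii=False)
    return ["python3", SCRIPT, "--segments-json", payload, "--output", output_path]


def shell_string(argv):
    """Quote an argv list for cases where a single shell command line is required."""
    return shlex.join(argv)


def run(segments, output_path):
    """Invoke the script without a shell, so no manual escaping is needed."""
    subprocess.run(build_command(segments, output_path), check=True)
```

Passing the list directly to subprocess.run sidesteps shell quoting entirely; shell_string is only needed when the command must be handed over as one string.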

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
