smart-speak

Multilingual Text-to-Speech (TTS) with intelligent Pinyin-to-Hanzi conversion. Use when the user asks to generate audio for text that contains a mix of Vietnamese, Chinese (Pinyin), or English. This skill ensures correct pronunciation by converting Pinyin to Hanzi and using native-quality voices for each segment.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "smart-speak" with this command: npx skills add jaskies/smart-speak-vutran

Smart-Speak: Multilingual TTS

This skill provides a multilingual text-to-speech workflow that handles Vietnamese, Chinese (including Pinyin), and English within a single piece of text.

Core Features

  1. Intelligent Pinyin Conversion: Automatically converts Pinyin to Chinese characters (Hanzi) for more natural and accurate pronunciation by the Chinese TTS engine.
  2. Language Segmentation: Splits text into language-specific blocks to use specialized voices.
  3. Native-Quality Voices:
    • 🇻🇳 Vietnamese: vi-VN-HoaiMyNeural (Hoài Mỹ)
    • 🇨🇳 Chinese/Pinyin: zh-CN-XiaoxiaoNeural (Xiaoxiao)
    • 🇺🇸 English: en-US-AvaNeural (Ava)
  4. Audio Merging: Combines all generated segments into a single, high-quality MP3 file using ffmpeg.
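
As an illustration of the merging step, here is a minimal sketch that assumes the per-segment MP3 files already exist and are joined with ffmpeg's concat demuxer. The helper names (concat_list, merge_command) are hypothetical, and the bundled script may merge its segments differently.

```python
def concat_list(paths):
    """Build the text of an ffmpeg concat list file, one `file '...'` line per path."""
    lines = []
    for p in paths:
        # Inside single quotes, a literal quote is written as '\''
        escaped = p.replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def merge_command(list_file, output_path):
    """Argument list for ffmpeg that concatenates the listed files without re-encoding."""
    return [
        "ffmpeg", "-y",
        "-f", "concat", "-safe", "0",
        "-i", list_file,
        "-c", "copy",
        output_path,
    ]
```

Stream copy (-c copy) only works cleanly when every segment shares the same codec parameters. If the segments differ, re-encoding would be the safer choice.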

Workflow

1. Analyze & Pre-process

Before generating audio, the agent must:

  • Detect Pinyin: Identify Pinyin within Vietnamese text. A word is likely Pinyin if it doesn't make sense in the Vietnamese context (e.g., Nǐ hǎo, shēn tǐ).
  • Convert to Hanzi: Replace all detected Pinyin with the equivalent Chinese characters (e.g., Nǐ hǎo ma? -> 你好吗?). This lets the zh-CN-XiaoxiaoNeural voice read it with the correct tones.
  • Remove Emojis: Strip out all emojis from the text to prevent the TTS engine from reading them as descriptions.
  • Handle Punctuation: Ensure each segment ends with appropriate punctuation (commas, periods) to maintain natural pauses.
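
The two mechanical steps above (emoji removal and punctuation handling) could be sketched as follows. This is a rough illustration, not the skill's own code: the function names are hypothetical and the emoji ranges are an approximation, not a complete Unicode emoji definition. Pinyin detection and Hanzi conversion need language judgment from the agent and are not covered here.

```python
import re

# Approximate emoji coverage (an assumption of this sketch, not exhaustive).
EMOJI_RE = re.compile(
    "["
    "\U0001F1E6-\U0001F1FF"  # regional indicator symbols (flags)
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, extended symbols
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "\uFE0F\u200D"           # variation selector-16 and zero-width joiner
    "]+"
)

# ASCII and full-width marks that already give the TTS engine a pause.
TERMINAL_PUNCT = ".,!?;:。，！？；：…"


def strip_emojis(text: str) -> str:
    """Remove emoji characters so the TTS engine does not read them aloud."""
    return EMOJI_RE.sub("", text)


def ensure_terminal_punctuation(text: str, default: str = ".") -> str:
    """Append `default` when the text does not already end with punctuation."""
    stripped = text.rstrip()
    if not stripped or stripped[-1] in TERMINAL_PUNCT:
        return text
    return stripped + default
```

For example, strip_emojis("Xin chào 👋!") leaves "Xin chào !", and ensure_terminal_punctuation("Chào anh Vũ") returns "Chào anh Vũ.".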

2. Segment the Text

Divide the processed text into blocks and assign the appropriate voice.

Example Input: "Chào anh Vũ, 你好吗? (Nǐ hǎo ma?) là câu chào."

Example Segments:

  1. {"text": "Chào anh Vũ, ", "voice": "vi-VN-HoaiMyNeural"}
  2. {"text": "你好吗?", "voice": "zh-CN-XiaoxiaoNeural"}
  3. {"text": " ( ", "voice": "vi-VN-HoaiMyNeural"}
  4. {"text": "你好吗?", "voice": "zh-CN-XiaoxiaoNeural"}
  5. {"text": " ) là câu chào.", "voice": "vi-VN-HoaiMyNeural"}
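
A segmentation like the one above can be approximated with a simple script-based split. The sketch below assumes the Pinyin has already been converted to Hanzi, sends every run of Chinese characters (plus trailing punctuation) to the Chinese voice, and sends everything else to the Vietnamese voice. The segment function is hypothetical, and English detection is left out for brevity.

```python
import re

VOICE_VI = "vi-VN-HoaiMyNeural"
VOICE_ZH = "zh-CN-XiaoxiaoNeural"

# A run of CJK Unified Ideographs, optionally followed by sentence punctuation.
HANZI_RUN = re.compile(r"[\u4e00-\u9fff]+[?!.,？！。，]*")


def segment(text: str, default_voice: str = VOICE_VI) -> list[dict]:
    """Split text into {"text", "voice"} blocks, alternating on Hanzi runs."""
    segments = []
    pos = 0
    for match in HANZI_RUN.finditer(text):
        if match.start() > pos:
            # Text between Hanzi runs goes to the default (Vietnamese) voice.
            segments.append({"text": text[pos:match.start()], "voice": default_voice})
        segments.append({"text": match.group(), "voice": VOICE_ZH})
        pos = match.end()
    if pos < len(text):
        segments.append({"text": text[pos:], "voice": default_voice})
    return segments
```

Applied to "Chào anh Vũ, 你好吗? (你好吗?) là câu chào.", this yields five segments with the same voice assignments as the example, differing only in the whitespace around the parentheses.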

3. Execute the Synthesis

Use the bundled Python script to generate and merge the audio.

python3 skills/public/smart-speak/scripts/smart_speak.py \
  --segments-json '[{"text": "Chào anh Vũ, ", "voice": "vi-VN-HoaiMyNeural"}, ...]' \
  --output /home/jackie_chen_phong/.openclaw/workspace/output_name.mp3
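
When the call is made from Python instead of a shell, one way to avoid quoting problems is to serialize the segments with json.dumps and pass the arguments as a list. The sketch below uses the script path and flags from the command above. The helper names are hypothetical.

```python
import json
import shlex
import subprocess

SCRIPT = "skills/public/smart-speak/scripts/smart_speak.py"


def build_command(segments, output_path):
    """Return the argument list for the bundled script."""
    return [
        "python3", SCRIPT,
        # ensure_ascii=False keeps Vietnamese and Chinese text readable in logs.
        "--segments-json", json.dumps(segments, ensure_ascii=False),
        "--output", output_path,
    ]


def as_shell_string(command):
    """Render the argument list as a safely quoted, copy-pasteable shell line."""
    return shlex.join(command)


def run_smart_speak(segments, output_path):
    """Run the script without a shell, so no manual escaping is needed."""
    subprocess.run(build_command(segments, output_path), check=True)
```

Because subprocess.run receives a list, the JSON string reaches the script as a single argument regardless of the quotes or spaces it contains.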

4. Deliver the Audio

Send the resulting MP3 file to the user using the message tool (action=send, filePath).

Constraints

  • Absolute Paths: Always use the absolute path for the output file within the workspace: /home/jackie_chen_phong/.openclaw/workspace/.
  • JSON Encoding: Ensure the --segments-json string is properly escaped when passed to the shell.
  • TTS Location: The script assumes edge-tts is located at /home/jackie_chen_phong/.local/bin/edge-tts.
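
Since the script depends on a fixed edge-tts location, a pre-flight check can catch a missing binary before synthesis starts. The sketch below is an assumption-laden illustration: resolve_edge_tts and edge_tts_command are hypothetical helpers, and the bundled script may not honor a binary found elsewhere on PATH. The --voice, --text, and --write-media flags are those of the edge-tts command-line tool.

```python
import os
import shutil

DEFAULT_EDGE_TTS = "/home/jackie_chen_phong/.local/bin/edge-tts"


def resolve_edge_tts(default=DEFAULT_EDGE_TTS):
    """Return the expected binary if it is executable, else search PATH (may be None)."""
    if os.path.isfile(default) and os.access(default, os.X_OK):
        return default
    return shutil.which("edge-tts")


def edge_tts_command(binary, text, voice, out_path):
    """Argument list that synthesizes one segment to an MP3 file."""
    return [binary, "--voice", voice, "--text", text, "--write-media", out_path]
```

If resolve_edge_tts() returns None, edge-tts is not installed where the script expects it and is not on PATH either.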

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
