senseaudio-pronunciation-coach

Foreign language pronunciation coach: listen to a reference TTS pronunciation, record yourself, get word-by-word feedback on what went wrong, then practice targeted drills. Use when users want to improve pronunciation, practice speaking a foreign language, or ask for "发音练习" (pronunciation practice), "跟读" (repeat-after-me practice), "纠音" (pronunciation correction), "外语口语练习" (foreign-language speaking practice), "pronunciation practice", "how to pronounce", or make any other request to check or improve spoken-language accuracy.

Install skill "senseaudio-pronunciation-coach" with this command: npx skills add scikkk/pronunciation

SenseAudio Pronunciation Coach

Listen → Record → Compare → Drill. The loop that actually improves pronunciation.

Step 1: Choose Practice Material

Three input modes:

A. Direct input: the user pastes a word, phrase, or sentence.

B. Scene presets: offer these if the user isn't sure what to practice:

| Scene | Sample phrase |
| --- | --- |
| Airport check-in | "I'd like a window seat, please." |
| Ordering at a restaurant | "Could I have the menu, please?" |
| Business meeting | "Let me walk you through the agenda." |
| Hotel check-in | "I have a reservation under my name." |
| Shopping | "Do you have this in a different size?" |
| Asking for directions | "Excuse me, how do I get to the station?" |

C. Topic-based: the user says "练习 th 发音" ("practice the th sound") or "练习 r 和 l 的区别" ("practice the difference between r and l"); generate 5 sentences targeting that phoneme.

Also ask: "Target language?" (default: English)

Step 2: Generate Standard Pronunciation

Produce two versions — slow for learning, normal for natural rhythm:

# Slow version (speed 0.75)
curl -s -X POST https://api.senseaudio.cn/v1/t2a_v2 \
  -H "Authorization: Bearer $SENSEAUDIO_API_KEY" \
  -H "Content-Type: application/json" \
  -d "{
    \"model\": \"SenseAudio-TTS-1.0\",
    \"text\": \"<TEXT>\",
    \"stream\": false,
    \"voice_setting\": { \"voice_id\": \"<VOICE_ID>\", \"speed\": 0.75 },
    \"audio_setting\": { \"format\": \"mp3\" }
  }" -o slow.json
jq -r '.data.audio' slow.json | xxd -r -p > standard_slow.mp3

# Normal version (speed 1.0)
curl -s -X POST https://api.senseaudio.cn/v1/t2a_v2 \
  -H "Authorization: Bearer $SENSEAUDIO_API_KEY" \
  -H "Content-Type: application/json" \
  -d "{
    \"model\": \"SenseAudio-TTS-1.0\",
    \"text\": \"<TEXT>\",
    \"stream\": false,
    \"voice_setting\": { \"voice_id\": \"<VOICE_ID>\", \"speed\": 1.0 },
    \"audio_setting\": { \"format\": \"mp3\" }
  }" -o normal.json
jq -r '.data.audio' normal.json | xxd -r -p > standard_normal.mp3
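
For readers scripting this step outside the shell, the sketch below mirrors the request body and the `jq ... | xxd -r -p` decoding in Python. It assumes, as the curl commands imply, that `.data.audio` holds the audio as a hex string; the function names `build_tts_payload` and `decode_audio` are illustrative and not part of any SenseAudio SDK.

```python
import json


def build_tts_payload(text: str, voice_id: str, speed: float) -> dict:
    """Build the same JSON body that the curl commands send."""
    return {
        "model": "SenseAudio-TTS-1.0",
        "text": text,
        "stream": False,
        "voice_setting": {"voice_id": voice_id, "speed": speed},
        "audio_setting": {"format": "mp3"},
    }


def decode_audio(response_json: str) -> bytes:
    """Equivalent of `jq -r '.data.audio' | xxd -r -p`: hex string to raw bytes.

    ASSUMPTION: the response carries hex-encoded audio under data.audio,
    as the shell pipeline above implies.
    """
    hex_audio = json.loads(response_json)["data"]["audio"]
    return bytes.fromhex(hex_audio)


if __name__ == "__main__":
    body = json.dumps(build_tts_payload('She said "hi"', "female_0006_a", 0.75))
    print(body)  # json.dumps escapes the inner quotes for us
    print(decode_audio('{"data": {"audio": "494433"}}'))
```

Serializing with `json.dumps` keeps the request valid when the practice text itself contains double quotes, which the hand-escaped `-d` string would not.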

Voice selection by language:

  • English: female_0006_a (clear, neutral accent)
  • Chinese: female_0008_c (standard Mandarin)
  • Default: female_0006_a
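
The voice rule above amounts to a lookup with a fallback. A minimal sketch, assuming the language is passed as one of the codes used in Step 3 (`en`, `zh`, ...); `pick_voice` is a hypothetical helper name:

```python
# Voice IDs are the ones listed above; the keys are the Step 3 language codes.
VOICE_BY_LANGUAGE = {
    "en": "female_0006_a",  # English: clear, neutral accent
    "zh": "female_0008_c",  # Chinese: standard Mandarin
}
DEFAULT_VOICE = "female_0006_a"


def pick_voice(language_code: str) -> str:
    """Return the voice for a language code, falling back to the default."""
    return VOICE_BY_LANGUAGE.get(language_code.strip().lower(), DEFAULT_VOICE)


if __name__ == "__main__":
    print(pick_voice("zh"))
    print(pick_voice("ja"))  # no dedicated voice listed, so the default is used
```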

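Tell the user: "The slow and normal-speed versions are ready. Listen to the slow version first to hear how each sound is formed, then to the normal version to get the natural rhythm. When you're ready, record yourself repeating the text and send it to me."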

Step 3: Transcribe User Recording

When the user uploads their recording:

curl -s -X POST https://api.senseaudio.cn/v1/audio/transcriptions \
  -H "Authorization: Bearer $SENSEAUDIO_API_KEY" \
  -F "file=@<USER_RECORDING>" \
  -F "model=sense-asr-pro" \
  -F "response_format=verbose_json" \
  -F "language=<LANGUAGE_CODE>" \
  -F "timestamp_granularities[]=word" \
  > asr_result.json

Language codes: English → en, Chinese → zh, Japanese → ja, French → fr, Spanish → es

Extract the transcript: jq -r '.text' asr_result.json
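
The same extraction can be done in Python when the result feeds later steps. This is a sketch only: the `text` field comes from the `jq` command above, while the top-level `words` list is an assumption about how word timestamps are returned and should be checked against a real response.

```python
import json


def load_transcript(raw_json: str) -> tuple[str, list[dict]]:
    """Return (transcript, word_entries) from the contents of asr_result.json."""
    result = json.loads(raw_json)
    text = result.get("text", "").strip()
    # ASSUMPTION: word-level timestamps, if present, sit under a "words" key.
    words = result.get("words") or []
    return text, words


if __name__ == "__main__":
    sample = '{"text": " I like a winder seat "}'
    print(load_transcript(sample))
```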

Step 4: Word-by-Word Comparison (LLM task)

Compare the ASR transcript against the original text yourself. Align words and identify mismatches:

Comparison approach:

  1. Tokenize both original and ASR output into words
  2. Use sequence alignment (like diff) to match them
  3. Flag words where ASR output differs from original
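
The three steps above can be sketched with Python's standard `difflib`. Lowercasing and stripping punctuation before comparing is an added assumption, so that formatting differences in the ASR output are not flagged as pronunciation errors; `tokenize`, `align_words`, and `accuracy` are illustrative names.

```python
import difflib
import re
from typing import Optional


def tokenize(text: str) -> list[str]:
    """Lowercase and keep letters, digits, and apostrophes ("I'd" stays one token)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def align_words(original: str, transcript: str) -> list[tuple[str, Optional[str]]]:
    """Pair each expected word with what was heard (None if it was dropped)."""
    expected, heard = tokenize(original), tokenize(transcript)
    matcher = difflib.SequenceMatcher(a=expected, b=heard, autojunk=False)
    pairs: list[tuple[str, Optional[str]]] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            pairs.extend((w, w) for w in expected[i1:i2])
        elif tag == "replace":
            heard_span = heard[j1:j2]
            # Pair positionally; expected words without a partner count as dropped.
            for k, w in enumerate(expected[i1:i2]):
                pairs.append((w, heard_span[k] if k < len(heard_span) else None))
        elif tag == "delete":
            pairs.extend((w, None) for w in expected[i1:i2])
        # "insert" opcodes are extra words from the speaker; this sketch ignores them.
    return pairs


def accuracy(pairs: list[tuple[str, Optional[str]]]) -> float:
    """Fraction of expected words that were recognized exactly."""
    return sum(e == h for e, h in pairs) / len(pairs) if pairs else 0.0


if __name__ == "__main__":
    pairs = align_words("I'd like a window seat, please.", "I'd like a winder seat pleas")
    for expected_word, heard_word in pairs:
        print(expected_word, "->", heard_word)
    print(accuracy(pairs))
```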

Diagnosis output format:

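Read-along analysis:

✓ "I'd like a"  - correct
✗ "window"      - recognized as "winder" (possible problem with the final -ow sound)
✓ "seat"        - correct
✗ "please"      - recognized as "pleas" (the final /z/ sound may not be clear enough)

Accuracy: 4/6 words (67%)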
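
A report in this shape can be produced from aligned word pairs. The sketch below is an English rendering under assumptions: it takes `(expected, heard)` tuples with `None` for a dropped word, groups consecutive correct words into one phrase, and counts accuracy per word. `format_report` is a hypothetical helper.

```python
from typing import Optional


def format_report(pairs: list[tuple[str, Optional[str]]]) -> str:
    """Render aligned (expected, heard) pairs as a checklist plus an accuracy line."""
    lines: list[str] = []
    run: list[str] = []  # consecutive correct words, shown as one phrase

    def flush_run() -> None:
        if run:
            phrase = " ".join(run)
            lines.append(f'✓ "{phrase}" - correct')
            run.clear()

    for expected, heard in pairs:
        if expected == heard:
            run.append(expected)
            continue
        flush_run()
        detail = "not detected" if heard is None else f'recognized as "{heard}"'
        lines.append(f'✗ "{expected}" - {detail}')
    flush_run()

    correct = sum(e == h for e, h in pairs)
    total = len(pairs)
    percent = round(100 * correct / total) if total else 0
    lines.append(f"Accuracy: {correct}/{total} words ({percent}%)")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [("i'd", "i'd"), ("like", "like"), ("a", "a"),
              ("window", "winder"), ("seat", "seat"), ("please", "pleas")]
    print(format_report(sample))
```

The phoneme-level explanation in parentheses is left to the model, since it depends on the inference described next.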

Common phoneme issues for Chinese speakers (English):

| Misrecognized as | Likely problem | Phoneme |
| --- | --- | --- |
| "free" for "three" | th → f | /θ/ |
| "light" for "right" | r → l confusion | /r/ |
| "wery" for "very" | v → w | /v/ |
| "sit" for "seat" | short vs long vowel | /ɪ/ vs /iː/ |
| "fink" for "think" | th → f | /θ/ |
| dropped final consonant | final stop deletion | /t/, /d/, /k/ |

When a word is misrecognized, infer the likely phoneme issue and name it specifically.
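
As a rough illustration of that inference, the table's patterns can be approximated with spelling-based rules. This is a crude heuristic under stated assumptions (it looks at letters, not sounds, and the vowel check only knows a few minimal pairs); `guess_issue` and `LONG_SHORT_PAIRS` are made-up names, and anything unmatched still needs the model's judgment.

```python
# Minimal pairs for the long/short vowel contrast (expected, heard).
LONG_SHORT_PAIRS = {("seat", "sit"), ("beat", "bit"), ("sheep", "ship"), ("feel", "fill")}


def guess_issue(expected: str, heard: str) -> str:
    """Map an (expected, heard) mismatch to a likely phoneme problem."""
    e, h = expected.lower(), heard.lower()
    if e.startswith("th") and h.startswith("f"):
        return "/θ/: th → f"
    if e.startswith("r") and h.startswith("l") and e[1:] == h[1:]:
        return "/r/: r → l confusion"
    if e.startswith("v") and h.startswith("w"):
        return "/v/: v → w"
    if (e, h) in LONG_SHORT_PAIRS:
        return "/iː/ vs /ɪ/: short vs long vowel"
    if len(e) > 1 and e[:-1] == h and e[-1] in "tdk":
        return f"final /{e[-1]}/: final stop deletion"
    return "unclassified"


if __name__ == "__main__":
    for pair in [("three", "free"), ("right", "light"), ("very", "wery"),
                 ("seat", "sit"), ("cat", "ca"), ("window", "winder")]:
        print(pair, "->", guess_issue(*pair))
```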

Step 5: Targeted Drill

For each identified problem phoneme, generate a focused drill set:

Phoneme drill library:

| Phoneme | Drill words |
| --- | --- |
| /θ/ and /ð/ (th) | think, three, through, both, weather, teeth, breathe |
| /r/ | red, right, road, very, sorry, around, mirror |
| /r/ vs /l/ | right/light, road/load, rice/lice, pray/play |
| /v/ | very, voice, love, live, over, never, river |
| /iː/ vs /ɪ/ | seat/sit, beat/bit, sheep/ship, feel/fill |
| final /t/ | cat, hat, right, night, about, what, that |
| final /d/ | road, said, good, food, bad, head |

Present 3–5 drill words and generate slow TTS for each.
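
Picking the drill set can be sketched as a lookup plus a bounded sample. The dictionary below copies only part of the library (for the th row, just the voiceless /θ/ words), and `pick_drill_words` with its `seed` parameter is a hypothetical helper for illustration.

```python
import random
from typing import Optional

# Subset of the drill library above, keyed by phoneme label.
DRILL_LIBRARY = {
    "/θ/": ["think", "three", "through", "both", "teeth"],
    "/r/": ["red", "right", "road", "very", "sorry", "around", "mirror"],
    "/v/": ["very", "voice", "love", "live", "over", "never", "river"],
    "final /t/": ["cat", "hat", "right", "night", "about", "what", "that"],
    "final /d/": ["road", "said", "good", "food", "bad", "head"],
}


def pick_drill_words(phoneme: str, count: int = 5, seed: Optional[int] = None) -> list[str]:
    """Return 3 to 5 drill words for a phoneme, or [] if it is not in the library."""
    words = DRILL_LIBRARY.get(phoneme, [])
    count = max(3, min(5, count))  # keep within the 3-5 range the step asks for
    rng = random.Random(seed)
    return rng.sample(words, min(count, len(words)))


if __name__ == "__main__":
    print(pick_drill_words("/r/", count=4, seed=0))
```

Each returned word would then go through the slow TTS request from Step 2.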

Step 6: Track Progress

Save session results to pronunciation_progress.json in the current directory:

{
  "sessions": [
    {
      "date": "<ISO date>",
      "text": "<practice text>",
      "accuracy": 0.67,
      "errors": ["window (/oʊ/)", "please (final /z/)"],
      "phonemes_drilled": ["/oʊ/", "/z/"]
    }
  ]
}
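
Appending a session to that file might look like the following. It is a sketch under assumptions: the file is created if missing, the date is stored as an ISO date string, and `append_session` is an illustrative name rather than part of the skill.

```python
import json
from datetime import date
from pathlib import Path
from typing import Optional


def append_session(path, text: str, accuracy: float, errors: list[str],
                   phonemes_drilled: list[str], today: Optional[str] = None) -> dict:
    """Append one session record to the progress file and return the full data."""
    path = Path(path)
    if path.exists():
        data = json.loads(path.read_text(encoding="utf-8"))
    else:
        data = {"sessions": []}
    data.setdefault("sessions", []).append({
        "date": today or date.today().isoformat(),
        "text": text,
        "accuracy": accuracy,
        "errors": errors,
        "phonemes_drilled": phonemes_drilled,
    })
    # ensure_ascii=False keeps IPA symbols readable in the saved file.
    path.write_text(json.dumps(data, ensure_ascii=False, indent=2), encoding="utf-8")
    return data


if __name__ == "__main__":
    append_session("pronunciation_progress.json", "I'd like a window seat, please.",
                   0.67, ["window", "please"], ["/z/"])
```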

After 3+ sessions, show a summary:

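Pronunciation weak spots (last 5 sessions):

/θ/ (th)  ████████░░  4 errors  ← focus here
/r/       ████░░░░░░  2 errors
/iː/      ██░░░░░░░░  1 error

Suggestion: focus on the th sound. A useful cue: "Place the tip of your tongue between your upper and lower teeth and blow gently."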
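
One way to derive such a summary from the saved sessions is sketched here. It assumes each phoneme is counted once per session in which it appears in `phonemes_drilled`, and that the bar is scaled to the number of sessions in the window (so 4 of 5 sessions fills 8 of 10 blocks, matching the sample). `weakness_summary` is a hypothetical helper.

```python
from collections import Counter


def weakness_summary(sessions: list[dict], window: int = 5, width: int = 10) -> list[str]:
    """Return one bar-chart line per phoneme, most frequent first."""
    recent = sessions[-window:]
    # Count each phoneme at most once per session.
    counts = Counter(p for s in recent for p in set(s.get("phonemes_drilled", [])))
    lines = []
    for phoneme, n in counts.most_common():
        filled = round(width * n / len(recent))
        bar = "█" * filled + "░" * (width - filled)
        lines.append(f"{phoneme:<8} {bar}  {n} of {len(recent)} sessions")
    return lines


if __name__ == "__main__":
    history = [
        {"phonemes_drilled": ["/θ/", "/r/"]},
        {"phonemes_drilled": ["/θ/"]},
        {"phonemes_drilled": ["/θ/", "/r/"]},
        {"phonemes_drilled": ["/θ/"]},
        {"phonemes_drilled": ["/iː/"]},
    ]
    print("\n".join(weakness_summary(history)))
```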

Iteration

After each round, ask: "Go again, or try a different sentence?" Keep the loop going until the user is satisfied or accuracy reaches 90% or higher.
