zhipu-asr

Automatic Speech Recognition (ASR) using Zhipu AI (BigModel) GLM-ASR model. Use when you need to transcribe audio files to text. Supports Chinese audio transcription with context prompts, custom hotwords, and multiple audio formats.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "zhipu-asr" with this command: npx skills add franklu0819-lang/zhipu-asr

Zhipu AI Automatic Speech Recognition (ASR)

Transcribe Chinese audio files to text using Zhipu AI's GLM-ASR model.

Setup

1. Get your API key from the Zhipu AI Console

2. Set it in your environment:

export ZHIPU_API_KEY="your-key-here"
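
Before the first run it can help to confirm that the prerequisites are in place. The following is a minimal sketch, not part of the skill itself: `check_asr_env` is a hypothetical helper name, and treating a missing ffmpeg as a warning rather than an error is an assumption based on ffmpeg being needed only for format conversion.

```shell
# Hypothetical pre-flight check: returns 1 if the API key is missing.
# A missing ffmpeg only produces a warning, on the assumption that it is
# needed solely for converting non-WAV/MP3 input.
check_asr_env() {
    asr_env_missing=0
    if [ -z "${ZHIPU_API_KEY:-}" ]; then
        echo "ZHIPU_API_KEY is not set" >&2
        asr_env_missing=1
    fi
    if ! command -v ffmpeg >/dev/null 2>&1; then
        echo "warning: ffmpeg not found; only WAV and MP3 input will work" >&2
    fi
    return "$asr_env_missing"
}
```

Calling `check_asr_env && bash scripts/speech_to_text.sh recording.wav` skips the transcription when the key is absent.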

Supported Audio Formats

  • WAV - Recommended, best quality
  • MP3 - Widely supported
  • OGG - Auto-converted to MP3
  • M4A - Auto-converted to MP3
  • AAC - Auto-converted to MP3
  • FLAC - Auto-converted to MP3
  • WMA - Auto-converted to MP3

Note: The API accepts only WAV and MP3. The script converts every other format to MP3 with ffmpeg before uploading, so any format that ffmpeg can read will work as long as ffmpeg is installed.
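
If you prefer to convert files yourself before calling the script, the conversion can be sketched roughly as below. This is an illustration rather than the script's actual logic: `needs_conversion` and `to_mp3` are hypothetical helpers, the 16 kHz mono target is an assumption taken from the recommended settings on this page, and MP3 output requires an ffmpeg build with an MP3 encoder.

```shell
# Return 0 if the extension is NOT one the API accepts directly (wav/mp3).
needs_conversion() {
    case "$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')" in
        wav|mp3) return 1 ;;
        *)       return 0 ;;
    esac
}

# Convert an ffmpeg-readable file to 16 kHz mono MP3 next to the original
# and print the path of the new file. Overwrites an existing .mp3 (-y).
to_mp3() {
    mp3_out="${1%.*}.mp3"
    ffmpeg -nostdin -loglevel error -y -i "$1" -ar 16000 -ac 1 "$mp3_out" \
        && printf '%s\n' "$mp3_out"
}
```

For example, `needs_conversion voice.m4a && to_mp3 voice.m4a` would write `voice.mp3`, which can then be passed to the script.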

File Constraints

  • Maximum file size: 25 MB
  • Maximum duration: 30 seconds
  • Recommended sample rate: 16000 Hz or higher
  • Audio channels: Mono or stereo
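
The size and duration limits above can be checked locally before spending an API call. The sketch below assumes that 25 MB means 25 × 1024 × 1024 bytes and that `ffprobe` (shipped with ffmpeg) is available; `check_size` and `check_duration` are hypothetical helper names.

```shell
MAX_BYTES=$((25 * 1024 * 1024))   # assumption: 1 MB = 1024 * 1024 bytes
MAX_SECONDS=30

# Succeeds if the file is no larger than MAX_BYTES.
check_size() {
    size_bytes=$(wc -c < "$1" | tr -d ' ')
    [ "$size_bytes" -le "$MAX_BYTES" ]
}

# Succeeds if ffprobe reports a duration of at most MAX_SECONDS.
# Returns 2 if ffprobe itself fails (missing tool or unreadable file).
check_duration() {
    dur_secs=$(ffprobe -v error -show_entries format=duration \
        -of default=noprint_wrappers=1:nokey=1 "$1") || return 2
    awk -v d="$dur_secs" -v m="$MAX_SECONDS" 'BEGIN { exit !(d <= m) }'
}
```

A typical guard would be `check_size f.wav && check_duration f.wav && bash scripts/speech_to_text.sh f.wav`.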

Usage

Basic Transcription

Transcribe an audio file with default settings:

bash scripts/speech_to_text.sh recording.wav

Transcription with Context

Provide previous transcription or context for better accuracy:

bash scripts/speech_to_text.sh recording.wav "这是之前的转录内容,有助于提高准确性"

Transcription with Hotwords

Use custom vocabulary to improve recognition of specific terms. Arguments are positional, so pass an empty string as the prompt when you only want hotwords:

bash scripts/speech_to_text.sh recording.mp3 "" "人名,地名,专业术语,公司名称"

Full Options

Combine context and hotwords:

bash scripts/speech_to_text.sh recording.wav "会议记录片段" "张三,李四,项目名称"

Parameters:

  • audio_file (required): Path to the audio file (WAV or MP3; other formats are converted to MP3 automatically)
  • prompt (optional): Previous transcription or context text (max 8000 characters)
  • hotwords (optional): Comma-separated list of specific terms (max 100 terms)
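
Because hotwords are passed as one comma-separated string, it is easy to exceed the 100-term limit without noticing. A small check, sketched here with hypothetical helper names and assuming terms are separated by ASCII commas as in the examples on this page:

```shell
MAX_HOTWORDS=100

# Print the number of comma-separated terms in the argument (0 if empty).
count_hotwords() {
    if [ -z "$1" ]; then
        echo 0
        return
    fi
    printf '%s\n' "$1" | awk -F',' '{ print NF }'
}

# Fail with a message if the list has more than MAX_HOTWORDS terms.
check_hotwords() {
    hw_count=$(count_hotwords "$1")
    if [ "$hw_count" -gt "$MAX_HOTWORDS" ]; then
        echo "too many hotwords: $hw_count (limit $MAX_HOTWORDS)" >&2
        return 1
    fi
}
```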

Features

Context Prompts

Why use context prompts:

  • Improves accuracy in long conversations
  • Helps with domain-specific terminology
  • Maintains consistency across multiple segments

When to use:

  • Multi-part conversations or meetings
  • Technical or specialized content
  • Continuing from previous transcriptions

Example:

bash scripts/speech_to_text.sh part2.wav "第一部分的转录内容:讨论了项目进展和下一步计划"

Hotwords

What are hotwords: A custom vocabulary list that boosts recognition accuracy for specific terms.

Best use cases:

  • Proper names (people, places)
  • Domain-specific terminology
  • Company names and products
  • Technical jargon
  • Industry-specific terms

Examples:

# Medical transcription
bash scripts/speech_to_text.sh medical.wav "" "患者,症状,诊断,治疗方案"

# Business meeting
bash scripts/speech_to_text.sh meeting.wav "" "张经理,李总,项目代号,预算"

# Tech discussion
bash scripts/speech_to_text.sh tech.wav "" "API,数据库,算法,框架"

Workflow Examples

Transcribe a Meeting

# Part 1
bash scripts/speech_to_text.sh meeting_part1.wav

# Part 2 with context
bash scripts/speech_to_text.sh meeting_part2.wav "第一部分讨论了项目进度" "张总,李经理,项目名称"

# Part 3 with context
bash scripts/speech_to_text.sh meeting_part3.wav "前两部分讨论了项目进度和预算" "张总,李经理,项目名称"
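
The manual steps above can also be chained so that each part receives the previous part's transcription as its context prompt. This is a rough sketch under several assumptions: `jq` is installed, the script prints only the JSON response on stdout, and `transcribe_chain` and the `ASR_SCRIPT` variable are hypothetical names introduced for illustration.

```shell
ASR_SCRIPT="${ASR_SCRIPT:-scripts/speech_to_text.sh}"

# usage: transcribe_chain "hotword1,hotword2" part1.wav part2.wav ...
# Prints one line of text per part; each call gets the previous text as prompt.
transcribe_chain() {
    chain_hotwords="$1"
    shift
    chain_prev=""
    for chain_part in "$@"; do
        chain_text=$(bash "$ASR_SCRIPT" "$chain_part" "$chain_prev" "$chain_hotwords" \
            | jq -r '.text // empty') || return 1
        printf '%s\n' "$chain_text"
        chain_prev="$chain_text"   # context for the next part
    done
}
```

Since each segment is at most 30 seconds, passing only the immediately preceding transcription should stay well below the 8000-character prompt limit.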

Transcribe a Lecture

bash scripts/speech_to_text.sh lecture.wav "" "教授,课程名称,专业术语1,专业术语2"

Process Multiple Files

for file in recording_*.wav; do
    bash scripts/speech_to_text.sh "$file"
done
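
The loop above prints every result to the terminal. A variation that saves each response next to its audio file and reports failures might look like this. It assumes the script exits non-zero when a request fails, which this page does not confirm; `transcribe_all` and `ASR_SCRIPT` are hypothetical names.

```shell
ASR_SCRIPT="${ASR_SCRIPT:-scripts/speech_to_text.sh}"

# Transcribe every file given as an argument and write <name>.json beside it.
# Returns non-zero if any file failed (assumes the script signals failure
# through its exit status).
transcribe_all() {
    batch_failed=0
    for batch_file in "$@"; do
        batch_out="${batch_file%.*}.json"
        if bash "$ASR_SCRIPT" "$batch_file" > "$batch_out"; then
            echo "ok: $batch_file -> $batch_out"
        else
            echo "failed: $batch_file" >&2
            rm -f "$batch_out"
            batch_failed=$((batch_failed + 1))
        fi
    done
    [ "$batch_failed" -eq 0 ]
}
```

Usage: `transcribe_all recording_*.wav`.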

Audio Quality Tips

Best practices for accurate transcription:

  1. Clear audio source

    • Minimize background noise
    • Use a good-quality microphone
    • Speak clearly and at a moderate pace

  2. Optimal audio settings

    • Sample rate: 16000 Hz or higher
    • Bit depth: 16-bit or higher
    • Single channel (mono) is sufficient

  3. File preparation

    • Remove silence from the beginning and end
    • Normalize audio levels
    • Ensure consistent volume

Output Format

The script outputs JSON with:

  • id: Task ID
  • created: Request timestamp (Unix timestamp)
  • request_id: Unique request identifier
  • model: Model name used
  • text: Transcribed text

Example output:

{
  "id": "task-12345",
  "created": 1234567890,
  "request_id": "req-abc123",
  "model": "glm-asr-2512",
  "text": "你好,这是转录的文本内容"
}
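
To keep only the transcribed text, the JSON can be filtered with `jq`. The sketch below assumes `jq` is installed and that the script writes nothing but the JSON object to stdout; `extract_text` is a hypothetical helper name.

```shell
# Read the JSON response on stdin and print only the "text" field.
# With --exit-status (-e), jq exits non-zero when the field is missing or
# null, so a response without "text" is reported as a failure.
extract_text() {
    jq -er '.text // empty'
}
```

Usage: `bash scripts/speech_to_text.sh recording.wav | extract_text > recording.txt`.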

Troubleshooting

File Size Issues:

  • Split audio files larger than 25 MB
  • Reduce sample rate or bit depth
  • Use compression (MP3) for smaller files

Duration Issues:

  • Split recordings longer than 30 seconds
  • Process segments separately
  • Use context prompts to maintain continuity
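
One way to split a long recording is ffmpeg's segment muxer. The following sketch uses a 25-second default to leave a margin below the 30-second limit; that margin, the output naming scheme, and the helper names are all assumptions made for illustration. Cuts are made at fixed times and may fall in the middle of a word.

```shell
# Build the output pattern for a given input: long.wav -> long_part%03d.wav
segment_pattern() {
    printf '%s_part%%03d.%s\n' "${1%.*}" "${1##*.}"
}

# usage: split_audio long.wav [seconds]
# Splits without re-encoding (-c copy) into numbered parts beside the input.
split_audio() {
    ffmpeg -nostdin -loglevel error -i "$1" \
        -f segment -segment_time "${2:-25}" -c copy \
        "$(segment_pattern "$1")"
}
```

After `split_audio long.wav`, the parts `long_part000.wav`, `long_part001.wav`, and so on can be transcribed in order, each with the previous part's text as its context prompt.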

Poor Accuracy:

  • Improve audio quality
  • Use hotwords for specific terms
  • Provide context prompts
  • Ensure clear speech and minimal noise

Format Issues:

  • Ensure the file is WAV or MP3, or a format that ffmpeg can convert
  • Check that the file is not corrupted
  • Verify that the audio plays in a standard player

Limitations

  • Maximum audio duration: 30 seconds per request
  • File size limit: 25 MB
  • Maximum hotwords: 100 terms
  • Context prompt limit: 8000 characters
  • Best performance with Chinese language audio

Performance Notes

  • Typical transcription time: 1-3 seconds
  • Real-time or faster for most audio
  • Processing time varies with audio length and quality

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
