voice-to-text

Convert voice messages and audio files to text using Vosk offline speech recognition. Use when a user sends a voice message, audio file, or asks to transcribe speech to text.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "voice-to-text" with this command: npx skills add vae999/voice-to-text

Voice to Text

Convert voice messages and audio files to text using Vosk, an offline speech recognition toolkit.

Setup

  1. Install dependencies:

    # macOS
    brew install ffmpeg
    pip install vosk
    
    # Linux
    sudo apt-get install ffmpeg
    pip install vosk
    
  2. Download a Vosk model:

    mkdir -p ~/.vosk/models && cd ~/.vosk/models
    
    # Chinese (small, fast)
    curl -LO https://alphacephei.com/vosk/models/vosk-model-small-cn-0.22.zip
    unzip vosk-model-small-cn-0.22.zip
    
    # English (small)
    curl -LO https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip
    unzip vosk-model-small-en-us-0.15.zip
    

Usage

When the user provides a voice message or audio file path, run the transcription:

python3 ~/skills/voice-to-text/transcribe.py "<audio_file_path>"

To select a specific model, set the VOSK_MODEL_PATH environment variable to the unpacked model directory:

VOSK_MODEL_PATH=~/.vosk/models/vosk-model-cn-0.22 python3 ~/skills/voice-to-text/transcribe.py "<audio_file_path>"
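
This listing does not show how transcribe.py picks a model. The following is a minimal sketch of one plausible resolution order, assuming VOSK_MODEL_PATH takes precedence and the script otherwise falls back to the alphabetically first directory under ~/.vosk/models. The function name `resolve_model_path` and the fallback rule are hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

DEFAULT_MODELS_DIR = Path.home() / ".vosk" / "models"


def resolve_model_path(
    env: Mapping[str, str] = os.environ,
    models_dir: Path = DEFAULT_MODELS_DIR,
) -> Optional[Path]:
    """Return the model directory to load, or None if none is available.

    Assumed order (not confirmed by the skill): VOSK_MODEL_PATH first,
    then the alphabetically first directory under models_dir.
    """
    override = env.get("VOSK_MODEL_PATH")
    if override:
        path = Path(override).expanduser()
        # An explicit override that does not exist is treated as an error.
        return path if path.is_dir() else None
    if not models_dir.is_dir():
        return None
    candidates = sorted(p for p in models_dir.iterdir() if p.is_dir())
    return candidates[0] if candidates else None
```

Returning None instead of raising lets the caller decide how to report a missing model, for example with the "No model found" hint from Troubleshooting.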

Supported Audio Formats

  • MP3, WAV, M4A, OGG, FLAC, AAC, WEBM
  • Voice messages from WeChat, Telegram, WhatsApp, etc.

Available Models

  Model                        Language  Size  Notes
  vosk-model-small-cn-0.22     Chinese   42M   Fast, good accuracy
  vosk-model-cn-0.22           Chinese   1.3G  High accuracy
  vosk-model-small-en-us-0.15  English   40M   Fast, good accuracy
  vosk-model-en-us-0.22        English   1.8G  High accuracy

Download models from: https://alphacephei.com/vosk/models

Example Workflow

  1. User sends a voice message via WeChat/Telegram
  2. OpenClaw receives the audio file
  3. Run: python3 ~/skills/voice-to-text/transcribe.py /path/to/voice.ogg
  4. Return transcribed text to user
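
Steps 3 and 4 could be wrapped in a small helper. This is a sketch only: it assumes transcribe.py prints the transcript to standard output and exits non-zero on failure, neither of which this listing confirms. The names `transcribe_file` and `SCRIPT` are illustrative.

```python
import os
import subprocess
import sys
from pathlib import Path
from typing import Optional

# Script location taken from the Usage section.
SCRIPT = Path.home() / "skills" / "voice-to-text" / "transcribe.py"


def transcribe_file(
    audio_path: str,
    model_path: Optional[str] = None,
    script: Path = SCRIPT,
) -> str:
    """Run the transcription script and return its stdout, stripped.

    Assumption: the script writes the transcript to stdout.
    """
    env = dict(os.environ)
    if model_path:
        # Same effect as prefixing the command with VOSK_MODEL_PATH=...
        env["VOSK_MODEL_PATH"] = str(Path(model_path).expanduser())
    result = subprocess.run(
        [sys.executable, str(script), audio_path],
        env=env,
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError on a non-zero exit
    )
    return result.stdout.strip()
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces, which are common in messenger downloads.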

Troubleshooting

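  • No model found: Download a model to ~/.vosk/models/ and unzip it, or point VOSK_MODEL_PATH at an existing model directory
  • ffmpeg not found: Install via brew install ffmpeg (macOS) or apt-get install ffmpeg (Linux)
  • Poor accuracy: Switch to a larger model such as vosk-model-cn-0.22 or vosk-model-en-us-0.22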
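
The first two checks can be automated before any audio is processed. A minimal sketch, assuming models are stored as unpacked directories under ~/.vosk/models as in Setup; `preflight` is a hypothetical helper, not part of the skill.

```python
import shutil
from pathlib import Path
from typing import List


def preflight(models_dir: Path = Path.home() / ".vosk" / "models") -> List[str]:
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    # shutil.which searches PATH the same way the shell would.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found: install it with brew or apt-get")
    has_model = models_dir.is_dir() and any(
        p.is_dir() for p in models_dir.iterdir()
    )
    if not has_model:
        problems.append(f"no model found: download one to {models_dir}")
    return problems
```

Calling `preflight()` with no arguments checks the default location; an empty result means both prerequisites appear to be in place.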

Notes

  • Works completely offline after model download
  • Supports multiple languages (download appropriate model)
  • Audio is converted to 16 kHz mono WAV for processing
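
The conversion and recognition steps can be sketched as follows. This is not the skill's transcribe.py, which this listing does not show; it is a minimal illustration using ffmpeg's standard `-ar` and `-ac` options and the Vosk Python bindings (`Model`, `KaldiRecognizer`). The function names and the temporary file name are hypothetical.

```python
import json
import subprocess
import wave
from typing import List


def ffmpeg_command(src: str, dst: str) -> List[str]:
    """Build an ffmpeg call that writes 16 kHz mono 16-bit PCM WAV."""
    return ["ffmpeg", "-y", "-i", src,
            "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", dst]


def transcribe_wav(wav_path: str, model_path: str) -> str:
    """Feed a mono PCM WAV file to Vosk and return the recognised text."""
    # Imported lazily so the rest of the module loads without vosk installed.
    from vosk import KaldiRecognizer, Model

    model = Model(model_path)
    pieces = []
    with wave.open(wav_path, "rb") as wf:
        rec = KaldiRecognizer(model, wf.getframerate())
        while True:
            data = wf.readframes(4000)
            if not data:
                break
            # AcceptWaveform returns True when an utterance is complete.
            if rec.AcceptWaveform(data):
                pieces.append(json.loads(rec.Result()).get("text", ""))
        pieces.append(json.loads(rec.FinalResult()).get("text", ""))
    return " ".join(p for p in pieces if p)


def transcribe(src: str, model_path: str, tmp_wav: str = "converted.wav") -> str:
    """Convert any ffmpeg-readable file, then run recognition on the result."""
    subprocess.run(ffmpeg_command(src, tmp_wav), check=True, capture_output=True)
    return transcribe_wav(tmp_wav, model_path)
```

Converting first keeps the recognition loop simple, because the recogniser always receives the sample rate and channel layout it was created for.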

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
