whisper-stt

Free local speech-to-text transcription using OpenAI Whisper. Transcribes audio files (mp3, wav, m4a, ogg, etc.) to text without API costs. Use when: (1) the user needs audio/video transcription, (2) converting voice memos to text, (3) generating subtitles (SRT/VTT), (4) free local STT is preferred over paid cloud APIs.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Installation

Install the skill with:

npx skills add nickylin/whisper-stt

Whisper STT Skill

Free, local speech-to-text using OpenAI Whisper.

Prerequisites

Install dependencies (one-time setup):

pip install openai-whisper torch

Install ffmpeg (required by Whisper to decode most audio formats):

  • macOS: brew install ffmpeg
  • Ubuntu: sudo apt install ffmpeg

Usage

Transcribe an audio file

python ~/.openclaw/skills/whisper-stt/scripts/transcribe.py <audio_file>

Options

Option          Description
--model         Model size: tiny, base, small, medium, large, large-v3, large-v3-turbo (default: base)
--language, -l  Language code: zh, en, ja, etc. (auto-detect if not specified)
--output, -o    Output format: json, txt, srt, vtt (default: json)
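The option set above could be wired up with argparse roughly like this. This is a sketch of a plausible interface, not the script's actual source:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Mirrors the documented options; defaults match the option table.
    parser = argparse.ArgumentParser(description="Transcribe audio with Whisper")
    parser.add_argument("audio_file", help="Path to the audio/video file")
    parser.add_argument("--model", default="base",
                        choices=["tiny", "base", "small", "medium",
                                 "large", "large-v3", "large-v3-turbo"])
    parser.add_argument("--language", "-l", default=None,
                        help="Language code (auto-detect if omitted)")
    parser.add_argument("--output", "-o", default="json",
                        choices=["json", "txt", "srt", "vtt"])
    return parser

args = build_parser().parse_args(["recording.m4a", "--model", "tiny", "-o", "txt"])
print(args.model, args.output)  # tiny txt
```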

Examples

Chinese audio to text:

python ~/.openclaw/skills/whisper-stt/scripts/transcribe.py recording.m4a --language zh --output txt

Generate subtitles (SRT):

python ~/.openclaw/skills/whisper-stt/scripts/transcribe.py video.mp4 --output srt > subtitles.srt
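Whisper returns timed segments (start/end in seconds, plus text), and the SRT output is a straightforward rendering of those. A sketch of the mapping, which the skill's --output srt presumably performs in some similar form:

```python
# Sketch: map Whisper-style segments to SRT cues.
def srt_timestamp(seconds: float) -> str:
    # SRT uses HH:MM:SS,mmm with a comma before the milliseconds.
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments) -> str:
    cues = []
    for i, seg in enumerate(segments, start=1):
        cues.append(f"{i}\n{srt_timestamp(seg['start'])} --> "
                    f"{srt_timestamp(seg['end'])}\n{seg['text'].strip()}\n")
    return "\n".join(cues)

demo = [{"start": 0.0, "end": 2.5, "text": " Hello world"}]
print(segments_to_srt(demo))
```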

Use faster model:

python ~/.openclaw/skills/whisper-stt/scripts/transcribe.py audio.mp3 --model tiny --output txt

High accuracy (slower):

python ~/.openclaw/skills/whisper-stt/scripts/transcribe.py audio.mp3 --model large-v3 --output txt

Model Selection Guide

Model           Speed  Accuracy   VRAM/RAM  Best For
tiny            ~32x   Basic      ~1GB      Quick tests, low resources
base            ~16x   Good       ~1GB      Balanced speed/accuracy
small           ~6x    Better     ~2GB      Better accuracy
medium          ~2x    Very good  ~5GB      High accuracy
large           1x     Excellent  ~10GB     Best quality
large-v3-turbo  ~8x    Excellent  ~6GB      Fast + accurate (recommended)
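As a rule of thumb from the guide above, you can pick the most accurate model that fits your memory budget. A hedged sketch (the GB figures are the approximate values listed in the table, not measured requirements):

```python
# Sketch: choose the largest model from the selection guide that fits
# a given memory budget in GB, ordered smallest footprint first.
MODEL_RAM_GB = [
    ("tiny", 1), ("base", 1), ("small", 2),
    ("medium", 5), ("large-v3-turbo", 6), ("large", 10),
]

def pick_model(budget_gb: float) -> str:
    chosen = "tiny"  # fallback: the smallest footprint
    for name, need in MODEL_RAM_GB:
        if need <= budget_gb:
            chosen = name  # keep upgrading while the budget allows
    return chosen

print(pick_model(8))  # large-v3-turbo
```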

Troubleshooting

"ModuleNotFoundError: No module named 'whisper'" → Run: pip install openai-whisper torch

"ffmpeg not found" → Install ffmpeg or convert audio to WAV format first

Slow transcription → Use a smaller model (tiny/base) or ensure a GPU is available (Apple Silicon MPS, NVIDIA CUDA)

Poor accuracy on Chinese → Pass --language zh explicitly and consider a larger model (medium/large)
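The GPU hint above can be verified directly from Python. This helper is an illustrative sketch (not part of the skill) that reports which PyTorch backend is visible, falling back gracefully when torch is not installed:

```python
# Sketch: report which compute backend PyTorch can see.
def best_device() -> str:
    try:
        import torch
    except ImportError:
        return "cpu"  # torch missing; whisper would not run at all
    if torch.cuda.is_available():
        return "cuda"  # NVIDIA GPU
    if getattr(torch.backends, "mps", None) and torch.backends.mps.is_available():
        return "mps"   # Apple Silicon
    return "cpu"

print(best_device())
```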

Output Formats

  • json: Full result with segments, timestamps, and metadata
  • txt: Plain text transcription only
  • srt: SubRip subtitle format with timing
  • vtt: WebVTT subtitle format for web players
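The srt and vtt formats carry the same timing information in slightly different syntax: WebVTT starts with a "WEBVTT" header and separates milliseconds with a dot rather than a comma. A sketch of the same segments rendered as WebVTT:

```python
# Sketch: render Whisper-style segments as WebVTT.
def vtt_timestamp(seconds: float) -> str:
    # WebVTT uses HH:MM:SS.mmm with a dot before the milliseconds.
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"

def segments_to_vtt(segments) -> str:
    lines = ["WEBVTT", ""]  # mandatory file header, then a blank line
    for seg in segments:
        lines.append(f"{vtt_timestamp(seg['start'])} --> {vtt_timestamp(seg['end'])}")
        lines.append(seg["text"].strip())
        lines.append("")
    return "\n".join(lines)

print(segments_to_vtt([{"start": 0.0, "end": 1.2, "text": " Hi"}]))
```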

Credits

Powered by OpenAI Whisper, an open-source speech recognition model.

