faster-whisper-gpu

High-performance local speech-to-text transcription using Faster Whisper with NVIDIA GPU acceleration. Transcribe audio files locally without sending data to external services.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "faster-whisper-gpu" with this command: npx skills add felipeoff/faster-whisper-gpu

🎙️ Faster Whisper GPU

✨ Features

  • 🚀 GPU Accelerated: Uses NVIDIA CUDA (through the CTranslate2 inference engine) for fast transcription
  • 🔒 100% Local: Audio is processed entirely on your machine. Only the model weights are downloaded, on first use.
  • 💰 Free Forever: No API costs. Run unlimited transcriptions.
  • 🌍 Multilingual: Supports 99 languages with automatic detection
  • 📁 Multiple Formats: Input: MP3, WAV, FLAC, OGG, M4A. Output: TXT, SRT, VTT, JSON
  • 🎯 Multiple Models: From tiny (fast) to large-v3 (most accurate)
  • 🎬 Subtitle Generation: Create SRT files with word-level timestamps

📋 Requirements

Hardware

  • NVIDIA GPU with CUDA support (recommended: 4GB+ VRAM)
  • Or CPU-only mode (slower but works on any machine)

Software

  • Python 3.8+
  • NVIDIA drivers (for GPU support)
  • CUDA Toolkit 11.8+ or 12.x

🚀 Quick Start

Installation

# Install dependencies
pip install faster-whisper torch

# Verify GPU is available
python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"
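faster-whisper runs inference through CTranslate2 rather than PyTorch, so you can also probe the GPU through CTranslate2 directly. A minimal sketch, assuming the `ctranslate2` package that faster-whisper installs as a dependency; `pick_device` is a hypothetical helper that mirrors the defaults used in this README (CUDA with `float16`, or the CPU-only example's `int8`).

```python
from __future__ import annotations


def pick_device() -> tuple[str, str]:
    """Return a hypothetical (device, compute_type) pair for WhisperModel.

    Uses CUDA with float16 when CTranslate2 sees at least one GPU and
    otherwise falls back to CPU with int8.
    """
    try:
        import ctranslate2  # inference backend that faster-whisper depends on

        if ctranslate2.get_cuda_device_count() > 0:
            return "cuda", "float16"
    except ImportError:
        # ctranslate2 is not installed, so the GPU cannot be probed here.
        pass
    return "cpu", "int8"


if __name__ == "__main__":
    device, compute_type = pick_device()
    print(f"device={device} compute_type={compute_type}")
```

The pair can be passed straight to `WhisperModel("base", device=device, compute_type=compute_type)`.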

Basic Usage

# Transcribe an audio file (auto-detects GPU)
python transcribe.py audio.mp3

# Specify language explicitly
python transcribe.py audio.mp3 --language pt

# Output as SRT subtitles
python transcribe.py audio.mp3 --format srt --output subtitles.srt

# Use larger model for better accuracy
python transcribe.py audio.mp3 --model large-v3

🔧 Advanced Usage

Command Line Options

python transcribe.py <audio_file> [options]

Options:
  --model {tiny,base,small,medium,large-v1,large-v2,large-v3}
                        Model size to use (default: base)
  --language LANG       Language code (e.g., 'pt', 'en', 'es'). Auto-detect if not specified.
  --format {txt,srt,json,vtt}
                        Output format (default: txt)
  --output FILE         Output file path (default: stdout)
  --device {cuda,cpu}   Device to use (default: cuda if available)
  --compute_type {int8,int8_float16,int16,float16,float32}
                        Computation precision (default: float16)
  --task {transcribe,translate}
                        Task: transcribe or translate to English (default: transcribe)
  --vad_filter          Enable voice activity detection filter
  --vad_parameters MIN_DURATION_ON,MIN_DURATION_OFF
                        VAD parameters as comma-separated values
  --condition_on_previous_text
                        Condition on previous text (default: True)
  --initial_prompt PROMPT
                        Initial prompt to guide transcription
  --word_timestamps     Include word-level timestamps (for SRT/JSON)
  --hotwords WORDS      Comma-separated hotwords to boost recognition
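The source of transcribe.py is not reproduced here, so the following is only a hedged sketch of how a script could parse a subset of these options with argparse and forward them to faster-whisper's `WhisperModel.transcribe()`. `build_parser` and `transcribe_kwargs` are hypothetical names; the real script may handle defaults, `--hotwords`, and `--vad_parameters` differently, so those are left out.

```python
from __future__ import annotations

import argparse

MODELS = ["tiny", "base", "small", "medium", "large-v1", "large-v2", "large-v3"]
COMPUTE_TYPES = ["int8", "int8_float16", "int16", "float16", "float32"]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser covering a subset of the documented options."""
    p = argparse.ArgumentParser(description="Transcribe audio with faster-whisper")
    p.add_argument("audio_file")
    p.add_argument("--model", choices=MODELS, default="base")
    p.add_argument("--language", default=None)  # None lets the model auto-detect
    p.add_argument("--format", choices=["txt", "srt", "json", "vtt"], default="txt")
    p.add_argument("--output", default=None)  # None means write to stdout
    p.add_argument("--device", choices=["cuda", "cpu"], default=None)  # None: decide at runtime
    p.add_argument("--compute_type", choices=COMPUTE_TYPES, default="float16")
    p.add_argument("--task", choices=["transcribe", "translate"], default="transcribe")
    p.add_argument("--vad_filter", action="store_true")
    p.add_argument("--word_timestamps", action="store_true")
    p.add_argument("--initial_prompt", default=None)
    return p


def transcribe_kwargs(args: argparse.Namespace) -> dict:
    """Map parsed options onto WhisperModel.transcribe() keyword arguments."""
    return {
        "language": args.language,
        "task": args.task,
        "vad_filter": args.vad_filter,
        "word_timestamps": args.word_timestamps,
        "initial_prompt": args.initial_prompt,
    }
```

For example, parsing `["podcast.mp3", "--model", "large-v3", "--vad_filter", "--word_timestamps"]` yields kwargs suitable for `model.transcribe(args.audio_file, **transcribe_kwargs(args))`.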

Examples

Portuguese Transcription with SRT Output

python transcribe.py meeting.mp3 --language pt --format srt --output meeting.srt
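SRT and WebVTT are simple text layouts, so they can also be written by hand from Python API segments. A minimal sketch, assuming segment objects with `start` and `end` (in seconds) and `text` attributes as faster-whisper returns them; `format_timestamp`, `to_srt`, and `to_vtt` are hypothetical helpers, not the code transcribe.py uses.

```python
from __future__ import annotations

from typing import Iterable


def format_timestamp(seconds: float, decimal_marker: str = ",") -> str:
    """Format seconds as HH:MM:SS,mmm (SRT) or HH:MM:SS.mmm (VTT)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_marker}{ms:03d}"


def to_srt(segments: Iterable) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        start = format_timestamp(seg.start)
        end = format_timestamp(seg.end)
        blocks.append(f"{i}\n{start} --> {end}\n{seg.text.strip()}\n")
    return "\n".join(blocks)


def to_vtt(segments: Iterable) -> str:
    """Render segments as WebVTT: a header and dot-separated milliseconds."""
    lines = ["WEBVTT", ""]
    for seg in segments:
        start = format_timestamp(seg.start, ".")
        end = format_timestamp(seg.end, ".")
        lines.append(f"{start} --> {end}\n{seg.text.strip()}\n")
    return "\n".join(lines)
```

With a loaded model, `segments, _ = model.transcribe("meeting.mp3", language="pt")` can be passed to `to_srt`. Because faster-whisper returns segments as a generator, convert them to a list first if you want to render more than one format.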

English Translation from Any Language

python transcribe.py japanese_audio.mp3 --task translate --format txt

High-Accuracy Mode with Large Model

python transcribe.py podcast.mp3 --model large-v3 --vad_filter --word_timestamps

CPU-Only Mode (no GPU)

python transcribe.py audio.mp3 --device cpu --compute_type int8

🐍 Python API

from faster_whisper import WhisperModel

# Load model
model = WhisperModel("base", device="cuda", compute_type="float16")

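# Transcribe. segments is a lazy generator: decoding runs as you iterate over it.
# Omit language= to let the model auto-detect the language reported below.
segments, info = model.transcribe("audio.mp3", language="pt")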

print(f"Detected language: {info.language} (probability: {info.language_probability:.2f})")

for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
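For JSON output with word-level timestamps, pass `word_timestamps=True` to `model.transcribe()`; faster-whisper then attaches a `words` list to each segment, whose items expose `start`, `end`, `word`, and `probability`. A minimal sketch under that assumption; `segments_to_json` and its JSON layout are hypothetical and not necessarily what `--format json` emits.

```python
from __future__ import annotations

import json


def segments_to_json(segments, language: str | None = None) -> str:
    """Serialize faster-whisper-style segments (and words, if present) to JSON.

    Each segment needs .start, .end and .text; .words is included only when
    it exists and is non-empty (faster-whisper fills it when word_timestamps=True).
    """
    out = []
    for seg in segments:
        item = {"start": seg.start, "end": seg.end, "text": seg.text.strip()}
        words = getattr(seg, "words", None)
        if words:
            item["words"] = [
                {"start": w.start, "end": w.end, "word": w.word, "probability": w.probability}
                for w in words
            ]
        out.append(item)
    return json.dumps({"language": language, "segments": out}, ensure_ascii=False, indent=2)
```

Usage: `segments, info = model.transcribe("audio.mp3", word_timestamps=True)` followed by `print(segments_to_json(segments, info.language))`.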

📊 Model Sizes & VRAM Requirements

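| Model    | Parameters | VRAM Required | Relative Speed | Accuracy |
|----------|------------|---------------|----------------|----------|
| tiny     | 39 M       | ~1 GB         | ~32x           | Basic    |
| base     | 74 M       | ~1 GB         | ~16x           | Good     |
| small    | 244 M      | ~2 GB         | ~6x            | Better   |
| medium   | 769 M      | ~5 GB         | ~2x            | Great    |
| large-v3 | 1550 M     | ~10 GB        | 1x             | Best     |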

Benchmarks measured on NVIDIA RTX 4090
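One way to use the table is to pick the most accurate model whose listed VRAM fits your GPU (check free memory with nvidia-smi). A minimal sketch using only the table's approximate figures; actual memory use also depends on `compute_type` and other settings, so leave some headroom. `VRAM_GB` and `largest_model_for` are hypothetical names, and large-v1/large-v2 are omitted because the table does not list them.

```python
from __future__ import annotations

# Approximate VRAM needs in GB, copied from the table, smallest to largest.
VRAM_GB = [
    ("tiny", 1),
    ("base", 1),
    ("small", 2),
    ("medium", 5),
    ("large-v3", 10),
]


def largest_model_for(vram_gb: float) -> str | None:
    """Return the largest listed model that fits in vram_gb, or None if none fit."""
    best = None
    for name, need in VRAM_GB:
        if need <= vram_gb:
            best = name  # list is ordered, so later matches are larger models
    return best
```

On a 4 GB card, for example, this picks `small`.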

🔍 Supported Languages

Faster Whisper supports 99 languages including:

  • Portuguese (pt)
  • English (en)
  • Spanish (es)
  • French (fr)
  • German (de)
  • Italian (it)
  • Japanese (ja)
  • Chinese (zh)
  • Russian (ru)
  • And 90+ more...

🛠️ Troubleshooting

CUDA Out of Memory

# Use smaller model
python transcribe.py audio.mp3 --model tiny

# Or use CPU
python transcribe.py audio.mp3 --device cpu

# Or reduce precision
python transcribe.py audio.mp3 --compute_type int8
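The three remedies above can be chained automatically. A hedged sketch, assuming CTranslate2 surfaces CUDA out-of-memory as a `RuntimeError` (so any `RuntimeError`, including other backend failures, moves on to the next configuration); `load_with_fallback` is a hypothetical helper, and it only covers model loading. Memory errors raised later, while iterating segments, would need the same try/except around the transcription loop.

```python
from __future__ import annotations

from typing import Any, Callable, Sequence, Tuple

# (model size, device, compute_type), tried in order.
Config = Tuple[str, str, str]


def load_with_fallback(load: Callable[[str, str, str], Any], configs: Sequence[Config]):
    """Return (model, config) for the first configuration that loads.

    `load` is any callable taking (model, device, compute_type), for example
    lambda m, d, c: WhisperModel(m, device=d, compute_type=c).
    """
    last_error: Exception | None = None
    for config in configs:
        try:
            return load(*config), config
        except RuntimeError as err:  # assumed to cover CUDA out-of-memory
            last_error = err
    raise RuntimeError(f"all configurations failed; last error: {last_error}")
```

A chain following this section's order might be `[("large-v3", "cuda", "float16"), ("large-v3", "cuda", "int8"), ("tiny", "cuda", "float16"), ("base", "cpu", "int8")]`.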

Model Download Issues

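Models are downloaded automatically on first use to ~/.cache/huggingface/hub/. To store them somewhere else, point HF_HOME at a custom cache directory:

export HF_HOME=/path/to/custom/cache

If you are behind a proxy, set the standard proxy variables (for example HTTPS_PROXY) so the download can reach Hugging Face.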
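To confirm where downloaded models will land, you can approximate huggingface_hub's cache lookup. A simplified sketch, assuming the resolution order HF_HUB_CACHE, then $HF_HOME/hub, then $XDG_CACHE_HOME/huggingface/hub, then ~/.cache/huggingface/hub; legacy variables are ignored and `expected_hub_cache` is a hypothetical helper.

```python
from __future__ import annotations

import os
from pathlib import Path


def expected_hub_cache(env: dict | None = None) -> Path:
    """Approximate the directory huggingface_hub downloads models into (simplified)."""
    env = os.environ if env is None else env
    if env.get("HF_HUB_CACHE"):
        return Path(env["HF_HUB_CACHE"]).expanduser()
    if env.get("HF_HOME"):
        return Path(env["HF_HOME"]).expanduser() / "hub"
    xdg = env.get("XDG_CACHE_HOME")
    base = Path(xdg) if xdg else Path.home() / ".cache"
    return base / "huggingface" / "hub"


if __name__ == "__main__":
    print(expected_hub_cache())
```

To bypass the shared cache for a single model, faster-whisper's `WhisperModel` also accepts a `download_root` argument, e.g. `WhisperModel("base", download_root="/path/to/models")`.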

Slow Transcription

  • Ensure GPU is being used: check nvidia-smi during transcription
  • Use smaller model for faster results
  • Enable VAD filter to skip silent parts
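To measure whether a change (GPU vs CPU, a smaller model, the VAD filter) actually helps, time a run and compare it with the audio length. A minimal sketch, assuming a faster-whisper `WhisperModel` whose `transcribe()` returns `(segments, info)` with `info.duration` in seconds; `timed_transcribe` is a hypothetical helper. The segments are consumed inside the timer because faster-whisper only decodes while you iterate.

```python
from __future__ import annotations

import time


def timed_transcribe(model, audio_path: str, **kwargs):
    """Transcribe audio_path and return (segments, info, elapsed_s, speed).

    speed is audio seconds processed per wall-clock second; values above 1
    mean faster than real time.
    """
    start = time.perf_counter()
    segments, info = model.transcribe(audio_path, **kwargs)
    segments = list(segments)  # forces the actual decoding work
    elapsed = time.perf_counter() - start
    speed = info.duration / elapsed if elapsed > 0 else float("inf")
    return segments, info, elapsed, speed
```

Running it once with `vad_filter=True` and once without shows how much the VAD filter saves on your recordings; a low `speed` while nvidia-smi shows an idle GPU suggests the model is running on the CPU.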

🤝 Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Submit a pull request

📜 License

MIT License - See LICENSE for details.

Faster Whisper is developed by SYSTRAN and based on OpenAI's Whisper.

🙏 Acknowledgments


Made with ❤️ for the OpenClaw community

