parakeet-stt

Local speech-to-text with NVIDIA Parakeet TDT 0.6B v3 (ONNX on CPU). 30x faster than Whisper, 25 languages, auto-detection, OpenAI-compatible API. Use when transcribing audio files, converting speech to text, or processing voice recordings locally without cloud APIs.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "parakeet-stt" with this command: npx skills add carlulsoe/parakeet-stt

Parakeet TDT (Speech-to-Text)

Local transcription using NVIDIA Parakeet TDT 0.6B v3 with ONNX Runtime. Runs on CPU — no GPU required. ~30x faster than realtime.

Installation

# Clone the repo
git clone https://github.com/groxaxo/parakeet-tdt-0.6b-v3-fastapi-openai.git
cd parakeet-tdt-0.6b-v3-fastapi-openai

# Run with Docker (recommended)
docker compose up -d parakeet-cpu

# Or run directly with Python
pip install -r requirements.txt
uvicorn app.main:app --host 0.0.0.0 --port 5000

Default port is 5000. Set PARAKEET_URL to override (e.g., http://localhost:5092).

API Endpoint

OpenAI-compatible API at $PARAKEET_URL (default: http://localhost:5000).

Quick Start

# Transcribe audio file (plain text)
curl -X POST $PARAKEET_URL/v1/audio/transcriptions \
  -F "file=@/path/to/audio.mp3" \
  -F "response_format=text"

# Get timestamps and segments
curl -X POST $PARAKEET_URL/v1/audio/transcriptions \
  -F "file=@/path/to/audio.mp3" \
  -F "response_format=verbose_json"

# Generate subtitles (SRT)
curl -X POST $PARAKEET_URL/v1/audio/transcriptions \
  -F "file=@/path/to/audio.mp3" \
  -F "response_format=srt"

Python / OpenAI SDK

import os
from openai import OpenAI

client = OpenAI(
    base_url=os.getenv("PARAKEET_URL", "http://localhost:5000") + "/v1",
    api_key="not-needed"
)

with open("audio.mp3", "rb") as f:
    transcript = client.audio.transcriptions.create(
        model="parakeet-tdt-0.6b-v3",
        file=f,
        response_format="text"
    )
print(transcript)

Response Formats

FormatOutput
textPlain text
json{"text": "..."}
verbose_jsonSegments with timestamps and words
srtSRT subtitles
vttWebVTT subtitles

Supported Languages (25)

English, Spanish, French, German, Italian, Portuguese, Polish, Russian, Ukrainian, Dutch, Swedish, Danish, Finnish, Norwegian, Greek, Czech, Romanian, Hungarian, Bulgarian, Slovak, Croatian, Lithuanian, Latvian, Estonian, Slovenian

Language is auto-detected — no configuration needed.

Web Interface

Open $PARAKEET_URL in a browser for drag-and-drop transcription UI.

Docker Management

# Check status
docker ps --filter "name=parakeet"

# View logs
docker logs -f <container-name>

# Restart
docker compose restart

# Stop
docker compose down

Why Parakeet over Whisper?

  • Speed: ~30x faster than realtime on CPU
  • Accuracy: Comparable to Whisper large-v3
  • Privacy: Runs 100% locally, no cloud calls
  • Compatibility: Drop-in replacement for OpenAI's transcription API

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

Omniscient

全知全能技能 — 整合认知套件、执行框架、系统控制三大能力层,并配备编排引擎。 认知层:四种思维操作码(直用/改进/迁移/构建)覆盖所有思考任务; 执行层:大语言模型 + 命令执行工具,自动化代码生成与脚本执行; 操控层:Windows桌面软件、系统硬件、串口设备、物联网平台、图形界面自动化、蓝牙设备、GPU显卡...

Registry SourceRecently Updated
General

系统清理技能

定期清理OpenClaw系统中的备份、临时及会话文件,分析磁盘使用并检查系统状态,优化系统性能。

Registry SourceRecently Updated
General

Whoislookup

Look up domain WHOIS registration info — registrar, creation date, expiry date, nameservers, and domain status. Use when asked to check who owns a domain, wh...

Registry SourceRecently Updated
General

WayinVideo - Find Moments in the Video

Find specific moments in a video using a natural language query. Ideal for locating particular scenes, topics, or events in long videos (e.g., “the part wher...

Registry SourceRecently Updated