Speech-to-Text

Transcribe audio to text via the inference.sh CLI.

Quick Start

Requires the inference.sh CLI (belt). See the install instructions.

belt login

belt app run infsh/fast-whisper-large-v3 --input '{"audio_url": "https://audio.mp3"}'

Available Models

Model | App ID | Best For
ElevenLabs Scribe v2 | elevenlabs/stt | 98%+ accuracy, diarization, 90+ languages
Fast Whisper V3 | infsh/fast-whisper-large-v3 | Fast transcription
Whisper V3 Large | infsh/whisper-v3-large | Highest accuracy

Examples

Basic Transcription

belt app run infsh/fast-whisper-large-v3 --input '{"audio_url": "https://meeting.mp3"}'

With Timestamps

belt app sample infsh/fast-whisper-large-v3 --save input.json

{

"audio_url": "https://podcast.mp3",

"timestamps": true

}

belt app run infsh/fast-whisper-large-v3 --input input.json

Translation (to English)

belt app run infsh/whisper-v3-large --input '{ "audio_url": "https://french-audio.mp3", "task": "translate" }'

From Video

Extract audio from video first

belt app run infsh/video-audio-extractor --input '{"video_url": "https://video.mp4"}' > audio.json

Transcribe the extracted audio

belt app run infsh/fast-whisper-large-v3 --input '{"audio_url": "<audio-url>"}'
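
The second command needs the URL produced by the first one. The sketch below shows one way to do that hand-off in Python, assuming the saved extractor output is a JSON object and that the audio URL sits under a key named `audio`. That key name is hypothetical and not confirmed by this page, so inspect audio.json for the real field. The helper `build_transcribe_input` is illustrative and not part of the CLI.

```python
import json


def build_transcribe_input(extractor_output: dict, url_key: str = "audio") -> str:
    """Return the JSON string to pass to --input for the transcription app.

    url_key is a hypothetical field name; check audio.json for the real one.
    """
    if url_key not in extractor_output:
        available = sorted(extractor_output)
        raise KeyError(f"{url_key!r} not found; available keys: {available}")
    return json.dumps({"audio_url": extractor_output[url_key]})


if __name__ == "__main__":
    # Stand-in for the parsed contents of audio.json; the real structure may differ.
    sample = {"audio": "https://example.com/extracted.mp3"}
    print(build_transcribe_input(sample))
```

The printed string can be pasted between the single quotes after `--input` in the transcription command.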

Workflow: Video Subtitles

1. Transcribe video audio

belt app run infsh/fast-whisper-large-v3 --input '{ "audio_url": "https://video.mp4", "timestamps": true }' > transcript.json

2. Use transcript for captions

belt app run infsh/caption-videos --input '{ "video_url": "https://video.mp4", "captions": "<transcript-from-step-1>" }'
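
If subtitles are needed as a standalone file, the timestamped segments in transcript.json can be converted to SRT. This is a rough sketch under assumptions: it supposes each segment is an object with `start` and `end` in seconds and a `text` string, which this page does not specify. It also does not claim that infsh/caption-videos expects SRT; it only illustrates one common subtitle format.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Build SRT text from segments.

    Assumes each segment has 'start', 'end' (seconds) and 'text' keys;
    these field names are an assumption, not documented here.
    """
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # SRT cues are separated by a blank line.
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [
        {"start": 0.0, "end": 2.5, "text": " Welcome to the show."},
        {"start": 2.5, "end": 5.0, "text": " Today we talk about audio."},
    ]
    print(segments_to_srt(demo))
```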

Supported Languages

Whisper supports 99+ languages including: English, Spanish, French, German, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, Hindi, Russian, and many more.

Use Cases

Meetings: Transcribe recordings
Podcasts: Generate transcripts
Subtitles: Create captions for videos
Voice Notes: Convert to searchable text
Interviews: Transcribe for research
Accessibility: Make audio content accessible

Output Format

Returns JSON with:

text : Full transcription
segments : Timestamped segments (if requested)
language : Detected language
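
As an illustration of consuming this output, the sketch below reads the three fields listed above from an already-parsed result. The top-level keys `text`, `segments`, and `language` come from this page; the per-segment keys `start`, `end`, and `text` are assumptions, and `summarize_transcript` is a hypothetical helper name.

```python
import json


def summarize_transcript(result: dict) -> str:
    """Return a readable summary of a transcription result.

    'segments' is optional because it is only returned when timestamps
    are requested. Per-segment field names are assumed, not documented.
    """
    lines = [
        f"Language: {result.get('language', 'unknown')}",
        f"Text: {result.get('text', '')}",
    ]
    for seg in result.get("segments") or []:
        lines.append(f"[{seg['start']:.2f}s - {seg['end']:.2f}s] {seg['text'].strip()}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Stand-in for the JSON saved from a belt app run; values are made up.
    raw = '{"text": "Hello world.", "language": "en", "segments": [{"start": 0.0, "end": 1.2, "text": " Hello world."}]}'
    print(summarize_transcript(json.loads(raw)))
```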

Related Skills

ElevenLabs STT (98%+ accuracy, diarization)

npx skills add inference-sh/skills@elevenlabs-stt

ElevenLabs TTS (reverse direction)

npx skills add inference-sh/skills@elevenlabs-tts

Full platform skill (all 250+ apps)

npx skills add inference-sh/skills@infsh-cli

Text-to-speech (reverse direction)

npx skills add inference-sh/skills@text-to-speech

Video generation (add captions)

npx skills add inference-sh/skills@ai-video-generation

AI avatars (lipsync with transcripts)

npx skills add inference-sh/skills@ai-avatar-video

Browse all audio apps: belt app list --category audio

Documentation

Running Apps - How to run apps via CLI
Audio Transcription Example - Complete transcription guide
Apps Overview - Understanding the app ecosystem
