speech-to-text

Transcribe audio to text via inference.sh CLI.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "speech-to-text" with this command: npx skills add inference-sh/skills@agent-tools

Quick Start

Requires the inference.sh CLI (infsh). Get installation instructions with: npx skills add inference-sh/skills@agent-tools

infsh login

infsh app run infsh/fast-whisper-large-v3 --input '{"audio_url": "https://audio.mp3"}'

Available Models

| Model | App ID | Best For |
|-------|--------|----------|
| Fast Whisper V3 | infsh/fast-whisper-large-v3 | Fast transcription |
| Whisper V3 Large | infsh/whisper-v3-large | Highest accuracy |

Examples

Basic Transcription

infsh app run infsh/fast-whisper-large-v3 --input '{"audio_url": "https://meeting.mp3"}'
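When several recordings need transcribing, the inline payload can be built from a variable rather than typed by hand. The following is a minimal sketch: audio_payload and transcribe_all are hypothetical helper names, not part of the infsh CLI, and the sketch assumes the URLs contain no characters that need JSON escaping.

```shell
# Build the inline JSON payload for one audio URL.
# Assumption: the URL contains no double quotes or backslashes,
# because this helper does no JSON escaping.
audio_payload() {
  printf '{"audio_url": "%s"}' "$1"
}

# Hypothetical helper: run the transcription app once per URL argument.
transcribe_all() {
  for url in "$@"; do
    infsh app run infsh/fast-whisper-large-v3 --input "$(audio_payload "$url")"
  done
}
```

For example, transcribe_all "https://meeting.mp3" "https://podcast.mp3" would issue the basic transcription command twice, once for each URL.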

With Timestamps

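Generate a sample input file:

infsh app sample infsh/fast-whisper-large-v3 --save input.json

Edit input.json to set the audio URL and enable timestamps:

{
  "audio_url": "https://podcast.mp3",
  "timestamps": true
}

Run the app with the edited file:

infsh app run infsh/fast-whisper-large-v3 --input input.json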
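If the input file is produced by a script rather than edited by hand, a heredoc can write it directly. This is only a sketch: write_timestamp_input is a hypothetical helper name, the two fields are the ones shown above, and the URL is assumed to need no JSON escaping.

```shell
# Write an input file that requests timestamps.
# $1: audio URL (assumed to contain no quotes or backslashes)
# $2: path of the JSON file to create
write_timestamp_input() {
  cat > "$2" <<EOF
{
  "audio_url": "$1",
  "timestamps": true
}
EOF
}
```

Calling write_timestamp_input "https://podcast.mp3" input.json creates the same file as above, which can then be passed with --input input.json.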

Translation (to English)

infsh app run infsh/whisper-v3-large --input '{ "audio_url": "https://french-audio.mp3", "task": "translate" }'

From Video

Extract the audio from the video first:

infsh app run infsh/video-audio-extractor --input '{"video_url": "https://video.mp4"}' > audio.json

Then transcribe the extracted audio, replacing <audio-url> with the audio URL returned in audio.json:

infsh app run infsh/fast-whisper-large-v3 --input '{"audio_url": "<audio-url>"}'

Workflow: Video Subtitles

1. Transcribe the video's audio with timestamps and save the result:

infsh app run infsh/fast-whisper-large-v3 --input '{ "audio_url": "https://video.mp4", "timestamps": true }' > transcript.json

2. Pass the transcript from step 1 to the captioning app:

infsh app run infsh/caption-videos --input '{ "video_url": "https://video.mp4", "captions": "<transcript-from-step-1>" }'
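If a standalone subtitle file is wanted instead, the saved transcript could be converted to SubRip (SRT) with jq and awk. This is a sketch under stated assumptions: transcript.json is assumed to hold the JSON described under Output Format, and each entry in segments is assumed to carry numeric start and end fields in seconds plus a text field, which the source does not confirm. segments_to_srt is a hypothetical helper name, and nothing here implies that caption-videos accepts SRT.

```shell
# Convert timestamped segments to SRT on stdout.
# Assumption: each segment looks like {"start": 0.0, "end": 2.5, "text": "..."}.
# Requires jq and awk.
segments_to_srt() {
  # Emit one tab-separated line per segment: start, end, text.
  # Tabs and newlines inside the text are replaced by spaces first.
  jq -r '.segments[] | [.start, .end, (.text | gsub("[\t\n]"; " "))] | @tsv' "$1" |
  awk -F'\t' '
    # Format seconds as HH:MM:SS,mmm (rounded to the nearest millisecond).
    function ts(s,   ms) {
      ms = int(s * 1000 + 0.5)
      return sprintf("%02d:%02d:%02d,%03d", int(ms / 3600000), int(ms % 3600000 / 60000), int(ms % 60000 / 1000), ms % 1000)
    }
    # One numbered cue per segment, followed by a blank line.
    { printf "%d\n%s --> %s\n%s\n\n", NR, ts($1), ts($2), $3 }
  '
}
```

Running segments_to_srt transcript.json > subtitles.srt would write one numbered cue per segment.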

Supported Languages

Whisper supports 99+ languages, including English, Spanish, French, German, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, Hindi, and Russian.

Use Cases

  • Meetings: Transcribe recordings

  • Podcasts: Generate transcripts

  • Subtitles: Create captions for videos

  • Voice Notes: Convert to searchable text

  • Interviews: Transcribe for research

  • Accessibility: Make audio content accessible

Output Format

Returns JSON with:

  • text: Full transcription

  • segments: Timestamped segments (if requested)

  • language: Detected language
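The fields listed above can be read from a saved result with jq. A minimal sketch, assuming the command's output was redirected to a file that contains only the JSON object; the three function names are hypothetical helpers, not infsh commands.

```shell
# Print the full transcription from a saved result file.
transcript_text() {
  jq -r '.text' "$1"
}

# Print the detected language.
transcript_language() {
  jq -r '.language' "$1"
}

# Count timestamped segments; prints 0 when "segments" is absent.
segment_count() {
  jq '(.segments // []) | length' "$1"
}
```

For instance, transcript_text transcript.json prints only the transcribed text, which is convenient for piping into search or summarization tools.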

Related Skills

Full platform skill (all 150+ apps)

npx skills add inference-sh/skills@agent-tools

Text-to-speech (reverse direction)

npx skills add inference-sh/skills@text-to-speech

Video generation (add captions)

npx skills add inference-sh/skills@ai-video-generation

AI avatars (lipsync with transcripts)

npx skills add inference-sh/skills@ai-avatar-video

Browse all audio apps: infsh app list --category audio

Documentation

  • Running Apps - How to run apps via CLI

  • Audio Transcription Example - Complete transcription guide

  • Apps Overview - Understanding the app ecosystem

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
