salute-speech

Transcribe audio files using the Sber Salute Speech async API. Russian-first speech-to-text (STT) with support for ru-RU, en-US, kk-KZ, ky-KG and uz-UZ.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.


Install skill "salute-speech" with this command: npx skills add chorus12/salute-speech

Audio Transcription with Sber Salute Speech

Transcribe audio/video files to text with timestamps via the Salute Speech async REST API.

Requirements

  • API key: The environment variable SALUTE_AUTH_DATA must be set to the authorization key from https://developers.sber.ru/studio/, which is the Base64 encoding of client_id:client_secret.
  • SSL note: The script disables TLS certificate verification by default (verify_ssl=False) because Sber's certificate chain ends in a root CA that common trust stores do not include. This is expected, not a misconfiguration.
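If you only have the client_id and client_secret, the SALUTE_AUTH_DATA value can be produced with the standard library. This is a minimal sketch: make_auth_data and read_auth_data are hypothetical helper names for illustration, not functions in salute_transcribe.py.

```python
import base64
import os


def make_auth_data(client_id: str, client_secret: str) -> str:
    """Base64-encode 'client_id:client_secret' into a single authorization key."""
    raw = f"{client_id}:{client_secret}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def read_auth_data() -> str:
    """Read SALUTE_AUTH_DATA from the environment, failing early if it is missing."""
    value = os.environ.get("SALUTE_AUTH_DATA", "").strip()
    if not value:
        raise RuntimeError("SALUTE_AUTH_DATA is not set")
    return value


if __name__ == "__main__":
    # Hypothetical credentials, for illustration only.
    print(make_auth_data("my-client-id", "my-client-secret"))
```

Export the printed value as SALUTE_AUTH_DATA in the host environment before running the skill.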

Supported formats & encodings

| Audio encoding | Content-Type | Typical extensions |
|---|---|---|
| MP3 | audio/mpeg | .mp3 |
| PCM_S16LE | audio/wav | .wav |
| OPUS | audio/ogg | .ogg, .opus |
| FLAC | audio/flac | .flac |
| ALAW | audio/alaw | .alaw |
| MULAW | audio/mulaw | .mulaw |

Supported languages

ru-RU, en-US, kk-KZ (Kazakh), ky-KG (Kyrgyz), uz-UZ (Uzbek).

Workflow

  1. Identify input files: take the file paths from the user's request.
  2. Read the API key: SALUTE_AUTH_DATA from the host environment.
  3. Run transcription: execute salute_transcribe.py with uv and the appropriate arguments.
  4. Deliver results: show the user the human-readable transcript with timestamps and give direct links to the output files.

Usage

uv run --with requests {baseDir}/salute_transcribe.py \
  --file /path/to/audio.mp3 \
  --output_dir ~/.openclaw/workspace/transcriptions \
  --lang ru-RU
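To drive the same command from Python (for example from an agent harness), a sketch like the following builds the argv list and runs it. The function names are hypothetical, and the script argument stands in for {baseDir}/salute_transcribe.py.

```python
import os
import shutil
import subprocess
from pathlib import Path


def build_command(script: Path, audio: Path, output_dir: Path, lang: str = "ru-RU") -> list[str]:
    """Assemble the documented `uv run --with requests ...` invocation as an argv list."""
    return [
        "uv", "run", "--with", "requests", str(script),
        "--file", str(audio),
        "--output_dir", str(output_dir),
        "--lang", lang,
    ]


def run_transcription(script: Path, audio: Path, output_dir: Path, lang: str = "ru-RU") -> None:
    """Check prerequisites, then run the transcription script and wait for it to exit."""
    if "SALUTE_AUTH_DATA" not in os.environ:
        raise RuntimeError("SALUTE_AUTH_DATA must be set before running the script")
    if shutil.which("uv") is None:
        raise RuntimeError("uv is not installed or not on PATH")
    # No shell is involved, so "~" must be expanded here rather than by the shell.
    output_dir = output_dir.expanduser()
    output_dir.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if the script exits non-zero.
    subprocess.run(build_command(script, audio, output_dir, lang), check=True)
```

Passing a list instead of a shell string avoids quoting problems with file names that contain spaces.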

Arguments

| Argument | Required | Default | Description |
|---|---|---|---|
| --file | Yes | (none) | Path to audio/video file |
| --output_dir | No | ~/.openclaw/workspace/transcribations | Output directory for results |
| --lang | No | ru-RU | Language code: ru-RU, en-US, kk-KZ, ky-KG, uz-UZ |
| --audio-encoding | No | MP3 | Codec: MP3, PCM_S16LE, OPUS, FLAC, ALAW, MULAW |
| --model | No | general | Recognition model: general or callcenter |
| --hyp-count | No | 1 | Number of alternative hypotheses: 1 or 2 |
| --max-wait-time | No | 300 | Max seconds to wait for async result |
| --print | No | off | Also print transcription to stdout |
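For reference, the arguments table corresponds roughly to the following argparse definition. It is a reconstruction from the table, not the script's source: the choices lists, the int types, and store_true for --print are assumptions.

```python
import argparse

LANGS = ["ru-RU", "en-US", "kk-KZ", "ky-KG", "uz-UZ"]
ENCODINGS = ["MP3", "PCM_S16LE", "OPUS", "FLAC", "ALAW", "MULAW"]


def build_parser() -> argparse.ArgumentParser:
    """Mirror the documented CLI options (sketch, not the real script)."""
    p = argparse.ArgumentParser(description="Salute Speech async transcription (sketch)")
    p.add_argument("--file", required=True, help="Path to audio/video file")
    p.add_argument("--output_dir", default="~/.openclaw/workspace/transcribations")
    p.add_argument("--lang", default="ru-RU", choices=LANGS)
    p.add_argument("--audio-encoding", default="MP3", choices=ENCODINGS)
    p.add_argument("--model", default="general", choices=["general", "callcenter"])
    p.add_argument("--hyp-count", type=int, default=1, choices=[1, 2])
    p.add_argument("--max-wait-time", type=int, default=300)
    p.add_argument("--print", action="store_true", help="Also print transcription to stdout")
    return p
```

argparse turns hyphens into underscores, so --audio-encoding becomes args.audio_encoding and --hyp-count becomes args.hyp_count.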

Content-Type mapping

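The script sends Content-Type: audio/mpeg (MP3) by default; it does not choose the type from the file extension. For other formats, change content_type in the script (or add extension-based detection) and pass the matching --audio-encoding. For example, .wav files need audio/wav together with --audio-encoding PCM_S16LE.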
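A minimal version of that extension-based logic, using the encodings and Content-Type values from the formats table. FORMATS and detect_format are hypothetical names; the result would feed whatever variable the script uses for content_type and the --audio-encoding value.

```python
from pathlib import Path

# Extension -> (--audio-encoding value, Content-Type), taken from the formats table.
FORMATS = {
    ".mp3": ("MP3", "audio/mpeg"),
    ".wav": ("PCM_S16LE", "audio/wav"),
    ".ogg": ("OPUS", "audio/ogg"),
    ".opus": ("OPUS", "audio/ogg"),
    ".flac": ("FLAC", "audio/flac"),
    ".alaw": ("ALAW", "audio/alaw"),
    ".mulaw": ("MULAW", "audio/mulaw"),
}


def detect_format(path: str) -> tuple[str, str]:
    """Return (audio_encoding, content_type) for a file, based on its extension."""
    ext = Path(path).suffix.lower()
    try:
        return FORMATS[ext]
    except KeyError:
        raise ValueError(f"Unsupported extension {ext!r}; convert the file first") from None
```

Unknown extensions, including video containers, raise ValueError instead of silently falling back to audio/mpeg.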

Output files

For input file meetingABC.mp3 the script produces:

| File | Description |
|---|---|
| meetingABC_recognition_orig.json | Raw API response (full JSON with all hypotheses, timing, confidence) |
| meetingABC_pretty.txt | Formatted human-readable transcript with timestamps |
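The naming rule can be sketched as follows, assuming the script uses the stem of --file (the file name without its last extension). output_paths is a hypothetical helper, not part of the script.

```python
from pathlib import Path


def output_paths(input_file: str, output_dir: str) -> tuple[Path, Path]:
    """Derive the raw-JSON and pretty-text result paths from the input file stem."""
    stem = Path(input_file).stem  # "meetingABC.mp3" -> "meetingABC"
    out = Path(output_dir).expanduser()
    return out / f"{stem}_recognition_orig.json", out / f"{stem}_pretty.txt"
```

Under this assumption, a name with several dots such as meeting.final.mp3 keeps everything but the last suffix (meeting.final).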

Output text format

[00:01 - 00:20]:
Ну, даже если сосредоточиться на идее узкой щели.

[00:20 - 00:45]:
Следующий фрагмент текста здесь.
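The layout above can be reproduced from (start, end, text) segments. How the script treats fractional seconds and recordings longer than an hour is not documented; the truncation and the H:MM:SS fallback below are assumptions.

```python
def fmt_time(seconds: float) -> str:
    """Format seconds as MM:SS, switching to H:MM:SS past one hour."""
    total = int(seconds)  # truncate fractional seconds
    h, rem = divmod(total, 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}" if h else f"{m:02d}:{s:02d}"


def format_segments(segments: list[tuple[float, float, str]]) -> str:
    """Render (start, end, text) tuples as blank-line-separated transcript blocks."""
    blocks = [f"[{fmt_time(start)} - {fmt_time(end)}]:\n{text}" for start, end, text in segments]
    return "\n\n".join(blocks) + "\n"
```

For example, fmt_time(3725) gives "1:02:05".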

Notes

  • Token is valid for ~30 minutes; the script fetches a new one each run.
  • Large files (>1 hour) may need --max-wait-time increased beyond 300s.
  • The callcenter model is optimized for telephony audio (8kHz, mono).
  • Profanity filter is disabled by default (enable_profanity_filter=False).
  • The script uses normalized text by default (numbers as digits, abbreviations expanded). Raw text is also available in the JSON output.
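The --max-wait-time note describes a poll-until-deadline loop. The sketch below shows that pattern in isolation, with the status check injected as a callable so it can be tested without network access. wait_for_result is a hypothetical name, and the status strings DONE and ERROR are assumptions about the task API, not taken from the script.

```python
import time
from typing import Callable


def wait_for_result(
    get_status: Callable[[], str],
    max_wait_time: float = 300.0,
    interval: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until it reports DONE, or raise once max_wait_time elapses."""
    deadline = clock() + max_wait_time
    while True:
        status = get_status()
        if status == "DONE":
            return status
        if status == "ERROR":
            raise RuntimeError("recognition task failed")
        if clock() >= deadline:
            raise TimeoutError(f"no result after {max_wait_time}s; try a larger --max-wait-time")
        sleep(interval)
```

Injecting clock and sleep keeps the loop deterministic in tests; a real caller would pass a function that queries the task status.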

