oatda-transcribe-audio

Transcribe audio to text using OATDA's unified audio API. Triggers when the user wants speech-to-text, transcription of meetings, podcasts, voice notes, subtitles, timestamps, or Whisper-style transcription through OATDA.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "oatda-transcribe-audio" with this command: npx skills add devcsde/oatda-transcribe-audio

OATDA Audio Transcription

Transcribe audio files to text through OATDA's unified audio API.

API Key Resolution

All commands need the OATDA API key. Resolve it inline for each exec call:

export OATDA_API_KEY="${OATDA_API_KEY:-$(cat ~/.oatda/credentials.json 2>/dev/null | jq -r '.profiles[.defaultProfile].apiKey' 2>/dev/null)}"

If the key is empty or null, tell the user to get one at https://oatda.com and configure it.

Security: Never print the full API key. Only verify existence or show first 8 chars.

Model Mapping

User saysProviderModel
whisper, whisper-1, openai whisper (default)openaiwhisper-1
transcription, speech to text, sttopenaiwhisper-1

Default: openai / whisper-1 if no model specified.

If the user provides provider/model format directly (for example openai/whisper-1), split on /.

⚠️ Models change over time. If a model ID fails, query oatda-list-models with ?type=audio first.

Input Preparation

The transcription endpoint supports:

  • multipart/form-data with a local file upload
  • JSON with a base64 data URL in file
  • JSON with file_base64 for providers that support direct base64 payloads

Maximum audio file size is 25MB.

For local files, prefer multipart upload because it is simpler and avoids large JSON bodies.

Discovering Audio Model Parameters

export OATDA_API_KEY="${OATDA_API_KEY:-$(cat ~/.oatda/credentials.json 2>/dev/null | jq -r '.profiles[.defaultProfile].apiKey' 2>/dev/null)}" && \
curl -s -X GET "https://oatda.com/api/v1/llm/models?type=audio" \
  -H "Authorization: Bearer $OATDA_API_KEY" | jq '.audio_models[] | {id, supported_params}'

Look for:

  • audio_modes containing transcription
  • supported response_format values
  • optional timestamp, diarization, or streaming support

API Call (multipart)

export OATDA_API_KEY="${OATDA_API_KEY:-$(cat ~/.oatda/credentials.json 2>/dev/null | jq -r '.profiles[.defaultProfile].apiKey' 2>/dev/null)}" && \
curl -s -X POST "https://oatda.com/api/v1/llm/transcriptions" \
  -H "Authorization: Bearer $OATDA_API_KEY" \
  -F "provider=<PROVIDER>" \
  -F "model=<MODEL>" \
  -F "file=@<AUDIO_FILE>" \
  -F "response_format=json"

Alternative API Call (base64 JSON)

AUDIO_DATA_URL="data:audio/mpeg;base64,$(base64 -w 0 audio.mp3)"

export OATDA_API_KEY="${OATDA_API_KEY:-$(cat ~/.oatda/credentials.json 2>/dev/null | jq -r '.profiles[.defaultProfile].apiKey' 2>/dev/null)}" && \
curl -s -X POST "https://oatda.com/api/v1/llm/transcriptions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OATDA_API_KEY" \
  -d "$(jq -n \
    --arg provider \"<PROVIDER>\" \
    --arg model \"<MODEL>\" \
    --arg file \"$AUDIO_DATA_URL\" \
    '{provider: $provider, model: $model, file: $file, response_format: \"json\"}')"

Common Parameters

  • language: ISO-639-1 language code like en, de, fr
  • prompt: Context for names, acronyms, or domain-specific terms
  • response_format: json, text, srt, verbose_json, vtt, or diarized_json
  • temperature: 0 to 1
  • timestamp_granularities: word and/or segment
  • chunking_strategy: auto
  • hotwords: Provider-specific keyword hints
  • stream: true if supported by the selected model

Response Format

The API returns JSON like:

{
  "text": "The transcribed text...",
  "language": "en",
  "duration": 42.5,
  "segments": [],
  "words": [],
  "costs": {
    "inputCost": 0,
    "outputCost": 0.0001,
    "totalCost": 0.0001,
    "currency": "USD"
  }
}

Present the text field to the user. Include subtitles, segments, or words if the requested format includes them.

Error Handling

HTTP StatusMeaningAction
401Invalid API keyTell user to check their key
402Insufficient creditsTell user to check balance
400Bad request / model not supportedCheck model or file format and query oatda-list-models with type=audio
413File too largeKeep audio under 25MB or split it
429Rate limited or monthly capWait briefly and retry once

Example

export OATDA_API_KEY="${OATDA_API_KEY:-$(cat ~/.oatda/credentials.json 2>/dev/null | jq -r '.profiles[.defaultProfile].apiKey' 2>/dev/null)}" && \
curl -s -X POST "https://oatda.com/api/v1/llm/transcriptions" \
  -H "Authorization: Bearer $OATDA_API_KEY" \
  -F "provider=openai" \
  -F "model=whisper-1" \
  -F "file=@meeting.mp3" \
  -F "response_format=json"

Notes

  • Endpoint: /api/v1/llm/transcriptions
  • Prefer multipart upload for local files
  • Use response_format=srt or vtt for subtitles
  • Use language to improve recognition when source language is known
  • Equivalent capability name: transcribe_audio
  • Related skills: oatda-generate-speech, oatda-translate-audio, oatda-list-models

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

通义晓蜜 - 智能外呼

触发阿里云晓蜜外呼机器人任务,自动批量拨打电话。适用于批量外呼、客户回访、满意度调查、简历筛查约面试等场景。可从前置工具或节点获取外呼名单。

Registry SourceRecently Updated
General

Letterboxd Watchlist

Scrape a public Letterboxd user's watchlist into a CSV/JSONL list of titles and film URLs without logging in. Use when a user asks to export, scrape, or mirror a Letterboxd watchlist, or to build watch-next queues.

Registry SourceRecently Updated
General

Seedance Video Generation

Generate AI videos using ByteDance Seedance. Use when the user wants to: (1) generate videos from text prompts, (2) generate videos from images (first frame, first+last frame, reference images), or (3) query/manage video generation tasks. Supports Seedance 1.5 Pro (with audio), 1.0 Pro, 1.0 Pro Fast, and 1.0 Lite models.

Registry SourceRecently Updated
4.2K17jackycser
General

Universal Skills Manager

The master coordinator for AI skills. Discovers skills from multiple sources (SkillsMP.com, SkillHub, and ClawHub), manages installation, and synchronization...

Registry SourceRecently Updated