alicloud-ai-audio-asr

Transcribe non-realtime speech with Alibaba Cloud Model Studio Qwen ASR models (`qwen3-asr-flash`, `qwen-audio-asr`, `qwen3-asr-flash-filetrans`). Use when converting recorded audio files to text, generating transcripts with timestamps, or documenting DashScope/OpenAI-compatible ASR request and response fields.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "alicloud-ai-audio-asr" with this command: npx skills add cinience/alicloud-ai-audio-asr

Category: provider

Model Studio Qwen ASR (Non-Realtime)

Validation

mkdir -p output/alicloud-ai-audio-asr
python -m py_compile skills/ai/audio/alicloud-ai-audio-asr/scripts/transcribe_audio.py && echo "py_compile_ok" > output/alicloud-ai-audio-asr/validate.txt

Pass criteria: command exits 0 and output/alicloud-ai-audio-asr/validate.txt is generated.

Output And Evidence

  • Store transcripts and API responses under output/alicloud-ai-audio-asr/.
  • Keep one command log or sample response per run.

Use Qwen ASR for recorded audio transcription (non-realtime), including short audio sync calls and long audio async jobs.

Critical model names

Use one of these exact model strings:

  • qwen3-asr-flash
  • qwen-audio-asr
  • qwen3-asr-flash-filetrans

Selection guidance:

  • Use qwen3-asr-flash or qwen-audio-asr for short/normal recordings (sync).
  • Use qwen3-asr-flash-filetrans for long-file transcription (async task workflow).

Prerequisites

  • Install SDK dependencies (script uses Python stdlib only):
python3 -m venv .venv
. .venv/bin/activate
  • Set DASHSCOPE_API_KEY in environment, or add dashscope_api_key to ~/.alibabacloud/credentials.

Normalized interface (asr.transcribe)

Request

  • audio (string, required): public URL or local file path.
  • model (string, optional): default qwen3-asr-flash.
  • language_hints (array<string>, optional): e.g. zh, en.
  • sample_rate (number, optional)
  • vocabulary_id (string, optional)
  • disfluency_removal_enabled (bool, optional)
  • timestamp_granularities (array<string>, optional): e.g. sentence.
  • async (bool, optional): default false for sync models, true for qwen3-asr-flash-filetrans.

Response

  • text (string): normalized transcript text.
  • task_id (string, optional): present for async submission.
  • status (string): SUCCEEDED or submission status.
  • raw (object): original API response.

Quick start (official HTTP API)

Sync transcription (OpenAI-compatible protocol):

curl -sS --location 'https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions' \
  --header "Authorization: Bearer $DASHSCOPE_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "qwen3-asr-flash",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "input_audio",
            "input_audio": {
              "data": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
            }
          }
        ]
      }
    ],
    "stream": false,
    "asr_options": {
      "enable_itn": false
    }
  }'

Async long-file transcription (DashScope protocol):

curl -sS --location 'https://dashscope.aliyuncs.com/api/v1/services/audio/asr/transcription' \
  --header "Authorization: Bearer $DASHSCOPE_API_KEY" \
  --header 'X-DashScope-Async: enable' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "qwen3-asr-flash-filetrans",
    "input": {
      "file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
    }
  }'

Poll task result:

curl -sS --location "https://dashscope.aliyuncs.com/api/v1/tasks/<task_id>" \
  --header "Authorization: Bearer $DASHSCOPE_API_KEY"

Local helper script

Use the bundled script for URL/local-file input and optional async polling:

python skills/ai/audio/alicloud-ai-audio-asr/scripts/transcribe_audio.py \
  --audio "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3" \
  --model qwen3-asr-flash \
  --language-hints zh,en \
  --print-response

Long-file mode:

python skills/ai/audio/alicloud-ai-audio-asr/scripts/transcribe_audio.py \
  --audio "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3" \
  --model qwen3-asr-flash-filetrans \
  --async \
  --wait

Operational guidance

  • For local files, use input_audio.data (data URI) when direct URL is unavailable.
  • Keep language_hints minimal to reduce recognition ambiguity.
  • For async tasks, use 5-20s polling interval with max retry guard.
  • Save normalized outputs under output/alicloud-ai-audio-asr/transcripts/.

Output location

  • Default output: output/alicloud-ai-audio-asr/transcripts/
  • Override base dir with OUTPUT_DIR.

Workflow

  1. Confirm user intent, region, identifiers, and whether the operation is read-only or mutating.
  2. Run one minimal read-only query first to verify connectivity and permissions.
  3. Execute the target operation with explicit parameters and bounded scope.
  4. Verify results and save output/evidence files.

References

  • references/api_reference.md
  • references/sources.md
  • Realtime synthesis is provided by skills/ai/audio/alicloud-ai-audio-tts-realtime/.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

AIWolfPK - AI狼人杀

四个AI互相猜疑,你坐着看戏。每局30秒,到底谁是狼? Four AIs play Werewolf while you watch. 30s per round. Spot the wolf before they do.

Registry SourceRecently Updated
General

Project Analyzer

Analyze any project directory and produce a detailed report covering what the project does, its tech stack, folder structure, entry points, how to run it, an...

Registry SourceRecently Updated
General

Thought-Retriever

提炼对话回答中的核心洞察为高置信度知识晶体,存储于本体驱动记忆系统的自我进化与复用。

Registry SourceRecently Updated
General

Miaoji Bid Guard Pro

亚马逊广告护城河Pro版,90天ROI预测+多活动协同+季节性出价+关键词攻防矩阵。 从单次调价建议升级为完整的广告战役规划。基础功能可使用 miaoji-bid-guard 免费版。

Registry SourceRecently Updated