alicloud-ai-audio-asr

Model Studio Qwen ASR (Non-Realtime)

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy the command below and send it to your AI assistant to install the skill.

Install skill "alicloud-ai-audio-asr" with this command: npx skills add cinience/alicloud-skills/cinience-alicloud-skills-alicloud-ai-audio-asr

Category: provider


Validation

mkdir -p output/alicloud-ai-audio-asr
python -m py_compile skills/ai/audio/alicloud-ai-audio-asr/scripts/transcribe_audio.py && echo "py_compile_ok" > output/alicloud-ai-audio-asr/validate.txt

Pass criteria: command exits 0 and output/alicloud-ai-audio-asr/validate.txt is generated.

Output And Evidence

  • Store transcripts and API responses under output/alicloud-ai-audio-asr/.

  • Keep one command log or sample response per run.

Use Qwen ASR for recorded audio transcription (non-realtime), including short audio sync calls and long audio async jobs.

Critical model names

Use one of these exact model strings:

  • qwen3-asr-flash

  • qwen-audio-asr

  • qwen3-asr-flash-filetrans

Selection guidance:

  • Use qwen3-asr-flash or qwen-audio-asr for short/normal recordings (sync).

  • Use qwen3-asr-flash-filetrans for long-file transcription (async task workflow).
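
The selection rule above can be sketched as a small helper. The duration threshold below is an illustrative assumption, not a documented service limit; check the upstream API reference for the actual sync-call duration cap.

```python
def pick_model(duration_s: float, long_file_threshold_s: float = 180.0) -> str:
    """Pick a Qwen ASR model string per the selection guidance.

    The 180 s threshold is an illustrative assumption; the real limit for
    sync calls is defined by the service, not by this sketch.
    """
    if duration_s > long_file_threshold_s:
        return "qwen3-asr-flash-filetrans"  # long audio: async task workflow
    return "qwen3-asr-flash"  # short/normal recordings: sync call
```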

Prerequisites

  • No SDK installation is required: the bundled script uses only the Python standard library. Optionally create a virtual environment:

python3 -m venv .venv
. .venv/bin/activate

  • Set DASHSCOPE_API_KEY in the environment, or add dashscope_api_key to ~/.alibabacloud/credentials.
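
A minimal sketch of that key-resolution order, assuming the credentials file is INI-formatted; the section name is not specified upstream, so this accepts any section that contains a dashscope_api_key option.

```python
import configparser
import os

def resolve_api_key(env=None, credentials_path="~/.alibabacloud/credentials"):
    """Return the DashScope API key from the environment, falling back to
    the shared credentials file (assumed INI format, any section)."""
    env = os.environ if env is None else env
    key = env.get("DASHSCOPE_API_KEY")
    if key:
        return key
    path = os.path.expanduser(credentials_path)
    if os.path.exists(path):
        parser = configparser.ConfigParser()
        parser.read(path)
        for section in parser.sections():
            if parser.has_option(section, "dashscope_api_key"):
                return parser.get(section, "dashscope_api_key")
    return None
```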

Normalized interface (asr.transcribe)

Request

  • audio (string, required): public URL or local file path.

  • model (string, optional): default qwen3-asr-flash.

  • language_hints (array, optional): e.g. zh, en.

  • sample_rate (number, optional)

  • vocabulary_id (string, optional)

  • disfluency_removal_enabled (bool, optional)

  • timestamp_granularities (array, optional): e.g. sentence.

  • async (bool, optional): default false for sync models, true for qwen3-asr-flash-filetrans.

Response

  • text (string): normalized transcript text.

  • task_id (string, optional): present for async submission.

  • status (string): SUCCEEDED or submission status.

  • raw (object): original API response.
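
As a sketch, a hypothetical normalizer can map raw payloads onto this shape. The field paths assumed here (choices[0].message.content for sync responses, output.task_id and output.task_status for async submissions) follow the two protocols used in the quick-start examples; verify them against real responses before relying on them.

```python
def normalize_response(raw: dict) -> dict:
    """Map a raw API response onto the normalized asr.transcribe shape.

    Field paths are assumptions based on the OpenAI-compatible sync
    protocol and the DashScope async task protocol; real payloads may
    differ (e.g. content may be a list of segments).
    """
    output = raw.get("output") or {}
    choices = raw.get("choices") or []
    text = ""
    if choices:
        text = choices[0].get("message", {}).get("content", "") or ""
    return {
        "text": text,
        "task_id": output.get("task_id"),
        "status": output.get("task_status", "SUCCEEDED"),
        "raw": raw,
    }
```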

Quick start (official HTTP API)

Sync transcription (OpenAI-compatible protocol):

curl -sS --location 'https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions' \
  --header "Authorization: Bearer $DASHSCOPE_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "qwen3-asr-flash",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "input_audio",
            "input_audio": { "data": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3" }
          }
        ]
      }
    ],
    "stream": false,
    "asr_options": { "enable_itn": false }
  }'
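
The same sync call can be issued from Python with only the standard library. This sketch mirrors the curl payload above and only builds the request, leaving the actual send to the caller.

```python
import json
import os
import urllib.request

API_URL = "https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions"

def build_sync_request(audio_url, model="qwen3-asr-flash", api_key=None):
    """Build the sync transcription request (OpenAI-compatible protocol).

    Mirrors the curl example; send with urllib.request.urlopen(req).
    """
    payload = {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [{"type": "input_audio",
                         "input_audio": {"data": audio_url}}],
        }],
        "stream": False,
        "asr_options": {"enable_itn": False},
    }
    key = api_key or os.environ.get("DASHSCOPE_API_KEY", "")
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {key}",
                 "Content-Type": "application/json"},
    )
```

urllib.request.urlopen(build_sync_request(...)) would execute the call; per the protocol above, the transcript is expected under choices[0].message.content of the JSON response.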

Async long-file transcription (DashScope protocol):

curl -sS --location 'https://dashscope.aliyuncs.com/api/v1/services/audio/asr/transcription' \
  --header "Authorization: Bearer $DASHSCOPE_API_KEY" \
  --header 'X-DashScope-Async: enable' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "qwen3-asr-flash-filetrans",
    "input": { "file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3" }
  }'

Poll task result:

curl -sS --location "https://dashscope.aliyuncs.com/api/v1/tasks/<task_id>" \
  --header "Authorization: Bearer $DASHSCOPE_API_KEY"
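
The submit-then-poll workflow can be wrapped in a small loop with a maximum-retry guard. The terminal status names below are assumptions based on common DashScope task states; verify them against real task payloads.

```python
import time

# Assumed terminal states; confirm against the actual task API responses.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELED"}

def poll_task(fetch_status, interval_s=10, max_attempts=30):
    """Poll an async task until it reaches a terminal status.

    fetch_status is any zero-argument callable returning the current task
    payload, e.g. a GET on /api/v1/tasks/<task_id>. Raises TimeoutError
    once max_attempts polls have passed without a terminal status.
    """
    for _ in range(max_attempts):
        payload = fetch_status()
        status = payload.get("output", {}).get("task_status")
        if status in TERMINAL_STATES:
            return payload
        time.sleep(interval_s)
    raise TimeoutError(f"task not finished after {max_attempts} polls")
```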

Local helper script

Use the bundled script for URL/local-file input and optional async polling:

python skills/ai/audio/alicloud-ai-audio-asr/scripts/transcribe_audio.py \
  --audio "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3" \
  --model qwen3-asr-flash \
  --language-hints zh,en \
  --print-response

Long-file mode:

python skills/ai/audio/alicloud-ai-audio-asr/scripts/transcribe_audio.py \
  --audio "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3" \
  --model qwen3-asr-flash-filetrans \
  --async \
  --wait

Operational guidance

  • For local files, pass input_audio.data as a base64 data URI when a direct URL is unavailable.

  • Keep language_hints minimal to reduce recognition ambiguity.

  • For async tasks, poll every 5-20 seconds with a maximum-retry guard.

  • Save normalized outputs under output/alicloud-ai-audio-asr/transcripts/.
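
For the local-file case, a data URI can be built with the standard library. The audio/mpeg mime type here is an assumption for illustration; match it to the actual file format.

```python
import base64

def audio_to_data_uri(audio_bytes: bytes, mime: str = "audio/mpeg") -> str:
    """Encode raw audio bytes as a data URI for input_audio.data when no
    public URL is available. The mime default is an assumption; adjust it
    to the real format (e.g. audio/wav)."""
    encoded = base64.b64encode(audio_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"

# Typical use with a local file:
# with open("recording.mp3", "rb") as f:
#     uri = audio_to_data_uri(f.read())
```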

Output location

  • Default output: output/alicloud-ai-audio-asr/transcripts/

  • Override the base directory with OUTPUT_DIR.

Workflow

  • Confirm user intent, region, identifiers, and whether the operation is read-only or mutating.

  • Run one minimal read-only query first to verify connectivity and permissions.

  • Execute the target operation with explicit parameters and bounded scope.

  • Verify results and save output/evidence files.

References

  • references/api_reference.md

  • references/sources.md

  • Realtime synthesis is provided by skills/ai/audio/alicloud-ai-audio-tts-realtime/.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

  • alicloud-ai-audio-tts-voice-clone (General): no summary provided by upstream source. Repository Source; Needs Review.

  • alicloud-ai-image-qwen-image (General): no summary provided by upstream source. Repository Source; Needs Review.

  • alicloud-ai-multimodal-qwen-vl (General): no summary provided by upstream source. Repository Source; Needs Review.