xiaoyuzhou-asr

Transcribe 小宇宙 (Xiaoyuzhou) podcast episodes to text using local Qwen3-ASR speech recognition. Combines the xyz API (小宇宙FM API), which fetches episode metadata and audio URLs, with qwen3-asr-rs for local GPU-accelerated speech-to-text. Use when the user wants to: (1) transcribe a 小宇宙 podcast episode, (2) download podcast audio from 小宇宙, (3) search and transcribe podcast content, (4) batch transcribe multiple episodes. Triggers: 小宇宙, podcast transcription, 播客转文字, ASR, speech recognition, episode transcript, 节目转录, qwen3-asr, 小宇宙转录.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "xiaoyuzhou-asr" with this command: npx skills add worldwonderer/xiaoyuzhou-asr

Transcribe 小宇宙 podcast episodes to text using local Qwen3-ASR (Metal/CUDA accelerated).

Prerequisites

  1. xyz API server running — fetches episode data and audio URLs from 小宇宙
    git clone https://github.com/ultrazg/xyz.git && cd xyz && go run .
    # Default port: 23020, change with -p
    
  2. Access token — login via POST /sendCode then POST /login (see references/xyz-api.md)
  3. ffmpeg — audio format conversion (brew install ffmpeg)
  4. Qwen3-ASR model — download (HF Hub does NOT ship tokenizer.json):
    python3 -c "
    from huggingface_hub import snapshot_download
    snapshot_download('Qwen/Qwen3-ASR-0.6B', local_dir='models/0.6B')
    "
    
  5. qwen3-asr-rs — build from source:
    git clone https://github.com/alan890104/qwen3-asr-rs.git && cd qwen3-asr-rs
    cargo build --release --example local_transcribe
    
  6. tokenizer.json — auto-generated by the transcription script on first run (from vocab.json + merges.txt). No manual step needed.
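
Before starting the workflow, a quick sanity check of the pieces above can save a failed run. A minimal sketch; the paths are the example locations used on this page, not fixed requirements:

```shell
# Sanity-check the prerequisites above; prints "ok:" or "missing:" per item.
# Paths are the example locations used on this page, not fixed requirements.
check_prereqs() {
  for f in models/0.6B/vocab.json models/0.6B/merges.txt \
           qwen3-asr-rs/target/release/examples/local_transcribe; do
    if [ -e "$f" ]; then echo "ok: $f"; else echo "missing: $f"; fi
  done
  if command -v ffmpeg >/dev/null 2>&1; then echo "ok: ffmpeg"; else echo "missing: ffmpeg"; fi
}
check_prereqs
```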

Workflow

Step 1: Find Episode

TOKEN="$XYZ_ACCESS_TOKEN"
BASE="http://localhost:23020"

# Search episodes by keyword
curl -s -X POST $BASE/search \
  -H "x-jike-access-token: $TOKEN" -H "Content-Type: application/json" \
  -d '{"keyword":"关键词","type":"EPISODE"}'

# Get episode detail (contains audio URL)
curl -s -X POST $BASE/episode_detail \
  -H "x-jike-access-token: $TOKEN" -H "Content-Type: application/json" \
  -d '{"eid":"EPISODE_ID"}'

# List episodes of a podcast
curl -s -X POST $BASE/episode_list \
  -H "x-jike-access-token: $TOKEN" -H "Content-Type: application/json" \
  -d '{"pid":"PODCAST_ID","order":"desc"}'
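
To feed the episode id into the next calls, it can be parsed out of the search response. The response shape used below is an assumption for illustration; inspect real /search output (or references/xyz-api.md) before relying on these field names:

```shell
# Parse the episode id (eid) from a search response. The JSON shape here is
# an ASSUMPTION for illustration — verify against real /search output.
resp='{"data":{"data":[{"eid":"abc123","title":"示例节目"}]}}'
eid=$(printf '%s' "$resp" | python3 -c 'import json,sys; print(json.load(sys.stdin)["data"]["data"][0]["eid"])')
echo "$eid"
```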

Step 2: Download and Convert Audio

The audio URL is at data.data.media.source.url in the episode_detail response (m4a format).

mkdir -p /tmp/xiaoyuzhou-audio
curl -L -o /tmp/xiaoyuzhou-audio/episode.m4a "$AUDIO_URL"
ffmpeg -y -i /tmp/xiaoyuzhou-audio/episode.m4a -ar 16000 -ac 1 /tmp/xiaoyuzhou-audio/episode.wav
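
The nested URL can be pulled out programmatically. The data.data.media.source.url path matches the note above; the sample JSON is illustrative only:

```shell
# Extract the audio URL from an episode_detail response (sample JSON is
# illustrative; the field path follows the note above).
detail='{"data":{"data":{"media":{"source":{"url":"https://example.com/ep.m4a"}}}}}'
AUDIO_URL=$(printf '%s' "$detail" | python3 -c 'import json,sys; print(json.load(sys.stdin)["data"]["data"]["media"]["source"]["url"])')
echo "$AUDIO_URL"
```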

Step 3: Split Long Audio (REQUIRED for >3 min)

Podcasts are continuous speech with few silence gaps. Use fixed-interval splitting:

# Split into 3-minute segments (split anything much over 2 min; Metal GPU memory needs segments of ≤3 min)
ffmpeg -y -i episode.wav -f segment -segment_time 180 -ar 16000 -ac 1 seg_%03d.wav

Or try silence-based splitting (may find no gaps in continuous podcasts):

ffmpeg -i episode.wav -af "silencedetect=noise=-30dB:d=2" -f null - 2>&1 | grep silence_end
ffmpeg -i episode.wav -f segment -segment_times T1,T2 -ar 16000 -ac 1 seg_%03d.wav
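
Two small helpers for this step: computing how many segments to expect (ceiling division), and joining the silence_end values from the detect pass into the comma-separated list that -segment_times takes. Both are plain shell, no ffmpeg needed; the numbers are example values:

```shell
# How many 180 s segments a given episode yields (ceiling division)
duration=3725   # seconds, example value from episode metadata
seg_len=180
nsegs=$(( (duration + seg_len - 1) / seg_len ))
echo "$nsegs"   # 21

# Join silence_end timestamps into the list -segment_times expects
times='12.5
190.3
361.0'          # example values parsed from the silencedetect pass
seglist=$(printf '%s' "$times" | tr '\n' ',')
echo "$seglist" # 12.5,190.3,361.0
```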

Step 4: Transcribe

MODEL_DIR="/path/to/models/0.6B"
ASR_BIN="qwen3-asr-rs/target/release/examples/local_transcribe"

# Transcribe each segment
for seg in seg_*.wav; do
  "$ASR_BIN" "$MODEL_DIR" "$seg" 2>/dev/null | sed -n 's/^Text     : //p'
done
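
To keep segment order and collect everything into one file, the loop above can append to a transcript. Sketched here with a stub in place of the real binary so the snippet runs standalone; swap fake_asr for "$ASR_BIN" "$MODEL_DIR" in practice:

```shell
# fake_asr is a stand-in for the real local_transcribe call, so this
# ordering sketch runs without the model. Replace it in real use.
fake_asr() { echo "Text     : transcript of $1"; }

: > transcript.txt
for seg in seg_000.wav seg_001.wav; do   # the %03d names sort correctly as a glob
  fake_asr "$seg" | sed -n 's/^Text     : //p' >> transcript.txt
done
cat transcript.txt
```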

For efficiency (load model once in Rust):

// Sketch against the qwen3-asr-rs API; run inside a function returning
// Result so the `?` operator works.
use qwen3_asr::{AsrInference, TranscribeOptions, best_device};

let engine = AsrInference::load("models/0.6B", best_device())?;
let mut output = Vec::new();
for seg in segments {  // `segments`: ordered list of WAV segment paths
    let result = engine.transcribe(&seg, TranscribeOptions::default())?;
    output.push(result.text);
}

Step 5: Format Output

Combine transcript with metadata as markdown:

# {title}
**节目**: {podcast.title} | **日期**: {pubDate} | **时长**: {duration}s

## 转录文本
{transcript}
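
Filled in with shell, the template above becomes the following. All values here are placeholders; real values come from episode_detail and Step 4:

```shell
# Assemble the markdown from metadata + transcript (placeholder values).
title='示例标题'; podcast='示例播客'; pubdate='2024-01-01'; duration=3725
printf '第一段转录文本。\n' > transcript.txt   # stand-in for Step 4 output
{
  printf '# %s\n' "$title"
  printf '**节目**: %s | **日期**: %s | **时长**: %ss\n\n' "$podcast" "$pubdate" "$duration"
  printf '## 转录文本\n'
  cat transcript.txt
} > episode.md
```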

References

Token Management

  • Tokens expire. If API returns 401, refresh: POST /refresh_token
  • Store in env: XYZ_ACCESS_TOKEN, XYZ_REFRESH_TOKEN
  • Prompt user to login if no valid token
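
The "prompt user to login" check can be a small helper that reports which of the token env vars named above are unset or empty:

```shell
# Report which token env vars (names from the notes above) are unset or
# empty, so the caller knows to prompt for login.
check_token_env() {
  for var in XYZ_ACCESS_TOKEN XYZ_REFRESH_TOKEN; do
    if [ -z "$(printenv "$var")" ]; then echo "missing: $var"; fi
  done
}
check_token_env
```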

Constraints

  • MUST split audio into ≤3-minute segments for Metal GPU stability
  • Audio must be WAV 16kHz mono
  • tokenizer.json is not included in the HF download; it is generated from vocab.json + merges.txt (the transcription script does this automatically on first run)
  • local_transcribe binary needed (demo binary only runs built-in test samples)
  • xyz API requires Chinese phone number (+86) login
  • All processing is local — audio never leaves the machine

