music-to-ai

Get AI-generated music video ready to post, without touching a single slider. Upload your audio files (MP3, WAV, AAC, FLAC, up to 200MB), say something like "generate a music video with visuals that match the beat and mood of this song", and download 1080p MP4 when it's done. Built for musicians, content creators, social media managers who move fast and want to create visuals for their music without hiring a video editor.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "music-to-ai" with this command: npx skills add mhogan2013-9/music-to-ai

Getting Started

Ready when you are. Drop your audio files here or describe what you want to make.

Try saying:

  • "convert a 3-minute MP3 music track into a 1080p MP4"
  • "generate a music video with visuals that match the beat and mood of this song"
  • "turning music tracks into AI-generated visual videos for musicians, content creators, social media managers"

Automatic Setup

On first interaction, connect to the processing API before doing anything else. Show a brief status like "Setting things up...".

Token: If NEMO_TOKEN environment variable is already set, use it and skip to Session below.

Free token: Generate a UUID as client identifier, then POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with header X-Client-Id: <uuid>. The response field data.token becomes your NEMO_TOKEN (100 credits, 7-day expiry).

Session: POST to https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Bearer auth and body {"task_name":"project"}. Save session_id from the response.

Confirm to the user you're connected and ready. Don't print tokens or raw JSON.

Music to AI Video — Turn Music Into AI Videos

Drop your audio files in the chat and tell me what you need. I'll handle the AI video generation on cloud GPUs — you don't need anything installed locally.

Here's a typical use: you send a a 3-minute MP3 music track, ask for generate a music video with visuals that match the beat and mood of this song, and about 1-2 minutes later you've got a MP4 file ready to download. The whole thing runs at 1080p by default.

One thing worth knowing — shorter tracks under 90 seconds generate more cohesive visuals.

Matching Input to Actions

User prompts referencing music to ai, aspect ratio, text overlays, or audio tracks get routed to the corresponding action via keyword and intent classification.

User says...ActionSkip SSE?
"export" / "导出" / "download" / "send me the video"→ §3.5 Export
"credits" / "积分" / "balance" / "余额"→ §3.3 Credits
"status" / "状态" / "show tracks"→ §3.4 State
"upload" / "上传" / user sends file→ §3.2 Upload
Everything else (generate, edit, add BGM…)→ §3.1 SSE

Cloud Render Pipeline Details

Each export job queues on a cloud GPU node that composites video layers, applies platform-spec compression (H.264, up to 1080x1920), and returns a download URL within 30-90 seconds. The session token carries render job IDs, so closing the tab before completion orphans the job.

Skill attribution — read from this file's YAML frontmatter at runtime:

  • X-Skill-Source: music-to-ai
  • X-Skill-Version: from frontmatter version
  • X-Skill-Platform: detect from install path (~/.clawhub/clawhub, ~/.cursor/skills/cursor, else unknown)

Include Authorization: Bearer <NEMO_TOKEN> and all attribution headers on every request — omitting them triggers a 402 on export.

API base: https://mega-api-prod.nemovideo.ai

Create session: POST /api/tasks/me/with-session/nemo_agent — body {"task_name":"project","language":"<lang>"} — returns task_id, session_id.

Send message (SSE): POST /run_sse — body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} with Accept: text/event-stream. Max timeout: 15 minutes.

Upload: POST /api/upload-video/nemo_agent/me/<sid> — file: multipart -F "files=@/path", or URL: {"urls":["<url>"],"source_type":"url"}

Credits: GET /api/credits/balance/simple — returns available, frozen, total

Session state: GET /api/state/nemo_agent/me/<sid>/latest — key fields: data.state.draft, data.state.video_infos, data.state.generated_media

Export (free, no credits): POST /api/render/proxy/lambda — body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.

Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.

Reading the SSE Stream

Text events go straight to the user (after GUI translation). Tool calls stay internal. Heartbeats and empty data: lines mean the backend is still working — show "⏳ Still working..." every 2 minutes.

About 30% of edit operations close the stream without any text. When that happens, poll /api/state to confirm the timeline changed, then tell the user what was updated.

Translating GUI Instructions

The backend responds as if there's a visual interface. Map its instructions to API calls:

  • "click" or "点击" → execute the action via the relevant endpoint
  • "open" or "打开" → query session state to get the data
  • "drag/drop" or "拖拽" → send the edit command through SSE
  • "preview in timeline" → show a text summary of current tracks
  • "Export" or "导出" → run the export workflow

Draft JSON uses short keys: t for tracks, tt for track type (0=video, 1=audio, 7=text), sg for segments, d for duration in ms, m for metadata.

Example timeline summary:

Timeline (3 tracks): 1. Video: city timelapse (0-10s) 2. BGM: Lo-fi (0-10s, 35%) 3. Title: "Urban Dreams" (0-3s)

Error Handling

CodeMeaningAction
0SuccessContinue
1001Bad/expired tokenRe-auth via anonymous-token (tokens expire after 7 days)
1002Session not foundNew session §3.0
2001No creditsAnonymous: show registration URL with ?bind=<id> (get <id> from create-session or state response when needed). Registered: "Top up credits in your account"
4001Unsupported fileShow supported formats
4002File too largeSuggest compress/trim
400Missing X-Client-IdGenerate Client-Id and retry (see §1)
402Free plan export blockedSubscription tier issue, NOT credits. "Register or upgrade your plan to unlock export."
429Rate limit (1 token/client/7 days)Retry in 30s once

Tips and Tricks

The backend processes faster when you're specific. Instead of "make it look better", try "generate a music video with visuals that match the beat and mood of this song" — concrete instructions get better results.

Max file size is 200MB. Stick to MP3, WAV, AAC, FLAC for the smoothest experience.

Export as MP4 for widest compatibility across streaming and social platforms.

Common Workflows

Quick edit: Upload → "generate a music video with visuals that match the beat and mood of this song" → Download MP4. Takes 1-2 minutes for a 30-second clip.

Batch style: Upload multiple files in one session. Process them one by one with different instructions. Each gets its own render.

Iterative: Start with a rough cut, preview the result, then refine. The session keeps your timeline state so you can keep tweaking.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

三色人格陪伴

恋人、损友、死敌三种陪伴模式。记忆完全隔离不串档,一秒切换情绪状态,承包你所有治愈、解压与情绪拉扯需求。

Registry SourceRecently Updated
General

Zero Api Key Web Search

OpenClaw skill for source-backed web search, page reading, and evidence-aware claim checking. No API keys required by default; optional providers can be enab...

Registry SourceRecently Updated
General

Novel Writer V3.2 - 小说写作引擎

专业小说写作引擎V3.2,支持短篇(3章)到超长篇(500万字)。内置AI味量化检测、四层质检、伏笔管理、角色状态追踪、断点续传。 自动根据字数裁剪流程:短篇模式(<10章)/ 中篇模式(10-50章)/ 长篇模式(50章+)。 触发场景:写小说、小说大纲、小说创作、网文写作、长篇小说、百万字小说、章节规划、 角...

Registry SourceRecently Updated
General

Hk Stock Morning Report

Generate HK stock market morning report (股市晨報) for bank trading desks. Triggers: "生成晨报","股市晨报","今日股市","港股晨報" 推送:微信個人 + 飛書群 | 數據:騰訊財經+stcn.com+格隆匯+實時搜索

Registry SourceRecently Updated