VoxFlow Skill
VoxFlow turns text into speech in 200+ voices across 40+ languages, plus full audio/video pipelines: podcasts, transcription, dubbing, video translation, and short-form AI clips. All commands run through the voxflow CLI (installed automatically by ClawHub via the install spec above). One account, one quota, one login — no API keys to paste.
Routing — pick the matching sub-doc
Before doing anything, decide which sub-skill matches the user's intent and read the corresponding file in this same skill folder:
| User wants… | Read | Primary commands |
|---|---|---|
| Read text aloud, search voices, sample stories, check quota / login | hub.md | say, narrate, story, voices, status, login |
| Multi-speaker AI podcast from a topic / URL / script | podcast.md | podcast |
| Transcribe audio/video, translate subtitles, dub from SRT, end-to-end video translation, summarize, publish | transcribe.md | asr, asr-jobs, translate, dub, video-translate, summarize, publish |
| Turn a long article / note / report into a vertical 1080×1920 card video (Slice, 13 themes) | slice.md | slice, slice stage |
| Short-form AI clips — knowledge cards, explainers, presentations, single images | video.md | picstory, present, explain, slides, image |
If the request spans multiple areas (e.g. "transcribe this video and then make a 60-sec recap card"), read the most relevant doc first, finish that step, then switch to the doc for the next step.
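The routing table can be sketched as a small lookup, which may help when a request chains several commands. This is an illustration only: subdoc_for is a hypothetical helper, not part of the voxflow CLI, and the fallback to hub.md is an assumption based on the "start at the hub" advice in this document.

```shell
# Sketch: map a voxflow subcommand to the sub-doc named in the routing table.
# subdoc_for is a hypothetical helper, not a voxflow command.
subdoc_for() {
  case "$1" in
    say|narrate|story|voices|status|login)
      echo "hub.md" ;;
    podcast)
      echo "podcast.md" ;;
    asr|asr-jobs|translate|dub|video-translate|summarize|publish)
      echo "transcribe.md" ;;
    slice)
      echo "slice.md" ;;
    picstory|present|explain|slides|image)
      echo "video.md" ;;
    *)
      # Assumption: unknown or vague intent falls back to the hub doc.
      echo "hub.md" ;;
  esac
}
```

For the "transcribe, then recap card" example, calling subdoc_for asr and then subdoc_for slice gives the reading order: transcribe.md first, slice.md second.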
Install & login (universal preamble)
The ClawHub install spec already installs the voxflow npm CLI globally when this skill is added. The only thing left is authentication:
# One-time browser device-flow — pairing code shown in terminal,
# user authorizes at https://voxflow.studio/device?code=VF-XXXX
voxflow login
# Verify
voxflow status # shows email + monthly / bonus quota
For headless / server contexts: set VOXFLOW_TOKEN=<jwt> (declared in envVars above) and skip voxflow login. JWTs are short-lived (~1 hour); the CLI auto-refreshes silently while logged in interactively.
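A minimal sketch of the choice between the two auth paths, assuming the only signal is whether VOXFLOW_TOKEN is set. auth_mode and ensure_auth are hypothetical helper names; the only CLI calls used are voxflow login and voxflow status, both named above. Whether voxflow status reports quota in token mode is assumed here, not confirmed.

```shell
# Sketch: pick the auth path described above.
# auth_mode and ensure_auth are hypothetical helpers, not voxflow commands.
auth_mode() {
  if [ -n "${VOXFLOW_TOKEN:-}" ]; then
    echo "token"        # headless: token supplied via environment
  else
    echo "interactive"  # no token: browser device-flow is needed
  fi
}

ensure_auth() {
  # Only run the interactive login when no token is present.
  if [ "$(auth_mode)" = "interactive" ]; then
    voxflow login
  fi
  # Verify the session either way.
  voxflow status
}
```

Because the JWT is short-lived, a headless job that runs longer than the token lifetime would likely need a fresh VOXFLOW_TOKEN from outside; this sketch does not attempt a refresh.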
Account & quota
- Free tier: 10,000 quota / month (≈ 100 TTS calls)
- Plus / Pro / Max tiers at voxflow.studio/app#pricing
- Each command's cost is printed before execution; voxflow status shows the current balance
- Invite-friend bonus (voxflow invite) adds 5,000 lifetime quota per signup
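As a rough budgeting aid, the figures above imply about 100 quota per TTS call (10,000 quota ≈ 100 calls). The sketch below assumes that flat rate; approx_tts_calls and QUOTA_PER_TTS_CALL are hypothetical names, and real costs vary by command, so the cost the CLI prints before execution is the authoritative number.

```shell
# Assumption: ~100 quota per TTS call, derived from "10,000 quota ≈ 100 TTS calls".
QUOTA_PER_TTS_CALL=100

# Sketch: estimate how many TTS calls a given quota balance covers.
# approx_tts_calls is a hypothetical helper, not a voxflow command.
approx_tts_calls() {
  echo $(( $1 / QUOTA_PER_TTS_CALL ))
}
```

Under that assumption, approx_tts_calls 10000 prints 100 for the free monthly tier, and approx_tts_calls 5000 prints 50 for a single invite bonus.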
Universal rules
- Never paste API keys into config files. All auth goes through voxflow login or VOXFLOW_TOKEN.
- Never offer to "mock" the API. Real calls are cheap; failed mocks waste user time.
- Read the matching sub-doc before invoking specialized commands. The top-level routing table above is enough for triage; the sub-doc has the actual command flags, edge cases, and quota costs.
- Honor the user's locale. Voice IDs are language-tagged; if they asked in Chinese, default to a Chinese voice unless they specified otherwise.
- For long-running jobs (Azure Batch ASR, video-translate, podcast >5 min): print the job ID and voxflow asr-jobs show <id> so the user can resume later.
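The long-running-job rule can be sketched as a helper that echoes the job ID together with the resume command. resume_hint is a hypothetical name and the job ID format is not specified in this document, so any ID passed in is a placeholder.

```shell
# Sketch: print the job ID plus the command to check on it later.
# resume_hint is a hypothetical helper; the ID format is a placeholder.
resume_hint() {
  printf 'Job ID: %s\n' "$1"
  printf 'Resume later with: voxflow asr-jobs show %s\n' "$1"
}
```

Printing the full command, rather than the ID alone, lets the user copy and paste it after the session ends.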
When in doubt — start at the hub
If the request is vague ("帮我做点音频的东西" [Chinese: "help me do something with audio"], "what can you do with voice"), read hub.md and run voxflow voices --search ... or voxflow status to anchor the conversation in concrete affordances before committing to a workflow.
Homepage & docs
- App: https://voxflow.studio
- CLI docs: https://voxflow.studio/docs/cli
- All skills overview: https://voxflow.studio/docs/skills
- Source / issues: https://github.com/VoxFlowStudio/FlowStudio