Agent Right Brain

Use rawgenai <provider> <action> to give agents creative abilities. Always read the chosen provider's reference file before running commands.

Prerequisites

brew install WHQ25/tap/rawgenai

Before using a provider, read its setup guide at references/setup/ to configure credentials.

Input Sources (All Capabilities)

Positional argument: rawgenai <provider> <action> "text" [flags]
File: rawgenai <provider> <action> --file input.txt [flags]
Stdin: echo "text" | rawgenai <provider> <action> [flags]
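
The three input forms above can be wrapped in a single helper. This is a minimal sketch that assumes only the invocation shapes listed above; `rg_input` and its `@file` and `-` conventions are hypothetical and not part of rawgenai.

```shell
# rg_input is a hypothetical helper (not part of rawgenai). It picks one of
# the three documented input forms based on the third argument:
#   rg_input <provider> <action> "text"      -> positional argument
#   rg_input <provider> <action> @input.txt  -> --file input.txt
#   rg_input <provider> <action> -           -> stdin is passed through
# Anything after the input is forwarded unchanged as flags.
rg_input() {
  local provider="$1" action="$2" input="$3"
  shift 3
  case "$input" in
    -)  rawgenai "$provider" "$action" "$@" ;;                       # stdin
    @*) rawgenai "$provider" "$action" --file "${input#@}" "$@" ;;   # file
    *)  rawgenai "$provider" "$action" "$input" "$@" ;;              # text
  esac
}
```

For example, `rg_input openai tts "Hello there" --speak` uses the positional form, while `echo "Hello there" | rg_input openai tts - --speak` sends the same text over stdin.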

General Guidelines

On first use of a capability, ask the user to pick a provider, and remember that choice for the rest of the session.
All output is JSON. Always show file paths to the user.
For async commands (video, some image/audio): create → status → download.
If a command fails, try a different provider or inform the user.
Write image/video prompts descriptively: subject + action + environment + style + lighting.
For TTS, write natural conversational text, not markdown. Use --speak for immediate playback or -o to write to a file.
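
The "try a different provider" guideline can be sketched as a small fallback loop. This assumes rawgenai exits with a non-zero status when a command fails, which the text above does not confirm; `try_providers` is a hypothetical helper name.

```shell
# try_providers is a hypothetical helper (not part of rawgenai). It runs the
# same action against each provider in order and stops at the first success.
# ASSUMPTION: rawgenai returns a non-zero exit status on failure.
# Usage: try_providers <action> "<text>" <provider> [<provider> ...]
try_providers() {
  local action="$1" text="$2" provider
  shift 2
  for provider in "$@"; do
    if rawgenai "$provider" "$action" "$text"; then
      return 0
    fi
    echo "provider '$provider' failed, trying the next one" >&2
  done
  echo "all providers failed for action '$action'" >&2
  return 1
}
```

A call such as `try_providers image "a red fox in snow" openai google` would try OpenAI first and fall back to Google. When every provider fails, the function returns 1 so the agent can inform the user.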

Speak (TTS)

rawgenai <provider> tts "<text>" --speak

Provider	Command	Best For	Reference
OpenAI	`rawgenai openai tts`	General purpose, English	ref
Google Gemini	`rawgenai google tts`	Expressive storytelling, multi-speaker	ref
ElevenLabs	`rawgenai elevenlabs tts`	Most natural voices, 70+ languages	ref
Seed	`rawgenai seed tts`	Chinese, emotion-rich	ref
DashScope	`rawgenai dashscope tts`	Chinese, 10 languages, 49 voices	ref
MiniMax	`rawgenai minimax tts`	Chinese, streaming	ref
Kling	`rawgenai kling tts`	Bilingual zh/en	ref
Runway	`rawgenai runway audio tts`	Async	—
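
Because TTS input should be natural text rather than markdown, it can help to filter the text before speaking it. The sketch below is a rough filter, not a full Markdown parser, and `plain_for_tts` is a hypothetical helper name.

```shell
# plain_for_tts is a hypothetical helper (not part of rawgenai). It reads text
# on stdin and removes common Markdown markers so they are not read aloud.
# Note: it also strips underscores, so snake_case words lose their separators.
plain_for_tts() {
  sed -E \
    -e 's/^#+[[:space:]]*//' \
    -e 's/^[[:space:]]*[-*+][[:space:]]+//' \
    -e 's/\[([^]]*)\]\([^)]*\)/\1/g' \
    -e 's/[*_`]//g'
}
```

Combined with the stdin input form, this could be used as `plain_for_tts < notes.md | rawgenai openai tts --speak`.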

Listen (STT)

rawgenai <provider> stt <audio-file>

Provider	Command	Best For	Reference
OpenAI	`rawgenai openai stt`	Subtitles (srt/vtt)	ref
Google Gemini	`rawgenai google stt`	Speaker diarization	ref
ElevenLabs	`rawgenai elevenlabs stt`	Large files (3GB), video input	ref
DashScope	`rawgenai dashscope stt`	Chinese, emotion, long audio (12h async)	ref

Image

rawgenai <provider> image "<prompt>" -o output.png

Provider	Command	Best For	Reference
OpenAI	`rawgenai openai image`	Transparent bg, editing, multi-turn	ref
Google Gemini	`rawgenai google image`	4K, text in image	ref
Grok	`rawgenai grok image`	Batch (up to 10)	ref
Seed	`rawgenai seed image`	4K, multi-image fusion	ref
DashScope	`rawgenai dashscope image`	Text rendering, Chinese	ref
MiniMax	`rawgenai minimax image`	Subject reference	ref
Kling	`rawgenai kling image`	Face reference (async)	ref
Luma	`rawgenai luma image`	Creative, reframe (async)	—
Hunyuan	`rawgenai hunyuan image`	Chinese (async)	—
Runway	`rawgenai runway image`	Cinematic (async)	—
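
The prompt recipe from the general guidelines (subject + action + environment + style + lighting) can be made explicit with a small builder. This is only an illustrative sketch; `build_prompt` is a hypothetical helper, and the joining format is one choice among many.

```shell
# build_prompt is a hypothetical helper (not part of rawgenai). It joins the
# five recommended prompt parts into one descriptive line.
# Usage: build_prompt <subject> <action> <environment> <style> <lighting>
build_prompt() {
  if [ "$#" -ne 5 ]; then
    echo "usage: build_prompt subject action environment style lighting" >&2
    return 2
  fi
  printf '%s %s %s, %s, %s\n' "$1" "$2" "$3" "$4" "$5"
}
```

For example, `build_prompt "a red fox" "leaping over a log" "in a snowy pine forest" "watercolor illustration" "soft morning light"` prints `a red fox leaping over a log in a snowy pine forest, watercolor illustration, soft morning light`, which can then be passed as `rawgenai <provider> image "$(build_prompt ...)" -o output.png`.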

Video

rawgenai <provider> video create "<prompt>" [flags] → status <id> → download <id> -o out.mp4

Provider	Command	Best For	Reference
OpenAI (Sora)	`rawgenai openai video`	Remix	ref
Google (Veo)	`rawgenai google video`	4K, extension	ref
Grok	`rawgenai grok video`	Quick, editing	ref
Seed	`rawgenai seed video`	Audio, wide ratios	ref
DashScope	`rawgenai dashscope video`	Character ref, multi-shot	ref
MiniMax (Hailuo)	`rawgenai minimax video`	Subject ref, director modes	ref
Kling	`rawgenai kling video`	Most advanced, element system	ref
Luma	`rawgenai luma video`	Extension, upscale	—
Hunyuan	`rawgenai hunyuan video`	Chinese	—
Runway	`rawgenai runway video`	Cinematic, character ref	—
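
The create → status → download flow can be sketched as a polling loop. Several things here are assumptions rather than documented behavior: that `status` and `download` are subcommands alongside `video create`, that the JSON output carries `id` and `status` fields, and that `completed` and `failed` are the terminal states. Check the provider's reference file for the real names. `json_field` and `wait_for_video` are hypothetical helpers.

```shell
# json_field pulls one string field out of flat JSON on stdin without jq.
# It is a rough extractor meant for sketches, not a JSON parser.
json_field() {
  sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# wait_for_video is a hypothetical helper (not part of rawgenai).
# ASSUMPTIONS: `video create` returns JSON with an "id" field, `video status`
# returns JSON with a "status" field, and "completed" / "failed" are the
# terminal values. Adjust these to match the provider's reference file.
# Usage: wait_for_video <provider> "<prompt>" <output-file>
wait_for_video() {
  local provider="$1" prompt="$2" out="$3" id state tries=0
  id=$(rawgenai "$provider" video create "$prompt" | json_field id)
  [ -n "$id" ] || { echo "no job id returned" >&2; return 1; }
  while [ "$tries" -lt "${MAX_TRIES:-60}" ]; do
    state=$(rawgenai "$provider" video status "$id" | json_field status)
    case "$state" in
      completed) rawgenai "$provider" video download "$id" -o "$out"; return ;;
      failed)    echo "job $id failed" >&2; return 1 ;;
    esac
    tries=$((tries + 1))
    sleep "${POLL_SECONDS:-10}"   # wait before polling again
  done
  echo "timed out waiting for job $id" >&2
  return 1
}
```

The loop is bounded by `MAX_TRIES` (default 60) and `POLL_SECONDS` (default 10) so a stuck job ends in a timeout the agent can report instead of polling forever.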

Music

Provider	Command	Best For	Reference
ElevenLabs	`rawgenai elevenlabs music`	Prompt-based, composition plans	ref
MiniMax	`rawgenai minimax music create`	Lyrics-to-music, Chinese	ref

Sound Effects (SFX)

Provider	Command	Reference
ElevenLabs	`rawgenai elevenlabs sfx "<prompt>" -o out.mp3`	ref
Runway	`rawgenai runway audio sfx "<prompt>"`	—

Dialogue

Generate multi-speaker dialogue from a JSON script (max 10 voices).

Provider	Command	Reference
ElevenLabs	`rawgenai elevenlabs dialogue -i script.json -o out.mp3`	ref

Voice Management

Design, clone, and manage custom voices.

Provider	Command	Capabilities	Reference
ElevenLabs	`rawgenai elevenlabs voice`	list, design, create, preview	ref
Kling	`rawgenai kling voice`	create, status, list, delete	ref
MiniMax	`rawgenai minimax voice`	list, upload, clone, design, delete	ref
Seed	`rawgenai seed voice-clone`	upload, status, order, renew	ref

Audio Processing

Async: rawgenai runway audio <action> → status <id> → download <id> -o out

Provider	Command	Capability
Runway	`rawgenai runway audio sts`	Speech-to-speech (voice conversion)
Runway	`rawgenai runway audio dubbing`	Dub audio to another language
Runway	`rawgenai runway audio isolation`	Isolate voice from background