Agent Right Brain
Use rawgenai <provider> <action> to give agents creative abilities. Always read the chosen provider's reference file before running commands.
Prerequisites
brew install WHQ25/tap/rawgenai
Before using a provider, read its setup guide at references/setup/ to configure credentials.
Input Sources (All Capabilities)
- Positional argument: rawgenai <provider> <action> "text" [flags]
- File: rawgenai <provider> <action> --file input.txt [flags]
- Stdin: echo "text" | rawgenai <provider> <action> [flags]
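A minimal sketch of how an agent might build these three invocation forms from Python. The function name `build_invocation` and its parameters are hypothetical; only the `--file` flag and the positional/stdin forms come from the list above.

```python
from typing import List, Optional, Sequence, Tuple


def build_invocation(
    provider: str,
    action: str,
    text: Optional[str] = None,
    file: Optional[str] = None,
    use_stdin: bool = False,
    flags: Sequence[str] = (),
) -> Tuple[List[str], Optional[str]]:
    """Return (argv, stdin_data) for exactly one of the three input sources."""
    if file is not None and text is not None:
        raise ValueError("pass either text or file, not both")
    if file is None and text is None:
        raise ValueError("an input source is required")

    argv = ["rawgenai", provider, action]
    stdin_data = None
    if file is not None:
        argv += ["--file", file]      # file input
    elif use_stdin:
        stdin_data = text             # text is piped on stdin, not put in argv
    else:
        argv.append(text)             # positional argument
    argv += list(flags)
    return argv, stdin_data
```

The result can be handed to `subprocess.run(argv, input=stdin_data, text=True, capture_output=True)`.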
General Guidelines
- On first use of a capability, ask the user to pick a provider, and remember that choice for the rest of the session.
- All output is JSON. Always show file paths to the user.
- For async commands (video, some image/audio): create → status → download.
- If a command fails, try a different provider or inform the user.
- Write image/video prompts descriptively: subject + action + environment + style + lighting.
- For TTS: write natural conversational text, not markdown. Use --speak for playback, -o for file.
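The JSON-output and fall-back guidelines above could be combined roughly as follows. This is a sketch under stated assumptions: it treats a non-zero exit code as failure, which the guidelines do not specify, and the names `run_with_fallback` and `default_runner` are hypothetical.

```python
import json
import subprocess
from typing import Any, Callable, Dict, List, Sequence, Tuple

# A runner takes argv and returns (exit_code, stdout).
Runner = Callable[[List[str]], Tuple[int, str]]


def default_runner(argv: List[str]) -> Tuple[int, str]:
    proc = subprocess.run(argv, capture_output=True, text=True)
    return proc.returncode, proc.stdout


def run_with_fallback(
    providers: Sequence[str],
    action_args: Sequence[str],
    runner: Runner = default_runner,
) -> Tuple[str, Any]:
    """Try each provider in order; return (provider, parsed JSON) for the first success."""
    errors: Dict[str, str] = {}
    for provider in providers:
        code, stdout = runner(["rawgenai", provider, *action_args])
        if code != 0:  # assumption: failures are signalled by a non-zero exit code
            errors[provider] = f"exit code {code}"
            continue
        try:
            return provider, json.loads(stdout)
        except json.JSONDecodeError as exc:
            errors[provider] = f"invalid JSON: {exc}"
    # Nothing worked: surface every failure so the user can be informed.
    raise RuntimeError(f"all providers failed: {errors}")
```

Passing the runner as a parameter keeps the fall-back logic testable without the CLI installed. The parsed JSON is returned untouched because its field names differ per provider and are documented in each reference file.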
Speak (TTS)
rawgenai <provider> tts "<text>" --speak
| Provider | Command | Best For | Reference |
|---|---|---|---|
| OpenAI | rawgenai openai tts | General purpose, English | ref |
| Google Gemini | rawgenai google tts | Expressive storytelling, multi-speaker | ref |
| ElevenLabs | rawgenai elevenlabs tts | Most natural voices, 70+ languages | ref |
| Seed | rawgenai seed tts | Chinese, emotion-rich | ref |
| DashScope | rawgenai dashscope tts | Chinese, 10 languages, 49 voices | ref |
| MiniMax | rawgenai minimax tts | Chinese, streaming | ref |
| Kling | rawgenai kling tts | Bilingual zh/en | ref |
| Runway | rawgenai runway audio tts | Async | — |
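Because TTS input should be conversational text rather than markdown, an agent may want to strip common markdown syntax before calling a tts command. The helper below is a heuristic sketch, not part of rawgenai; the name `markdown_to_speech_text` is hypothetical, and it covers only the most common constructs.

```python
import re


def markdown_to_speech_text(text: str) -> str:
    """Strip common markdown syntax so the text reads naturally when spoken."""
    text = re.sub(r"```.*?```", " ", text, flags=re.DOTALL)        # drop fenced code blocks
    text = re.sub(r"`([^`]*)`", r"\1", text)                       # inline code -> plain text
    text = re.sub(r"!?\[([^\]]*)\]\([^)]*\)", r"\1", text)         # links/images -> label only
    text = re.sub(r"^[ \t]{0,3}#{1,6}[ \t]+", "", text, flags=re.MULTILINE)  # heading markers
    text = re.sub(r"^[ \t]*[-*+][ \t]+", "", text, flags=re.MULTILINE)       # bullet markers
    text = re.sub(r"(\*\*|\*)(?=\S)(.+?)(?<=\S)\1", r"\2", text)   # *emphasis* and **bold**
    return re.sub(r"\s+", " ", text).strip()                       # collapse whitespace
```

The cleaned string can then be passed as the text argument of `rawgenai <provider> tts` with `--speak` or `-o`.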
Listen (STT)
rawgenai <provider> stt <audio-file>
| Provider | Command | Best For | Reference |
|---|---|---|---|
| OpenAI | rawgenai openai stt | Subtitles (srt/vtt) | ref |
| Google Gemini | rawgenai google stt | Speaker diarization | ref |
| ElevenLabs | rawgenai elevenlabs stt | Large files (3GB), video input | ref |
| DashScope | rawgenai dashscope stt | Chinese, emotion, long audio (12h async) | ref |
Image
rawgenai <provider> image "<prompt>" -o output.png
| Provider | Command | Best For | Reference |
|---|---|---|---|
| OpenAI | rawgenai openai image | Transparent bg, editing, multi-turn | ref |
| Google Gemini | rawgenai google image | 4K, text in image | ref |
| Grok | rawgenai grok image | Batch (up to 10) | ref |
| Seed | rawgenai seed image | 4K, multi-image fusion | ref |
| DashScope | rawgenai dashscope image | Text rendering, Chinese | ref |
| MiniMax | rawgenai minimax image | Subject reference | ref |
| Kling | rawgenai kling image | Face reference (async) | ref |
| Luma | rawgenai luma image | Creative, reframe (async) | — |
| Hunyuan | rawgenai hunyuan image | Chinese (async) | — |
| Runway | rawgenai runway image | Cinematic (async) | — |
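The prompt guideline (subject + action + environment + style + lighting) can be kept consistent with a small helper. This is an illustrative sketch; `VisualPrompt` is a hypothetical name, and the comma-joined layout is one reasonable choice, not a format any provider requires.

```python
from dataclasses import dataclass


@dataclass
class VisualPrompt:
    subject: str
    action: str = ""
    environment: str = ""
    style: str = ""
    lighting: str = ""

    def render(self) -> str:
        """Join the non-empty parts into a single descriptive prompt string."""
        if not self.subject.strip():
            raise ValueError("subject is required")
        # Subject and action read as one phrase; the rest are comma-separated.
        head = " ".join(p.strip() for p in (self.subject, self.action) if p.strip())
        tail = [p.strip() for p in (self.environment, self.style, self.lighting) if p.strip()]
        return ", ".join([head, *tail])
```

For example, `VisualPrompt("a red fox", "leaping over a stream", "snowy forest", "watercolor", "soft morning light").render()` yields a prompt covering all five elements.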
Video
rawgenai <provider> video create "<prompt>" [flags] → status <id> → download <id> -o out.mp4
| Provider | Command | Best For | Reference |
|---|---|---|---|
| OpenAI (Sora) | rawgenai openai video | Remix | ref |
| Google (Veo) | rawgenai google video | 4K, extension | ref |
| Grok | rawgenai grok video | Quick, editing | ref |
| Seed | rawgenai seed video | Audio, wide ratios | ref |
| DashScope | rawgenai dashscope video | Character ref, multi-shot | ref |
| MiniMax (Hailuo) | rawgenai minimax video | Subject ref, director modes | ref |
| Kling | rawgenai kling video | Most advanced, element system | ref |
| Luma | rawgenai luma video | Extension, upscale | — |
| Hunyuan | rawgenai hunyuan video | Chinese | — |
| Runway | rawgenai runway video | Cinematic, character ref | — |
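The create → status → download flow can be sketched as a polling loop. The JSON field names (`id`, `status`) and the terminal status values used as defaults here are assumptions, not documented behaviour; check the chosen provider's reference file and override them as needed. The `run` callable is a hypothetical hook that executes an argv list and returns the parsed JSON.

```python
import time
from typing import Any, Callable, Dict, List, Sequence


def run_async_job(
    provider: str,
    create_args: Sequence[str],
    output_path: str,
    run: Callable[[List[str]], Dict[str, Any]],
    *,
    id_key: str = "id",                              # assumed field name
    status_key: str = "status",                      # assumed field name
    done_values: Sequence[str] = ("completed",),     # assumed status value
    failed_values: Sequence[str] = ("failed",),      # assumed status value
    poll_seconds: float = 10.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Create a video job, poll until it finishes, then download the result."""
    base = ["rawgenai", provider, "video"]
    job_id = run(base + ["create", *create_args])[id_key]

    for _ in range(max_polls):
        state = run(base + ["status", job_id])[status_key]
        if state in done_values:
            return run(base + ["download", job_id, "-o", output_path])
        if state in failed_values:
            raise RuntimeError(f"job {job_id} ended with status {state!r}")
        sleep(poll_seconds)

    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

Bounding the number of polls keeps a stalled job from blocking the session indefinitely; on a timeout or failure the agent can fall back to another provider or inform the user.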
Music
| Provider | Command | Best For | Reference |
|---|---|---|---|
| ElevenLabs | rawgenai elevenlabs music | Prompt-based, composition plans | ref |
| MiniMax | rawgenai minimax music create | Lyrics-to-music, Chinese | ref |
Sound Effects (SFX)
| Provider | Command | Reference |
|---|---|---|
| ElevenLabs | rawgenai elevenlabs sfx "<prompt>" -o out.mp3 | ref |
| Runway | rawgenai runway audio sfx "<prompt>" | — |
Dialogue
Generate multi-speaker dialogue from a JSON script (max 10 voices).
| Provider | Command | Reference |
|---|---|---|
| ElevenLabs | rawgenai elevenlabs dialogue -i script.json -o out.mp3 | ref |
Voice Management
Design, clone, and manage custom voices.
| Provider | Command | Capabilities | Reference |
|---|---|---|---|
| ElevenLabs | rawgenai elevenlabs voice | list, design, create, preview | ref |
| Kling | rawgenai kling voice | create, status, list, delete | ref |
| MiniMax | rawgenai minimax voice | list, upload, clone, design, delete | ref |
| Seed | rawgenai seed voice-clone | upload, status, order, renew | ref |
Audio Processing
Async: rawgenai runway audio <action> → status <id> → download <id> -o out
| Provider | Command | Capability |
|---|---|---|
| Runway | rawgenai runway audio sts | Speech-to-speech (voice conversion) |
| Runway | rawgenai runway audio dubbing | Dub audio to another language |
| Runway | rawgenai runway audio isolation | Isolate voice from background |