🎨 Image Generation Skill
Use when: User asks to generate, draw, create, or make any kind of image, photo, illustration, icon, logo, or artwork.
Generate images with 8 state-of-the-art AI models. This skill automatically picks the best model for the job and handles all the complexity, including Midjourney's async polling, so you can focus on the conversation.
Quick Reference
| User Intent | Model | Speed |
| --- | --- | --- |
| Artistic, cinematic, painterly | midjourney | ~15s |
| Photorealistic, portrait, product | flux-pro | ~8s |
| General purpose, balanced | flux-dev | ~10s |
| Quick draft, fast iteration | flux-schnell | ~2s |
| Image with text, logo, poster | ideogram | ~10s |
| Vector art, icon, flat design | recraft | ~8s |
| Anime, stylized illustration | sdxl | ~5s |
| Gemini-powered, consistent style | nano-banana | ~12s |
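The table above can be read as a first-match lookup from keywords in the user's request to a model ID. The sketch below is only an illustration of that reading: `MODEL_RULES` and `pickModel` are hypothetical names, the keywords are copied from the table, and the skill's actual selection logic is not shown in this document. Plain substring matching is crude (for example, "context" contains "text"), so treat the result as a starting suggestion.

```javascript
// Hypothetical keyword rules, in the same order as the Quick Reference table.
// flux-dev has no rule of its own because it is the general-purpose fallback.
const MODEL_RULES = [
  { model: "midjourney", keywords: ["artistic", "cinematic", "painterly"] },
  { model: "flux-pro", keywords: ["photorealistic", "portrait", "product"] },
  { model: "flux-schnell", keywords: ["quick", "draft", "fast"] },
  { model: "ideogram", keywords: ["text", "logo", "poster"] },
  { model: "recraft", keywords: ["vector", "icon", "flat"] },
  { model: "sdxl", keywords: ["anime", "stylized"] },
  { model: "nano-banana", keywords: ["gemini", "consistent"] },
];

// Returns the first model whose keywords appear in the request text.
function pickModel(request) {
  const text = request.toLowerCase();
  for (const rule of MODEL_RULES) {
    if (rule.keywords.some((keyword) => text.includes(keyword))) {
      return rule.model;
    }
  }
  return "flux-dev";
}
```

Because the first matching rule wins, a request such as "perfume product poster" resolves to flux-pro rather than ideogram; reorder the rules if a different priority suits your use.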
How to Generate an Image
Step 1: Enhance the prompt
Before calling the script, expand the user's prompt with style, lighting, and quality descriptors appropriate for the chosen model.
- Midjourney: Add cinematic lighting, ultra detailed, --v 7, --style raw
- Flux: Add masterpiece, highly detailed, sharp focus, professional photography
- Ideogram: Be explicit about text content, font style, and layout
- Recraft: Specify vector illustration, flat design, icon style
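As a rough illustration of the enhancement step, the sketch below appends the descriptors listed above when the prompt does not already contain them. `STYLE_HINTS` and `enhancePrompt` are hypothetical helpers, not part of generate.js, and treating every model ID that starts with "flux" as one family is an assumption. Ideogram is left untouched because its guidance concerns wording rather than fixed descriptors.

```javascript
// Descriptors and flags taken from the list above; the structure is illustrative.
const STYLE_HINTS = new Map([
  ["midjourney", { descriptors: ["cinematic lighting", "ultra detailed"], flags: ["--v 7", "--style raw"] }],
  ["flux", { descriptors: ["masterpiece", "highly detailed", "sharp focus", "professional photography"], flags: [] }],
  ["recraft", { descriptors: ["vector illustration", "flat design", "icon style"], flags: [] }],
]);

// Appends any descriptor or flag the prompt does not already contain.
function enhancePrompt(model, prompt) {
  const family = model.startsWith("flux") ? "flux" : model; // assumption: flux-* share hints
  const hints = STYLE_HINTS.get(family);
  const base = prompt.trim();
  if (!hints) return base;

  const lower = base.toLowerCase();
  const missing = hints.descriptors.filter((d) => !lower.includes(d));
  const text = [base, ...missing].join(", ");
  const flags = hints.flags.filter((f) => !base.includes(f));
  return [text, ...flags].join(" ");
}
```

For example, `enhancePrompt("midjourney", "a snow leopard, cinematic lighting")` adds only the missing descriptor and the two Midjourney flags.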
Step 2: Run the script
node {baseDir}/tools/generate.js \
  --model <model_id> \
  --prompt "<enhanced prompt>" \
  --aspect-ratio <ratio>
All parameters:
| Parameter | Default | Description |
| --- | --- | --- |
| --model | flux-dev | Model ID from the table above |
| --prompt | (required) | The image generation prompt |
| --aspect-ratio | 1:1 | 1:1, 16:9, 9:16, 4:3, 3:4, 3:2, 21:9 |
| --num-images | 1 | Number of images (1-4; Midjourney always returns 4) |
| --negative-prompt | (none) | Things to avoid (not supported by Midjourney) |
| --seed | (none) | Seed for reproducibility |
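One way to keep invocations consistent with the parameter table is to build the argument list in code and validate it before running the script. The sketch below is an assumption-laden illustration: `buildGenerateArgs` is a hypothetical helper, and skipping `--num-images` and `--negative-prompt` for Midjourney is a design choice inferred from the table notes, not documented behavior of generate.js.

```javascript
// Allowed values copied from the parameter table.
const ASPECT_RATIOS = ["1:1", "16:9", "9:16", "4:3", "3:4", "3:2", "21:9"];

// Builds the CLI arguments that follow the script path.
function buildGenerateArgs({
  model = "flux-dev",
  prompt,
  aspectRatio = "1:1",
  numImages = 1,
  negativePrompt,
  seed,
} = {}) {
  if (!prompt) throw new Error("--prompt is required");
  if (!ASPECT_RATIOS.includes(aspectRatio)) {
    throw new Error(`unsupported aspect ratio: ${aspectRatio}`);
  }
  if (!Number.isInteger(numImages) || numImages < 1 || numImages > 4) {
    throw new Error("--num-images must be an integer from 1 to 4");
  }

  const args = ["--model", model, "--prompt", prompt, "--aspect-ratio", aspectRatio];
  if (model !== "midjourney") {
    // Assumption: Midjourney always returns 4 images and ignores negative prompts,
    // so these flags are only sent to the other models.
    args.push("--num-images", String(numImages));
    if (negativePrompt) args.push("--negative-prompt", negativePrompt);
  }
  if (seed !== undefined) args.push("--seed", String(seed));
  return args;
}
```

The resulting array can be passed to Node's `child_process.execFile("node", [scriptPath, ...args])`, where `scriptPath` stands for `{baseDir}/tools/generate.js`; passing an array avoids shell-quoting problems with prompts that contain quotes.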
Step 3: Return the result
The script always waits and returns the final image URL(s). No polling required.
{ "success": true, "model": "flux-pro", "imageUrl": "https://...", "images": ["https://..."] }
Send the imageUrl to the user.
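A caller that runs the script programmatically could read the result along these lines. This is a minimal sketch: `extractImageUrls` is a hypothetical helper, and since the failure shape is not documented here, anything other than `success: true` is simply treated as an error.

```javascript
// Parses the script's JSON output and returns the URL(s) to show the user.
function extractImageUrls(stdout) {
  let result;
  try {
    result = JSON.parse(stdout);
  } catch {
    throw new Error("script output was not valid JSON");
  }
  // Assumption: any result without success === true counts as a failure.
  if (!result || result.success !== true) {
    throw new Error("image generation failed");
  }
  const images =
    Array.isArray(result.images) && result.images.length > 0
      ? result.images
      : [result.imageUrl];
  const all = images.filter(Boolean);
  return { primary: result.imageUrl ?? all[0], all };
}
```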
Midjourney Actions
After generating a 4-image grid with Midjourney, offer the user these options:
Upscale image #2 (subtle, preserves details)
node {baseDir}/tools/generate.js \
  --model midjourney \
  --action upscale \
  --index 2 \
  --job-id <job_id>
Create a strong variation of image #3
node {baseDir}/tools/generate.js \
  --model midjourney \
  --action variation \
  --index 3 \
  --job-id <job_id> \
  --variation-type 1
Regenerate with same prompt
node {baseDir}/tools/generate.js \
  --model midjourney \
  --action reroll \
  --job-id <job_id>
Upscale types: 0 = Subtle (default, best for photos), 1 = Creative (best for illustrations)
Variation types: 0 = Subtle (default), 1 = Strong (dramatic changes)
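The three follow-up commands share one shape, so their arguments can be assembled by a single helper. The sketch below uses only the flags that appear in the commands above; `buildMidjourneyActionArgs` is a hypothetical name, the 1 to 4 index range is inferred from the 4-image grid, and no flag for the upscale type is included because this document does not show one.

```javascript
// Builds CLI arguments for a Midjourney follow-up action on an existing job.
function buildMidjourneyActionArgs({ action, jobId, index, variationType } = {}) {
  if (!["upscale", "variation", "reroll"].includes(action)) {
    throw new Error(`unknown action: ${action}`);
  }
  if (!jobId) throw new Error("--job-id is required");

  const args = ["--model", "midjourney", "--action", action];
  if (action !== "reroll") {
    // Assumption: the index refers to one of the 4 images in the grid.
    if (!Number.isInteger(index) || index < 1 || index > 4) {
      throw new Error("--index must be an integer from 1 to 4");
    }
    args.push("--index", String(index));
  }
  args.push("--job-id", jobId);

  if (action === "variation" && variationType !== undefined) {
    if (variationType !== 0 && variationType !== 1) {
      throw new Error("--variation-type must be 0 (subtle) or 1 (strong)");
    }
    args.push("--variation-type", String(variationType));
  }
  return args;
}
```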
Example Conversations
User: "Draw a snow leopard on a snowy mountain with cinematic lighting"
Choose midjourney for artistic quality
node {baseDir}/tools/generate.js \
  --model midjourney \
  --prompt "a majestic snow leopard on a snowy mountain peak, cinematic lighting, dramatic atmosphere, ultra detailed --ar 16:9 --v 7" \
  --aspect-ratio 16:9
🎨 Done! Which one to upscale? (U1-U4) Or create a variant? (V1-V4)
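If the user answers with one of the U1-U4 or V1-V4 shortcuts, the reply has to be mapped back to an action and an image index. A minimal sketch of that mapping follows; `parseGridReply` is a hypothetical helper, and accepting lowercase letters and surrounding whitespace is an assumption about how lenient the parsing should be.

```javascript
// Maps a reply such as "U2" or "v4" to { action, index }, or null if it is not a shortcut.
function parseGridReply(reply) {
  const match = /^\s*([uv])\s*([1-4])\s*$/i.exec(reply);
  if (!match) return null;
  return {
    action: match[1].toLowerCase() === "u" ? "upscale" : "variation",
    index: Number(match[2]),
  };
}
```

A `null` result means the reply should be handled as ordinary conversation rather than as a grid action.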
User: "Use Flux to generate a perfume product poster, white background"
Choose flux-pro for photorealistic product shots
node {baseDir}/tools/generate.js \
  --model flux-pro \
  --prompt "a luxury perfume bottle on a clean white background, professional product photography, soft shadows, 8k, highly detailed" \
  --aspect-ratio 3:4
User: "Show me a quick draft"
flux-schnell for instant previews
node {baseDir}/tools/generate.js \
  --model flux-schnell \
  --prompt "..." \
  --aspect-ratio 1:1
User: "Make me an App icon, flat style, blue theme"
recraft for vector/icon style
node {baseDir}/tools/generate.js \
  --model recraft \
  --prompt "a minimal flat design app icon, blue color scheme, simple geometric shapes, vector style, white background"
Setup
Zero API keys needed! All requests go through a hosted proxy that handles authentication server-side.
The skill works out of the box: just install and use.
Advanced: Custom proxy or token
If you want to use your own proxy or a persistent token, set these environment variables:
{
  "skills": {
    "entries": {
      "image-studio": {
        "enabled": true,
        "env": {
          "IMAGE_STUDIO_PROXY_URL": "https://your-proxy.vercel.app",
          "IMAGE_STUDIO_TOKEN": "your_token_here"
        }
      }
    }
  }
}
| Variable | Required | Description |
| --- | --- | --- |
| IMAGE_STUDIO_PROXY_URL | No | Custom proxy base URL (default: https://image-gen-proxy.vercel.app) |
| IMAGE_STUDIO_TOKEN | No | Persistent token (auto-obtained if not set, 100 free uses per token) |
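The fallback behavior described in the table can be sketched as follows. `resolveProxyConfig` is a hypothetical helper rather than code from generate.js; the default URL is the one listed above, trimming trailing slashes is an added convenience, and a `null` token stands for "let the proxy issue one automatically".

```javascript
// Default taken from the variable table above.
const DEFAULT_PROXY_URL = "https://image-gen-proxy.vercel.app";

// Resolves the proxy URL and token from environment variables, with defaults.
function resolveProxyConfig(env = process.env) {
  const rawUrl = env.IMAGE_STUDIO_PROXY_URL || DEFAULT_PROXY_URL;
  return {
    proxyUrl: rawUrl.replace(/\/+$/, ""), // drop trailing slashes before joining paths
    token: env.IMAGE_STUDIO_TOKEN || null, // null: rely on the auto-obtained token
  };
}
```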
To deploy your own proxy, see the audiomind proxy as a reference implementation. You'll need FAL_KEY and LEGNEXT_KEY as Vercel environment variables.
Changelog
v2.0.0
- Simplified async: The script now blocks until Midjourney completes. No more --async / --poll flags needed in SKILL.md instructions.
- Unified output format: All models return the same { success, imageUrl, images } shape.
- Reference images for Nano Banana: Pass --reference-images "url1,url2" for character/style consistency across generations.
v1.3.0
- Added non-blocking async mode for Midjourney (--async / --poll).
v1.2.0
- Midjourney turbo mode enabled by default (~10-20s).
v1.1.0
- Switched Midjourney provider from TTAPI to Legnext.ai for better stability.
v1.0.0
- Initial release with Midjourney, Flux, SDXL, Nano Banana, Ideogram, Recraft.