Image Generator
This skill generates and edits images using Google's Gemini Nano Banana Pro model (gemini-3-pro-image-preview).
IMPORTANT: Setup Required
Before using this skill, the user must set the GEMINI_API_KEY environment variable:
- Get a free API key from Google AI Studio
- Export the key in your shell profile (~/.zshrc, ~/.bashrc, etc.): export GEMINI_API_KEY="your_api_key_here"
- Restart your terminal or run source ~/.zshrc (or ~/.bashrc)
The skill will not work without this configuration.
Pre-flight Check
Before making any API call, verify the key is set:
if [ -z "$GEMINI_API_KEY" ]; then
  echo "ERROR: GEMINI_API_KEY is not set. Please export it in your shell profile."
  exit 1
fi
If the key is missing, stop and tell the user to set it using the instructions above.
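The same pre-flight check can be sketched in Python for workflows that use the SDK instead of the shell. This is a minimal sketch; `require_api_key` is a hypothetical helper name, not part of any library.

```python
import os


def require_api_key(env=os.environ):
    """Return the Gemini API key, or raise if it is missing or blank.

    `env` defaults to the process environment; passing a plain dict
    makes the check easy to exercise without touching real settings.
    """
    key = env.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GEMINI_API_KEY is not set. Please export it in your shell profile."
        )
    return key
```

Calling `require_api_key()` before any request stops early with the same message as the shell check.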
Configuration
Model: gemini-3-pro-image-preview
API Key: Read from the GEMINI_API_KEY environment variable
Iterating on User-Provided Images
When the user provides a path to an image they want to edit or iterate on, use this workflow:
Step 1: Read and encode the image to base64
# Get the image path from user
IMG_PATH="/path/to/user/image.png"

# Detect mime type
if [[ "$IMG_PATH" == *.png ]]; then
  MIME_TYPE="image/png"
elif [[ "$IMG_PATH" == *.jpg ]] || [[ "$IMG_PATH" == *.jpeg ]]; then
  MIME_TYPE="image/jpeg"
elif [[ "$IMG_PATH" == *.webp ]]; then
  MIME_TYPE="image/webp"
else
  MIME_TYPE="image/png"
fi

# Encode to base64 (works on both macOS and Linux)
if [[ "$(uname)" == "Darwin" ]]; then
  IMG_BASE64=$(base64 -i "$IMG_PATH")
else
  IMG_BASE64=$(base64 -w0 "$IMG_PATH")
fi
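For comparison, Step 1 could be written in Python, where the encoding is platform-independent. This is a sketch under stated assumptions: `MIME_BY_SUFFIX`, `detect_mime_type`, and `encode_image` are illustrative names, and unlike the shell version the suffix match here is case-insensitive.

```python
import base64
from pathlib import Path

# Same three formats as the shell snippet; anything else falls back to PNG.
MIME_BY_SUFFIX = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".webp": "image/webp",
}


def detect_mime_type(path):
    """Guess the MIME type from the file extension (case-insensitive)."""
    return MIME_BY_SUFFIX.get(Path(path).suffix.lower(), "image/png")


def encode_image(path):
    """Return (mime_type, base64_string) for the image at `path`."""
    raw = Path(path).read_bytes()
    # b64encode never inserts line breaks, so the result is safe to embed in JSON.
    return detect_mime_type(path), base64.b64encode(raw).decode("ascii")
```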
Step 2: Send image with edit prompt (File-Based Approach)
IMPORTANT: Always use a file-based approach for the request body. Base64-encoded images are too large for command-line arguments and will cause "argument list too long" errors.
# User's edit request
EDIT_PROMPT="Add a santa hat to the person in this image"

# Write request to a JSON file (avoids command line length limits)
cat > /tmp/gemini_request.json << JSONEOF
{
  "contents": [{
    "parts": [
      {"text": "$EDIT_PROMPT"},
      {
        "inline_data": {
          "mime_type": "$MIME_TYPE",
          "data": "$IMG_BASE64"
        }
      }
    ]
  }],
  "generationConfig": {
    "responseModalities": ["TEXT", "IMAGE"]
  }
}
JSONEOF

# Call the API using the file
curl -s -X POST \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d @/tmp/gemini_request.json > /tmp/gemini_response.json
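One caveat with building the request body by shell interpolation is that a prompt containing double quotes or backslashes produces invalid JSON. A possible alternative is to let a JSON serializer write the request file. The sketch below mirrors the request shape used above; `build_edit_request` and `write_request` are hypothetical helper names.

```python
import json


def build_edit_request(prompt, mime_type, image_b64):
    """Assemble the generateContent request body for a single-image edit."""
    return {
        "contents": [{
            "parts": [
                {"text": prompt},
                {"inline_data": {"mime_type": mime_type, "data": image_b64}},
            ]
        }],
        "generationConfig": {"responseModalities": ["TEXT", "IMAGE"]},
    }


def write_request(path, request):
    """Serialize the request to a file; json.dump escapes quotes and newlines."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(request, f)
```

The resulting file can be passed to the same `curl ... -d @/tmp/gemini_request.json` call.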
Step 3: Extract and save the edited image
# Extract image from response and save
python3 -c "
import json
import base64

with open('/tmp/gemini_response.json') as f:
    data = json.load(f)

for part in data['candidates'][0]['content']['parts']:
    if 'inlineData' in part:
        img_data = part['inlineData']['data']
        mime = part['inlineData']['mimeType']
        ext = 'png' if 'png' in mime else 'jpg'
        with open('edited_image.' + ext, 'wb') as out:
            out.write(base64.b64decode(img_data))
        print(f'Saved: edited_image.{ext}')
    elif 'text' in part:
        print(part['text'])
"
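The extraction step assumes the response always contains a candidate. A more defensive variant might look like the sketch below. It assumes failures arrive as a top-level `error` object and that blocked prompts return a `promptFeedback` field with no candidates; treat both as assumptions to check against the actual response. `extract_parts` is a hypothetical helper name.

```python
import base64


def extract_parts(response):
    """Split a parsed generateContent response into (texts, images).

    `images` is a list of (mime_type, raw_bytes) tuples.
    """
    if "error" in response:
        error = response["error"]
        message = error.get("message", error) if isinstance(error, dict) else error
        raise RuntimeError(f"API error: {message}")

    candidates = response.get("candidates") or []
    if not candidates:
        feedback = response.get("promptFeedback", {})
        raise RuntimeError(f"No candidates returned (promptFeedback: {feedback})")

    texts, images = [], []
    for part in candidates[0].get("content", {}).get("parts", []):
        if "inlineData" in part:
            inline = part["inlineData"]
            mime = inline.get("mimeType", "image/png")
            images.append((mime, base64.b64decode(inline["data"])))
        elif "text" in part:
            texts.append(part["text"])
    return texts, images
```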
Complete Example (File-Based)
For iterating on images, always use file-based requests:
# Variables
IMG_PATH="/path/to/image.png"
EDIT_PROMPT="Make the background a sunset beach"
OUTPUT_PATH="edited_output.png"

# Detect mime type and encode
# Reading from stdin and stripping newlines works with both macOS and GNU base64.
# (GNU base64 accepts -i as --ignore-garbage and wraps its output, which would break the JSON.)
MIME_TYPE=$([[ "$IMG_PATH" == *.png ]] && echo "image/png" || echo "image/jpeg")
IMG_BASE64=$(base64 < "$IMG_PATH" | tr -d '\n')

# Write request to file (required - base64 images are too large for command line)
cat > /tmp/gemini_request.json << JSONEOF
{
  "contents": [{
    "parts": [
      {"text": "$EDIT_PROMPT"},
      {"inline_data": {"mime_type": "$MIME_TYPE", "data": "$IMG_BASE64"}}
    ]
  }],
  "generationConfig": {
    "responseModalities": ["TEXT", "IMAGE"]
  }
}
JSONEOF

# Call API and extract image
curl -s -X POST \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d @/tmp/gemini_request.json > /tmp/gemini_response.json

# Save the output image
python3 -c "
import json, base64
with open('/tmp/gemini_response.json') as f:
    data = json.load(f)
for part in data.get('candidates', [{}])[0].get('content', {}).get('parts', []):
    if 'inlineData' in part:
        with open('$OUTPUT_PATH', 'wb') as out:
            out.write(base64.b64decode(part['inlineData']['data']))
        print('Saved: $OUTPUT_PATH')
"
Multi-Image Input (Combine/Compose)
To combine elements from multiple images (also uses file-based approach):
IMG1_PATH="/path/to/image1.png"
IMG2_PATH="/path/to/image2.png"
PROMPT="Put the dress from the first image on the person in the second image"

# Reading from stdin and stripping newlines works with both macOS and GNU base64
IMG1_BASE64=$(base64 < "$IMG1_PATH" | tr -d '\n')
IMG2_BASE64=$(base64 < "$IMG2_PATH" | tr -d '\n')

# Write request to file (adjust mime_type if the inputs are not PNG)
cat > /tmp/gemini_request.json << JSONEOF
{
  "contents": [{
    "parts": [
      {"text": "$PROMPT"},
      {"inline_data": {"mime_type": "image/png", "data": "$IMG1_BASE64"}},
      {"inline_data": {"mime_type": "image/png", "data": "$IMG2_BASE64"}}
    ]
  }],
  "generationConfig": {"responseModalities": ["TEXT", "IMAGE"]}
}
JSONEOF

curl -s -X POST \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d @/tmp/gemini_request.json > /tmp/gemini_response.json
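The two-image request generalizes to any number of inputs. The sketch below builds the same body for a list of paths and picks the MIME type per file rather than assuming PNG. The names `build_multi_image_request` and `MAX_REFERENCE_IMAGES` are illustrative; the limit of 14 is taken from the Capabilities section of this document.

```python
import base64
from pathlib import Path

MIME_BY_SUFFIX = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".webp": "image/webp",
}
MAX_REFERENCE_IMAGES = 14  # limit quoted under Advanced Features


def build_multi_image_request(prompt, image_paths):
    """Build a generateContent body with one text part and N image parts."""
    if not image_paths:
        raise ValueError("At least one image path is required")
    if len(image_paths) > MAX_REFERENCE_IMAGES:
        raise ValueError(f"At most {MAX_REFERENCE_IMAGES} reference images are supported")

    parts = [{"text": prompt}]
    for path in image_paths:
        p = Path(path)
        parts.append({
            "inline_data": {
                # Unknown extensions fall back to PNG, as in the shell workflow.
                "mime_type": MIME_BY_SUFFIX.get(p.suffix.lower(), "image/png"),
                "data": base64.b64encode(p.read_bytes()).decode("ascii"),
            }
        })
    return {
        "contents": [{"parts": parts}],
        "generationConfig": {"responseModalities": ["TEXT", "IMAGE"]},
    }
```

Images keep the order of `image_paths`, so prompts that refer to "the first image" and "the second image" still line up.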
Capabilities
Text-to-Image Generation
- Generate high-quality images from text descriptions
- Support for photorealistic, stylized, and artistic outputs
- Accurate text rendering in images (logos, infographics, diagrams)
Image Editing
- Add or remove elements from images
- Inpainting with semantic masking (edit specific parts)
- Style transfer (apply artistic styles to photos)
- Multi-image composition (combine elements from multiple images)
Advanced Features
- High Resolution: 1K, 2K, or 4K output
- Aspect Ratios: 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9
- Google Search Grounding: Generate images based on real-time data
- Multi-turn Editing: Iteratively refine images through conversation
- Up to 14 Reference Images: Combine multiple inputs for complex compositions
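Because the aspect ratio and resolution options form a fixed set, it can help to validate them before sending a request. The following is a minimal sketch; `image_config` is a hypothetical helper that returns the `imageConfig` object in the shape used by the REST examples in this document.

```python
# Values copied from the Advanced Features list above.
ASPECT_RATIOS = {"1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9"}
IMAGE_SIZES = {"1K", "2K", "4K"}


def image_config(aspect_ratio=None, image_size=None):
    """Return an imageConfig dict, rejecting values not listed above."""
    config = {}
    if aspect_ratio is not None:
        if aspect_ratio not in ASPECT_RATIOS:
            raise ValueError(f"Unsupported aspect ratio: {aspect_ratio!r}")
        config["aspectRatio"] = aspect_ratio
    if image_size is not None:
        if image_size not in IMAGE_SIZES:
            raise ValueError(f"Unsupported image size: {image_size!r}")
        config["imageSize"] = image_size
    return config
```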
API Usage
Basic Text-to-Image (Python)
from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=["Your prompt here"],
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE'],
        image_config=types.ImageConfig(
            aspect_ratio="16:9",  # Optional
            image_size="2K"  # Optional: "1K", "2K", "4K"
        )
    )
)

for part in response.parts:
    if part.text is not None:
        print(part.text)
    elif part.inline_data is not None:
        image = part.as_image()
        image.save("generated_image.png")
Basic Text-to-Image (JavaScript)
import { GoogleGenAI } from "@google/genai";
import * as fs from "node:fs";

const ai = new GoogleGenAI({});

const response = await ai.models.generateContent({
  model: "gemini-3-pro-image-preview",
  contents: "Your prompt here",
  config: {
    responseModalities: ['TEXT', 'IMAGE'],
    imageConfig: {
      aspectRatio: "16:9",
      imageSize: "2K"
    }
  }
});

for (const part of response.candidates[0].content.parts) {
  if (part.text) {
    console.log(part.text);
  } else if (part.inlineData) {
    const buffer = Buffer.from(part.inlineData.data, "base64");
    fs.writeFileSync("generated_image.png", buffer);
  }
}
REST API (curl)
curl -s -X POST \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "parts": [{"text": "Your prompt here"}]
    }],
    "generationConfig": {
      "responseModalities": ["TEXT", "IMAGE"],
      "imageConfig": {
        "aspectRatio": "16:9",
        "imageSize": "2K"
      }
    }
  }' | jq -r '.candidates[0].content.parts[] | select(.inlineData) | .inlineData.data' | base64 --decode > output.png
Image Editing (with input image)
from google import genai
from google.genai import types
from PIL import Image

client = genai.Client()

input_image = Image.open('input.png')
prompt = "Add a wizard hat to the cat in this image"

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=[prompt, input_image],
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE']
    )
)

for part in response.parts:
    if part.inline_data is not None:
        image = part.as_image()
        image.save("edited_image.png")
Multi-Image Composition
from google import genai
from google.genai import types
from PIL import Image

client = genai.Client()

image1 = Image.open('dress.png')
image2 = Image.open('model.png')
prompt = "Put the dress from the first image on the model from the second image"

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=[image1, image2, prompt],
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE'],
        image_config=types.ImageConfig(
            aspect_ratio="3:4",
            image_size="2K"
        )
    )
)
With Google Search Grounding
from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents="Visualize the current weather forecast for San Francisco",
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE'],
        image_config=types.ImageConfig(aspect_ratio="16:9"),
        tools=[{"google_search": {}}]
    )
)
Prompting Best Practices
- Be Descriptive, Not Keyword-Based
Instead of: cat, wizard hat, cute
Write: A fluffy orange cat wearing a small knitted wizard hat, sitting on a wooden floor with soft natural lighting from a window
- Specify Style and Mood
  - Photography terms: "shot with 85mm lens", "soft bokeh background", "golden hour lighting"
  - Artistic styles: "in the style of Van Gogh", "minimalist illustration", "photorealistic"
  - Mood: "warm and cozy atmosphere", "dramatic noir lighting"
- For Text in Images
Be explicit about:
  - The exact text to render
  - Font style (descriptively): "clean, bold, sans-serif font"
  - Placement and size
- For Editing
  - Describe what to change and what to preserve
  - Use "keep everything else unchanged"
  - Reference specific elements clearly
- For Product/Commercial Images
Mention:
  - Lighting setup: "three-point softbox lighting"
  - Background: "clean white studio background"
  - Camera angle: "slightly elevated 45-degree shot"
Resolution and Aspect Ratio Reference
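| Aspect Ratio | 1K Resolution | 2K Resolution | 4K Resolution |
|--------------|---------------|---------------|---------------|
| 1:1          | 1024x1024     | 2048x2048     | 4096x4096     |
| 16:9         | 1376x768      | 2752x1536     | 5504x3072     |
| 9:16         | 768x1376      | 1536x2752     | 3072x5504     |
| 3:2          | 1264x848      | 2528x1696     | 5056x3392     |
| 2:3          | 848x1264      | 1696x2528     | 3392x5056     |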
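In the table above, each 2K and 4K entry is exactly 2x and 4x the corresponding 1K entry, so the expected output dimensions can be derived from the 1K row alone. The sketch below covers only the five aspect ratios listed in the table; `expected_resolution` is a hypothetical helper name.

```python
# 1K dimensions (width, height) from the reference table.
BASE_1K = {
    "1:1": (1024, 1024),
    "16:9": (1376, 768),
    "9:16": (768, 1376),
    "3:2": (1264, 848),
    "2:3": (848, 1264),
}
SCALE = {"1K": 1, "2K": 2, "4K": 4}


def expected_resolution(aspect_ratio, image_size):
    """Return (width, height) in pixels for a listed aspect ratio and size."""
    width, height = BASE_1K[aspect_ratio]
    factor = SCALE[image_size]
    return width * factor, height * factor
```

Comparing a saved image's dimensions against `expected_resolution("16:9", "4K")` is one way to confirm that the requested size was honored.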
Common Use Cases
Logo Creation
Create a modern, minimalist logo for a coffee shop called 'The Daily Grind'. The text should be in a clean, bold, sans-serif font. Black and white color scheme. Put the logo in a circle.
Product Photography
A high-resolution, studio-lit product photograph of a minimalist ceramic coffee mug in matte black on a polished concrete surface. Three-point softbox lighting with soft, diffused highlights. Slightly elevated 45-degree camera angle. Sharp focus on steam rising from the coffee.
Style Transfer
Transform this photograph of a city street at night into Vincent van Gogh's 'Starry Night' style. Preserve the composition but render with swirling, impasto brushstrokes and deep blues with bright yellows.
Infographic
Create a vibrant infographic explaining photosynthesis as a recipe. Show "ingredients" (sunlight, water, CO2) and "finished dish" (sugar/energy). Style like a colorful kids' cookbook, suitable for 4th graders.
Error Handling
Common issues:
- No image returned: Check that response_modalities includes 'IMAGE'
- Safety filters: Some prompts may be blocked; try rephrasing
- Rate limits: Implement exponential backoff for retries
- Large images: For 4K, ensure sufficient timeout settings
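The exponential backoff mentioned above could be sketched as follows. This is a generic retry wrapper, not an SDK feature: `with_backoff` and its parameters are hypothetical, and which exception types indicate a rate limit depends on the client in use, so `retry_on` is left for the caller to set.

```python
import random
import time


def with_backoff(call, max_attempts=5, base_delay=1.0, max_delay=30.0,
                 retry_on=(Exception,), sleep=time.sleep):
    """Call `call()` and retry on failure with exponentially growing delays.

    The delay doubles after each failed attempt (1s, 2s, 4s, ...), is capped
    at `max_delay`, and gets up to 50% random jitter added on top.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
```

A request would then be wrapped as `with_backoff(lambda: send_request(body))`, where `send_request` stands in for whichever call actually performs the API request.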
Dependencies
To use the Python SDK:
pip install google-genai pillow
For JavaScript:
npm install @google/genai
Important Notes
- All generated images include a SynthID watermark
- The model uses a "thinking" process for complex prompts
- For best text rendering, generate the text first, then request an image containing that text
- Images are not stored by the API; save outputs locally