Image Generator

This skill generates and edits images using Google's Gemini Nano Banana Pro model (gemini-3-pro-image-preview).

IMPORTANT: Setup Required

Before using this skill, the user must set the GEMINI_API_KEY environment variable:

Get a free API key from Google AI Studio
Export the key in your shell profile (~/.zshrc, ~/.bashrc, etc.): export GEMINI_API_KEY="your_api_key_here"
Restart your terminal or run source ~/.zshrc (or ~/.bashrc)

The skill will not work without this configuration.

Pre-flight Check

Before making any API call, verify the key is set:

if [ -z "$GEMINI_API_KEY" ]; then
  echo "ERROR: GEMINI_API_KEY is not set. Please export it in your shell profile."
  exit 1
fi

If the key is missing, stop and tell the user to set it using the instructions above.
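For callers using the Python SDK rather than the shell, the same pre-flight check could be sketched as follows. The helper name require_api_key and its optional env parameter are illustrative assumptions, not part of the skill.

```python
import os


def require_api_key(env=None):
    """Return the Gemini API key, or raise if it is missing or blank.

    `env` defaults to os.environ; passing a dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    key = env.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GEMINI_API_KEY is not set. Please export it in your shell profile."
        )
    return key
```

Calling require_api_key() before any request mirrors the shell check: it stops early with the same message instead of failing inside the API call.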

Configuration

Model: gemini-3-pro-image-preview

API Key: Read from the GEMINI_API_KEY environment variable

Iterating on User-Provided Images

When the user provides a path to an image they want to edit or iterate on, use this workflow:

Step 1: Read and encode the image to base64

# Get the image path from user
IMG_PATH="/path/to/user/image.png"

# Detect mime type
if [[ "$IMG_PATH" == *.png ]]; then
  MIME_TYPE="image/png"
elif [[ "$IMG_PATH" == *.jpg ]] || [[ "$IMG_PATH" == *.jpeg ]]; then
  MIME_TYPE="image/jpeg"
elif [[ "$IMG_PATH" == *.webp ]]; then
  MIME_TYPE="image/webp"
else
  MIME_TYPE="image/png"
fi

# Encode to base64 (works on both macOS and Linux)
if [[ "$(uname)" == "Darwin" ]]; then
  IMG_BASE64=$(base64 -i "$IMG_PATH")
else
  IMG_BASE64=$(base64 -w0 "$IMG_PATH")
fi
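A platform-independent version of Step 1 can be sketched in Python with only the standard library. The names MIME_BY_SUFFIX, detect_mime_type, and encode_image are hypothetical; unlike the shell checks, this sketch also matches upper-case extensions such as .JPG, which is an assumption about what users want.

```python
import base64
from pathlib import Path

# Extension-to-MIME map mirroring the shell checks; PNG is the fallback.
MIME_BY_SUFFIX = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".webp": "image/webp",
}


def detect_mime_type(path):
    """Guess the MIME type from the file extension (case-insensitive)."""
    return MIME_BY_SUFFIX.get(Path(path).suffix.lower(), "image/png")


def encode_image(path):
    """Return the file's bytes as a single-line base64 string."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")
```

Because base64.b64encode never inserts line breaks, no macOS/Linux branching is needed here.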

Step 2: Send image with edit prompt (File-Based Approach)

IMPORTANT: Always use a file-based approach for the request body. Base64-encoded images are too large for command-line arguments and will cause "argument list too long" errors.

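# User's edit request
EDIT_PROMPT="Add a santa hat to the person in this image"

# Write request to a JSON file (avoids command line length limits)
cat > /tmp/gemini_request.json << JSONEOF
{
  "contents": [{
    "parts": [
      {"text": "$EDIT_PROMPT"},
      {
        "inline_data": {
          "mime_type": "$MIME_TYPE",
          "data": "$IMG_BASE64"
        }
      }
    ]
  }],
  "generationConfig": {
    "responseModalities": ["TEXT", "IMAGE"]
  }
}
JSONEOF

# Call the API using the file
curl -s -X POST \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d @/tmp/gemini_request.json > /tmp/gemini_response.json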
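One caveat with interpolating a prompt into a JSON heredoc is that a prompt containing double quotes or backslashes produces invalid JSON. A minimal sketch of building the same request body in Python, where json.dump handles the escaping, might look like this; build_edit_request and write_request are hypothetical helper names.

```python
import json


def build_edit_request(prompt, mime_type, image_base64):
    """Build the generateContent request body for a single-image edit."""
    return {
        "contents": [{
            "parts": [
                {"text": prompt},
                {"inline_data": {"mime_type": mime_type, "data": image_base64}},
            ]
        }],
        "generationConfig": {"responseModalities": ["TEXT", "IMAGE"]},
    }


def write_request(path, body):
    """Serialize the request body to a file, escaping the prompt as needed."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(body, f)
```

The resulting file can be sent the same way as the heredoc version, since the field names and structure are unchanged.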

Step 3: Extract and save the edited image

# Extract image from response and save
python3 -c "
import json
import base64

with open('/tmp/gemini_response.json') as f:
    data = json.load(f)

for part in data['candidates'][0]['content']['parts']:
    if 'inlineData' in part:
        img_data = part['inlineData']['data']
        mime = part['inlineData']['mimeType']
        ext = 'png' if 'png' in mime else 'jpg'
        with open('edited_image.' + ext, 'wb') as out:
            out.write(base64.b64decode(img_data))
        print(f'Saved: edited_image.{ext}')
    elif 'text' in part:
        print(part['text'])
"
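The extraction step can also be written as a reusable function that tolerates a response with no candidates (for example, when a prompt is blocked). This is a sketch under the assumption that the response has the inlineData/mimeType shape shown above; extract_parts is a hypothetical name.

```python
import base64


def extract_parts(response):
    """Split a generateContent response dict into (images, texts).

    images is a list of (extension, raw_bytes) tuples; texts is a list of str.
    A response without candidates yields two empty lists instead of raising.
    """
    images, texts = [], []
    candidates = response.get("candidates") or [{}]
    for part in candidates[0].get("content", {}).get("parts", []):
        if "inlineData" in part:
            inline = part["inlineData"]
            ext = "png" if "png" in inline.get("mimeType", "") else "jpg"
            images.append((ext, base64.b64decode(inline["data"])))
        elif "text" in part:
            texts.append(part["text"])
    return images, texts
```

A caller would load /tmp/gemini_response.json with json.load, pass the dict in, and write each returned byte string to a file with the matching extension.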

Complete Example (File-Based)

For iterating on images, always use file-based requests:

# Variables
IMG_PATH="/path/to/image.png"
EDIT_PROMPT="Make the background a sunset beach"
OUTPUT_PATH="edited_output.png"

# Detect mime type and encode
MIME_TYPE=$([[ "$IMG_PATH" == *.png ]] && echo "image/png" || echo "image/jpeg")
# GNU base64 wraps output at 76 columns by default; strip newlines so the
# value is a single line on both macOS and Linux
IMG_BASE64=$(base64 < "$IMG_PATH" | tr -d '\n')

# Write request to file (required - base64 images are too large for command line)
cat > /tmp/gemini_request.json << JSONEOF
{
  "contents": [{
    "parts": [
      {"text": "$EDIT_PROMPT"},
      {"inline_data": {"mime_type": "$MIME_TYPE", "data": "$IMG_BASE64"}}
    ]
  }],
  "generationConfig": {
    "responseModalities": ["TEXT", "IMAGE"]
  }
}
JSONEOF

# Call API and extract image
curl -s -X POST \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d @/tmp/gemini_request.json > /tmp/gemini_response.json

# Save the output image
python3 -c "
import json, base64

with open('/tmp/gemini_response.json') as f:
    data = json.load(f)

for part in data.get('candidates', [{}])[0].get('content', {}).get('parts', []):
    if 'inlineData' in part:
        with open('$OUTPUT_PATH', 'wb') as f:
            f.write(base64.b64decode(part['inlineData']['data']))
        print('Saved: $OUTPUT_PATH')
"

Multi-Image Input (Combine/Compose)

To combine elements from multiple images (also uses file-based approach):

IMG1_PATH="/path/to/image1.png"
IMG2_PATH="/path/to/image2.png"
PROMPT="Put the dress from the first image on the person in the second image"

# Strip newlines so each value is a single line on both macOS and Linux
IMG1_BASE64=$(base64 < "$IMG1_PATH" | tr -d '\n')
IMG2_BASE64=$(base64 < "$IMG2_PATH" | tr -d '\n')

# Write request to file
cat > /tmp/gemini_request.json << JSONEOF
{
  "contents": [{
    "parts": [
      {"text": "$PROMPT"},
      {"inline_data": {"mime_type": "image/png", "data": "$IMG1_BASE64"}},
      {"inline_data": {"mime_type": "image/png", "data": "$IMG2_BASE64"}}
    ]
  }],
  "generationConfig": {"responseModalities": ["TEXT", "IMAGE"]}
}
JSONEOF
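The shell example hard-codes image/png for both inputs. A sketch that builds the same multi-image request for any number of files, picking the MIME type per file, could look like the following; image_part and build_compose_request are hypothetical names, and the extension map is the same assumption used for single-image edits.

```python
import base64
from pathlib import Path

# Extension-to-MIME map; PNG is the fallback for unknown extensions.
MIME_BY_SUFFIX = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".webp": "image/webp",
}


def image_part(path):
    """Build one inline_data part from an image file on disk."""
    p = Path(path)
    return {
        "inline_data": {
            "mime_type": MIME_BY_SUFFIX.get(p.suffix.lower(), "image/png"),
            "data": base64.b64encode(p.read_bytes()).decode("ascii"),
        }
    }


def build_compose_request(prompt, image_paths):
    """Build a request with the prompt first, then images in the given order."""
    parts = [{"text": prompt}] + [image_part(p) for p in image_paths]
    return {
        "contents": [{"parts": parts}],
        "generationConfig": {"responseModalities": ["TEXT", "IMAGE"]},
    }
```

Image order matters when the prompt refers to "the first image" and "the second image", so the sketch preserves the order of image_paths.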

Capabilities

Text-to-Image Generation

Generate high-quality images from text descriptions
Support for photorealistic, stylized, and artistic outputs
Accurate text rendering in images (logos, infographics, diagrams)

Image Editing

Add or remove elements from images
Inpainting with semantic masking (edit specific parts)
Style transfer (apply artistic styles to photos)
Multi-image composition (combine elements from multiple images)

Advanced Features

High Resolution: 1K, 2K, or 4K output
Aspect Ratios: 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9
Google Search Grounding: Generate images based on real-time data
Multi-turn Editing: Iteratively refine images through conversation
Up to 14 Reference Images: Combine multiple inputs for complex compositions
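The limits listed above can be checked locally before a request is sent. The following is a minimal sketch; the constant and function names are illustrative, and the allowed values are copied from this list rather than queried from the API.

```python
# Values taken from the feature list in this document.
ASPECT_RATIOS = {"1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9"}
IMAGE_SIZES = {"1K", "2K", "4K"}
MAX_REFERENCE_IMAGES = 14


def check_image_options(aspect_ratio=None, image_size=None, reference_count=0):
    """Return a list of problems; an empty list means the options look valid."""
    problems = []
    if aspect_ratio is not None and aspect_ratio not in ASPECT_RATIOS:
        problems.append(f"unsupported aspect ratio: {aspect_ratio}")
    if image_size is not None and image_size not in IMAGE_SIZES:
        problems.append(f"unsupported image size: {image_size}")
    if reference_count > MAX_REFERENCE_IMAGES:
        problems.append(
            f"too many reference images: {reference_count} > {MAX_REFERENCE_IMAGES}"
        )
    return problems
```

Returning a list instead of raising lets the caller report every problem to the user at once.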

API Usage

Basic Text-to-Image (Python)

from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=["Your prompt here"],
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE'],
        image_config=types.ImageConfig(
            aspect_ratio="16:9",  # Optional
            image_size="2K"       # Optional: "1K", "2K", "4K"
        )
    )
)

for part in response.parts:
    if part.text is not None:
        print(part.text)
    elif part.inline_data is not None:
        image = part.as_image()
        image.save("generated_image.png")

Basic Text-to-Image (JavaScript)

import { GoogleGenAI } from "@google/genai";
import * as fs from "node:fs";

const ai = new GoogleGenAI({});

const response = await ai.models.generateContent({
  model: "gemini-3-pro-image-preview",
  contents: "Your prompt here",
  config: {
    responseModalities: ['TEXT', 'IMAGE'],
    imageConfig: {
      aspectRatio: "16:9",
      imageSize: "2K"
    }
  }
});

for (const part of response.candidates[0].content.parts) {
  if (part.text) {
    console.log(part.text);
  } else if (part.inlineData) {
    const buffer = Buffer.from(part.inlineData.data, "base64");
    fs.writeFileSync("generated_image.png", buffer);
  }
}

REST API (curl)

curl -s -X POST \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "parts": [{"text": "Your prompt here"}]
    }],
    "generationConfig": {
      "responseModalities": ["TEXT", "IMAGE"],
      "imageConfig": {
        "aspectRatio": "16:9",
        "imageSize": "2K"
      }
    }
  }' | jq -r '.candidates[0].content.parts[] | select(.inlineData) | .inlineData.data' | base64 --decode > output.png
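The same REST call can be prepared in Python without the SDK, using only urllib from the standard library. This sketch builds the request object but does not send it; build_http_request and API_URL are hypothetical names, while the endpoint and headers are the ones used in the curl command.

```python
import json
import urllib.request

API_URL = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    "gemini-3-pro-image-preview:generateContent"
)


def build_http_request(api_key, body):
    """Prepare (but do not send) the POST request for generateContent."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"x-goog-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to urllib.request.urlopen with a generous timeout would send it; the response body is JSON with the same candidates/parts structure that the jq filter reads.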

Image Editing (with input image)

from google import genai
from google.genai import types
from PIL import Image

client = genai.Client()

input_image = Image.open('input.png')
prompt = "Add a wizard hat to the cat in this image"

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=[prompt, input_image],
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE']
    )
)

for part in response.parts:
    if part.inline_data is not None:
        image = part.as_image()
        image.save("edited_image.png")

Multi-Image Composition

from google import genai
from google.genai import types
from PIL import Image

client = genai.Client()

image1 = Image.open('dress.png')
image2 = Image.open('model.png')
prompt = "Put the dress from the first image on the model from the second image"

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=[image1, image2, prompt],
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE'],
        image_config=types.ImageConfig(
            aspect_ratio="3:4",
            image_size="2K"
        )
    )
)

With Google Search Grounding

from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents="Visualize the current weather forecast for San Francisco",
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE'],
        image_config=types.ImageConfig(aspect_ratio="16:9"),
        tools=[{"google_search": {}}]
    )
)

Prompting Best Practices

Be Descriptive, Not Keyword-Based

Instead of: cat, wizard hat, cute

Write: A fluffy orange cat wearing a small knitted wizard hat, sitting on a wooden floor with soft natural lighting from a window

Specify Style and Mood

Photography terms: "shot with 85mm lens", "soft bokeh background", "golden hour lighting"
Artistic styles: "in the style of Van Gogh", "minimalist illustration", "photorealistic"
Mood: "warm and cozy atmosphere", "dramatic noir lighting"

For Text in Images

Be explicit about:

The exact text to render
Font style (descriptively): "clean, bold, sans-serif font"
Placement and size

For Editing

Describe what to change and what to preserve
Use "keep everything else unchanged"
Reference specific elements clearly

For Product/Commercial Images

Mention:

Lighting setup: "three-point softbox lighting"
Background: "clean white studio background"
Camera angle: "slightly elevated 45-degree shot"

Resolution and Aspect Ratio Reference

Aspect Ratio    1K Resolution    2K Resolution    4K Resolution
1:1             1024x1024        2048x2048        4096x4096
16:9            1376x768         2752x1536        5504x3072
9:16            768x1376         1536x2752        3072x5504
3:2             1264x848         2528x1696        5056x3392
2:3             848x1264         1696x2528        3392x5056
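For code that needs the expected pixel dimensions (for example, to size a canvas or to sanity-check a saved file), the table can be encoded as a lookup. The names RESOLUTIONS and output_size are illustrative, and only the five ratios listed in the table are covered.

```python
# (width, height) in pixels, copied from the reference table.
RESOLUTIONS = {
    "1:1": {"1K": (1024, 1024), "2K": (2048, 2048), "4K": (4096, 4096)},
    "16:9": {"1K": (1376, 768), "2K": (2752, 1536), "4K": (5504, 3072)},
    "9:16": {"1K": (768, 1376), "2K": (1536, 2752), "4K": (3072, 5504)},
    "3:2": {"1K": (1264, 848), "2K": (2528, 1696), "4K": (5056, 3392)},
    "2:3": {"1K": (848, 1264), "2K": (1696, 2528), "4K": (3392, 5056)},
}


def output_size(aspect_ratio, image_size):
    """Look up (width, height); raises KeyError for combinations not in the table."""
    return RESOLUTIONS[aspect_ratio][image_size]
```

Each step up in size doubles both dimensions, so a 4K image has sixteen times the pixels of the 1K image at the same ratio.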

Common Use Cases

Logo Creation

Create a modern, minimalist logo for a coffee shop called 'The Daily Grind'. The text should be in a clean, bold, sans-serif font. Black and white color scheme. Put the logo in a circle.

Product Photography

A high-resolution, studio-lit product photograph of a minimalist ceramic coffee mug in matte black on a polished concrete surface. Three-point softbox lighting with soft, diffused highlights. Slightly elevated 45-degree camera angle. Sharp focus on steam rising from the coffee.

Style Transfer

Transform this photograph of a city street at night into Vincent van Gogh's 'Starry Night' style. Preserve the composition but render with swirling, impasto brushstrokes and deep blues with bright yellows.

Infographic

Create a vibrant infographic explaining photosynthesis as a recipe. Show "ingredients" (sunlight, water, CO2) and "finished dish" (sugar/energy). Style like a colorful kids' cookbook, suitable for 4th graders.

Error Handling

Common issues:

No image returned: Check that response_modalities includes 'IMAGE'
Safety filters: Some prompts may be blocked; try rephrasing
Rate limits: Implement exponential backoff for retries
Large images: For 4K, ensure sufficient timeout settings
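The exponential backoff mentioned for rate limits can be sketched as a small wrapper. The function name, the default of five attempts, the one-second base delay, and the jitter are all assumptions for illustration; this sketch retries on any exception, whereas production code would likely retry only on rate-limit responses such as HTTP 429.

```python
import random
import time


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on failure wait base_delay * 2**attempt plus jitter, then retry.

    The last failure is re-raised once max_attempts is exhausted.
    `sleep` is injectable so the delays can be observed in tests.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            # Jitter spreads out retries from clients that failed together.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```

Wrapping the API call as call_with_backoff(lambda: client.models.generate_content(...)) keeps the retry policy separate from the request itself.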

Dependencies

To use the Python SDK:

pip install google-genai pillow

For JavaScript:

npm install @google/genai

Important Notes

All generated images include a SynthID watermark
The model uses a "thinking" process for complex prompts
For best text rendering, generate the text first, then request an image containing that text
Images are not stored by the API - save outputs locally
