Gemini Image Generation Skill
Generate high-quality images using Google's Gemini 2.5 Flash Image model with text prompts, image editing, and multi-image composition capabilities.
When to Use This Skill
Use this skill when you need to:
-
Generate images from text descriptions
-
Edit existing images by adding/removing elements or changing styles
-
Combine multiple source images into new compositions
-
Iteratively refine images through conversational editing
-
Create visual content for documentation, design, or creative projects
Prerequisites
API Key Setup
The skill supports both Google AI Studio and Vertex AI endpoints.
Option 1: Google AI Studio (Default)
The skill automatically detects your GEMINI_API_KEY in this order:
-
Process environment: export GEMINI_API_KEY="your-key"
-
Project root: .env
-
.claude directory: .claude/.env
-
.claude/skills directory: .claude/skills/.env
-
Skill directory: .claude/skills/gemini-image-gen/.env
Get your API key: Visit Google AI Studio
Create .env file with:
GEMINI_API_KEY=your_api_key_here
Option 2: Vertex AI
To use Vertex AI instead:
Enable Vertex AI
export GEMINI_USE_VERTEX=true export VERTEX_PROJECT_ID=your-gcp-project-id export VERTEX_LOCATION=us-central1 # Optional, defaults to us-central1
Or in .env file:
GEMINI_USE_VERTEX=true VERTEX_PROJECT_ID=your-gcp-project-id VERTEX_LOCATION=us-central1
Python Setup
Install required package:
pip install google-genai
Quick Start
Basic Text-to-Image Generation
from google import genai from google.genai import types import os
API key detection handled automatically by helper script
client = genai.Client(api_key=os.getenv('GEMINI_API_KEY'))
response = client.models.generate_content( model='gemini-2.5-flash-image', contents='A serene mountain landscape at sunset with snow-capped peaks', config=types.GenerateContentConfig( response_modalities=['image'], aspect_ratio='16:9' ) )
Save to ./docs/assets/
for i, part in enumerate(response.candidates[0].content.parts): if part.inline_data: with open(f'./docs/assets/generated-{i}.png', 'wb') as f: f.write(part.inline_data.data)
Using the Helper Script
For convenience, use the provided helper script that handles API key detection and file saving:
Generate single image
python .claude/skills/gemini-image-gen/scripts/generate.py
"A futuristic city with flying cars"
--aspect-ratio 16:9
--output ./docs/assets/city.png
Generate with specific modalities
python .claude/skills/gemini-image-gen/scripts/generate.py
"Modern architecture design"
--response-modalities image text
--aspect-ratio 1:1
Key Features
Aspect Ratios
Ratio Resolution Use Case Token Cost
1:1 1024×1024 Social media, avatars 1290
16:9 1344×768 Landscapes, banners 1290
9:16 768×1344 Mobile, portraits 1290
4:3 1152×896 Traditional media 1290
3:4 896×1152 Vertical posters 1290
Response Modalities
-
['image'] : Generate only images
-
['text'] : Generate only text descriptions
-
['image', 'text'] : Generate both images and descriptions
Image Editing
Provide existing image + text instructions to modify:
import PIL.Image
img = PIL.Image.open('original.png') response = client.models.generate_content( model='gemini-2.5-flash-image', contents=[ 'Add a red balloon floating in the sky', img ] )
Multi-Image Composition
Combine up to 3 source images (recommended):
img1 = PIL.Image.open('background.png') img2 = PIL.Image.open('foreground.png')
response = client.models.generate_content( model='gemini-2.5-flash-image', contents=[ 'Combine these images into a cohesive scene', img1, img2 ] )
Prompt Engineering Tips
Structure effective prompts with three elements:
-
Subject: What to generate ("a robot")
-
Context: Environmental setting ("in a futuristic city")
-
Style: Artistic treatment ("cyberpunk style, neon lighting")
Example: "A robot in a futuristic city, cyberpunk style with neon lighting and rain-slicked streets"
Quality modifiers:
-
Add terms like "4K", "HDR", "high-quality", "professional photography"
-
Specify camera settings: "35mm lens", "shallow depth of field", "golden hour lighting"
Text in images:
-
Limit to 25 characters maximum
-
Use up to 3 distinct phrases
-
Specify font styles: "bold sans-serif title" or "handwritten script"
See references/prompting-guide.md for comprehensive prompt engineering strategies.
Safety Settings
The model includes adjustable safety filters. Configure per-request:
config = types.GenerateContentConfig( response_modalities=['image'], safety_settings=[ types.SafetySetting( category=types.HarmCategory.HARM_CATEGORY_HATE_SPEECH, threshold=types.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE ) ] )
See references/safety-settings.md for detailed configuration options.
Output Management
All generated images should be saved to ./docs/assets/ directory:
Create directory if needed
mkdir -p ./docs/assets
The helper script automatically saves to this location with timestamped filenames.
Model Specifications
Model: gemini-2.5-flash-image
-
Input tokens: Up to 65,536
-
Output tokens: Up to 32,768
-
Supported inputs: Text and images
-
Supported outputs: Text and images
-
Knowledge cutoff: June 2025
-
Features: Image generation, structured outputs, batch API, caching
Limitations
-
Maximum 3 input images recommended for best results
-
Text rendering works best when generated separately first
-
Does not support audio/video inputs
-
Regional restrictions on child image uploads (EEA, CH, UK)
-
Optimal language support: English, Spanish (Mexico), Japanese, Mandarin, Hindi
Error Handling
Common issues and solutions:
API key not found:
Check environment variables
echo $GEMINI_API_KEY
Verify .env file exists
cat .claude/skills/gemini-image-gen/.env
or
cat .env
Safety filter blocking:
-
Review response.prompt_feedback.block_reason
-
Adjust safety settings if appropriate for your use case
-
Modify prompt to avoid triggering filters
Token limit exceeded:
-
Reduce prompt length
-
Use fewer input images
-
Simplify image editing instructions
Reference Documentation
For detailed information, see:
-
references/api-reference.md
-
Complete API specifications
-
references/prompting-guide.md
-
Advanced prompt engineering
-
references/safety-settings.md
-
Safety configuration details
-
references/code-examples.md
-
Additional implementation examples
Resources
-
Official Documentation
-
API Reference
-
Get API Key
-
Google AI Studio - Interactive testing