Image Prompt Generator
Generate professional, non-generic images using Google's Gemini API for image generation.
Prerequisites & Setup
Getting Your Gemini API Key
-
Go to Google AI Studio
-
Sign in with your Google account
-
Click "Create API Key"
-
Copy the generated key
Configuring the API Key
Option 1: Environment file (recommended)
Create a .env file in your project root:
GEMINI_API_KEY=your_api_key_here
Option 2: Direct environment variable
export GEMINI_API_KEY=your_api_key_here
Install Dependencies
pip install google-generativeai python-dotenv pillow
Available Models
Model API Name Best For
Flash gemini-2.5-flash-image
Speed, drafts, iteration
Pro gemini-3-pro-image-preview
Final assets, 16:9 aspect ratio, quality
Image Size (CRITICAL for quality)
Size Output Use For
1K ~400KB Drafts, iteration
2K ~2MB Thumbnails, web assets (default)
4K ~7MB Print, high-resolution needs
CRITICAL: Always specify --size 2K or --size 4K for final assets. Without this, output defaults to 1K which looks low-quality.
Skill Stack Thumbnails - MANDATORY SETTINGS
For ANY Skill Stack thumbnail, you MUST use:
python scripts/images/generate_image.py "YOUR PROMPT"
--model pro
--size 2K
--aspect 16:9
--output public/images/thumbnails
--name your-slug
And your prompt MUST include risograph style instructions:
STYLE: Risograph / screen print aesthetic. Visible halftone dots throughout. Slight misregistration between color layers. Indie printmaking quality - warm, tactile, handmade.
COLOR: Warm cream base. Charcoal and gray tones. Terracotta (#D4654A) accent on ONE key element.
TEXTURE: Visible halftone pattern. Paper grain. NOT smooth digital gradients.
AVOID: Empty white borders, photorealism, glossy renders, smooth gradients, clip art aesthetic.
WHY: Without --size 2K , output is ~400KB garbage that looks like it was made with Flash. Without risograph style, output looks generic and AI-generated.
Workflow Overview
-
Brainstorm Concepts - Generate 4-6 high-level visual ideas
-
Select Direction - User picks the concept they like
-
Optimize Prompt - Refine into a strong, detailed prompt
-
Style Variations - Adapt to 2-3 different visual styles
-
Generate Images - Run via Gemini API
Step 1: Brainstorm Concepts
When the user provides a topic or use case, generate 3 high-level visual concepts and indicate your pick. Each concept should be:
-
One sentence describing the visual idea
-
Concrete and immediate - you can picture it instantly
-
Conceptual but not abstract - a clear object/scene with meaning
-
Non-generic - avoid cliches (no lightbulbs for ideas, no handshakes for partnership)
Format:
Concept A: One sentence description of the visual concept and why it works.
Concept B: One sentence description...
Concept C: One sentence description...
My pick: Concept X — Brief reasoning why this one is strongest.
Example for "newsletter about personal productivity":
Concept A: A vintage compass where the needle points toward a coffee ring stain on a map—direction emerges from daily rituals.
Concept B: A clock where the 12 hours show seasonal changes—time management over long arcs, not just hours.
Concept C: One small key attached to dozens of decorative keychains—we overcomplicate simple solutions.
My pick: Concept A — The coffee stain is unexpected and personal, the compass is concrete without being cliché.
Wait for user to confirm or redirect before proceeding.
Step 2: Optimize the Prompt
Once the user selects a concept, develop it into a full prompt. Structure:
Create a [style type] illustration of [subject].
CONCEPT: [Expand the one-sentence idea into a clear visual description]
STYLE: [Artistic approach - load from references/styles/ if brand-specific]
COMPOSITION: [Framing, focal point, negative space, balance]
COLORS: [Palette - describe by name, not hex codes which may render as text]
TEXTURE: [Surface qualities, analog/digital feel]
AVOID: [What should NOT appear - be specific]
FORMAT: [Aspect ratio]
Key principles:
-
Natural language, full sentences - no tag soup
-
Describe colors by name (burnt orange, sky blue, near-black) not hex codes
-
Maximum 2-3 elements - if it feels busy, remove something
-
Favor metaphor over literal depiction
Step 3: Style Variations
MANDATORY for Skill Stack: Risograph style. Every thumbnail must include risograph style instructions in the prompt. See references/styles/risograph.md .
Key risograph elements to ALWAYS include in prompt:
-
"Visible halftone dots throughout"
-
"Slight misregistration between color layers"
-
"Indie printmaking quality - warm, tactile, handmade"
-
"NOT smooth digital gradients"
Available styles in references/styles/ (for non-Skill-Stack projects):
-
risograph.md - DEFAULT. Halftone dots, misregistration, indie printmaking aesthetic.
-
minimalist-ink.md - High-contrast black and white, crosshatching.
-
watercolor-line.md - Ink linework with watercolor washes.
-
editorial-conceptual.md - Conceptual, sophisticated, editorial wit.
Step 4: Generate via API
Running the Script
Generate 2K thumbnail (recommended for web)
python scripts/images/generate_image.py "prompt here" --model pro --aspect 16:9 --size 2K
Generate 4K for print/high-res
python scripts/images/generate_image.py "prompt here" --model pro --size 4K
Save to specific folder
python scripts/images/generate_image.py "prompt" --output "./images" --name "my_image" --size 2K
Options:
-
--model pro (default, higher quality) or --model flash (faster)
-
--size 2K (default), 1K , or 4K
-
always use 2K or 4K for final assets
-
--aspect 16:9 (default), 1:1 , 9:16 , 3:4 , 4:3
-
--variations N
-
generate N versions
-
--output ./path
-
save location
-
--name prefix
-
filename prefix
Output location: Save images alongside the content they belong to - not a generic images dump.
Step 5: Iterate
After user reviews generated images:
-
80% good? Request specific edits conversationally rather than regenerating
-
Composition off? Adjust framing or element placement in prompt
-
Wrong style? Try a different style reference
-
Too busy? Simplify to fewer elements
-
Colors wrong? Be more explicit about palette
Prompting Principles
Write Like a Creative Director
Brief the model like a human artist. Use proper grammar, full sentences, and descriptive adjectives.
Don't Do
"Cool car, neon, city, night, 8k" "A cinematic wide shot of a futuristic sports car speeding through a rainy Tokyo street at night. The neon signs reflect off the wet pavement and the car's metallic chassis."
Be specific about:
-
Subject: Instead of "a woman," say "a sophisticated elderly woman wearing a vintage chanel-style suit"
-
Materiality: Describe textures - "matte finish," "brushed steel," "soft velvet," "crumpled paper"
-
Setting: Define location, time of day, weather
-
Lighting: Specify mood and light source
-
Mood: Emotional tone of the image
Provide Context
Context helps the model make logical artistic decisions. Include the "why" or "for whom."
Example: "Create an image of a sandwich for a Brazilian high-end gourmet cookbook." (Model infers: professional plating, shallow depth of field, perfect lighting)
Keep It Simple
-
One clear focal point
-
Maximum 2-3 elements total
-
Generous negative space
-
If it feels busy, remove something
Avoid the Generic
Hard rules:
-
NEVER include text in images - Text renders poorly and looks amateur
-
No lightbulbs for "ideas"
-
No handshakes for "partnership"
-
No audio waveforms for "podcasts" or "audio"
-
No gears/cogs for "systems" or "process"
-
No happy stock photo poses
-
No glossy AI aesthetic
-
No puzzle pieces for "connection" or "integration"
Metaphors That Work
Strong concepts from past generations:
-
Orange cut open revealing neural network - organic exterior, systematic interior (knowledge systems, RAG)
-
Mail slot with origami bird emerging - delivery transforms into something alive (newsletters)
-
Exercise club crossed with quill pen - physical culture meets intellectual work (body + mind)
-
Typewriter key extreme close-up with radiating impact - decisive action, command (books, authority)
-
Medicine dropper with mandala bloom - clinical precision meets transcendence (therapy, transformation)
-
Straight razor on leather strop - precision tool, polishing, refinement (editing, production)
-
Prism dispersing light into distinct beams - one input, multiple outputs (frameworks)
-
Compass with coffee stain on map - direction from daily rituals (productivity)
Resources
references/styles/
Aesthetic style definitions:
-
risograph.md
-
DEFAULT - Halftone, misregistration, indie printmaking
-
minimalist-ink.md
-
Black and white ink illustration
-
watercolor-line.md
-
Ink with watercolor washes
-
editorial-conceptual.md
-
Conceptual editorial style
scripts/
- generate_image.py
- Gemini API image generation
Prompt Modifiers Reference
Category Examples
Lighting golden hour, dramatic shadows, soft diffused light, neon glow, overcast
Style cinematic, editorial, technical diagram, hand-drawn, photorealistic
Texture matte finish, brushed steel, soft velvet, crumpled paper, weathered wood
Composition wide shot, close-up, bird's eye view, dutch angle, symmetrical
Mood energetic, serene, dramatic, playful, sophisticated
Quality 4K, high-fidelity, pixel-perfect, professional grade
Advanced Capabilities
Text Rendering & Infographics
Put exact text in quotes. Specify style: "polished editorial," "technical diagram," or "hand-drawn whiteboard."
Example prompts:
Earnings Report Infographic: "Generate a clean, modern infographic summarizing the key financial highlights from this earnings report. Include charts for 'Revenue Growth' and 'Net Income', and highlight the CEO's key quote in a stylized pull-quote box."
Whiteboard Summary: "Summarize the concept of 'Transformer Neural Network Architecture' as a hand-drawn whiteboard diagram suitable for a university lecture. Use different colored markers for the Encoder and Decoder blocks, and include legible labels for 'Self-Attention' and 'Feed Forward'."
Character Consistency & Thumbnails
Use reference images and state "Keep the person's facial features exactly the same as Image 1." Describe expression/action changes while maintaining identity.
Example prompt:
Viral Thumbnail: "Design a viral video thumbnail using the person from Image 1. Face Consistency: Keep the person's facial features exactly the same as Image 1, but change their expression to look excited and surprised. Action: Pose the person on the left side, pointing their finger towards the right side of the frame. Subject: On the right side, place a high-quality image of a delicious avocado toast. Graphics: Add a bold yellow arrow connecting the person's finger to the toast. Text: Overlay massive, pop-style text in the middle: 'Done in 3 mins!'. Use a thick white outline and drop shadow. Background: A blurred, bright kitchen background. High saturation and contrast."
Image Reworking (Edit Existing Images)
The --input flag enables "rework mode" - pass an existing image to Gemini and describe the changes you want.
Key use cases:
-
Small tweaks - Adjust colors, add/remove elements, change lighting
-
Style transfer - Keep composition but change artistic style
-
Object manipulation - Remove, add, or modify specific objects
-
Seasonal/temporal changes - Same scene, different time/season
Running in rework mode:
Basic edit - add something
python scripts/generate_image.py "Add snow to the roof and yard"
--input ./house.png
--model pro
Color adjustment
python scripts/generate_image.py "Change the accent color from red to teal, keep everything else identical"
--input ./thumbnail.png
--model pro
Style transfer - keep composition, change aesthetic
python scripts/generate_image.py "Convert this to risograph style with halftone dots and slight color misregistration"
--input ./photo.png
--model pro
Generate variations of an edit
python scripts/generate_image.py "Make the lighting warmer, like golden hour"
--input ./portrait.png
--variations 3
--model pro
Prompting tips for rework mode:
Be specific about what to preserve:
-
"Keep the person's facial features exactly the same"
-
"Maintain the composition and framing"
-
"Don't change the background"
Be explicit about what to change:
-
"Change ONLY the color of the shirt from blue to red"
-
"Add snow to the roof and nothing else"
-
"Remove the text overlay"
Use comparative language:
-
"Make the colors more vibrant"
-
"Increase the contrast slightly"
-
"Make the lighting softer and more diffused"
Output naming: Files from rework mode are named {prefix}_{timestamp}edit{model}.png to distinguish from generated images (gen ).
Advanced Editing Examples
Object Removal:
python scripts/generate_image.py
"Remove the tourists from the background and fill with matching cobblestones and storefronts"
--input ./street-photo.png
--model pro
Seasonal Control:
python scripts/generate_image.py
"Turn this into winter. Add snow to the roof and yard. Change lighting to cold, overcast afternoon. Keep architecture identical."
--input ./house-summer.png
--model pro
Character Consistency (thumbnail series):
python scripts/generate_image.py
"Keep the person's face exactly the same. Change expression to surprised. Add a pointing gesture toward the right side of the frame."
--input ./person-reference.png
--model pro
Related Skills
-
youtube-title-creator - Pair generated images with optimized titles
-
social-content-creation - Use images in platform-optimized posts
For custom brand styles, create new style files in references/styles/ following the existing format