JSON Image Prompt Generator
Generate structured JSON prompts optimized for Nano Banana 2 image generation. This skill walks the user through an adaptive questioning flow, assembles a JSON prompt from a master schema, infers API parameters, and supports iterative refinement.
Setup
Before starting, read these files from the skill directory to understand the available fields and inference rules:
- Read references/schema.json — the master prompt schema with all fields across 3 tiers
- Read assets/prompt_config.yaml — context inference rules for aspect ratio, resolution, and thinking level
Phase 1: Entry Point
Ask the user what they want to create:
"What do you want to create? Describe it however you like — a sentence, a vibe, a reference. You can also say 'show presets' to browse options or 'use [preset name]' to start from a template."
Parse the user's freeform input and extract as many Tier 1 fields as possible:
- subject (required): The main subject/object/character
- action: What they're doing
- environment: Where the scene takes place
- style: Visual aesthetic (default to "photorealistic" if unclear)
Show what you extracted and ask for confirmation:
Here's what I picked up:
- Subject: [extracted]
- Environment: [extracted or "not specified"]
- Style: [extracted or "photorealistic"]
- Action: [extracted or "none"]
Anything to change? Or want to go deeper on lighting, camera, and composition?
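The Tier 1 defaulting and confirmation summary above can be sketched as a small helper. This is illustrative only: the freeform extraction itself is done by the model, and the function names here are my own.

```python
def apply_tier1_defaults(extracted: dict) -> dict:
    """Fill unspecified Tier 1 fields with the documented fallbacks."""
    fields = {
        "subject": extracted.get("subject"),          # required
        "action": extracted.get("action"),
        "environment": extracted.get("environment"),
        "style": extracted.get("style") or "photorealistic",  # documented default
    }
    if not fields["subject"]:
        raise ValueError("subject is required — ask the user again")
    return fields

def confirmation_summary(fields: dict) -> str:
    """Render the 'Here's what I picked up' confirmation block."""
    return "\n".join([
        "Here's what I picked up:",
        f"- Subject: {fields['subject']}",
        f"- Environment: {fields['environment'] or 'not specified'}",
        f"- Style: {fields['style']}",
        f"- Action: {fields['action'] or 'none'}",
    ])
```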
Phase 2: Adaptive Deepening
Tier 2 — If user says "go deeper" or "yes"
Ask ONE question at a time for each Tier 2 field. Offer sensible defaults based on the subject/style. Use the schema enums to present valid options:
- Mood — "What mood? I'd suggest [contextual default]. Options: serene, dramatic, playful, mysterious, epic, warm..."
- Lighting type — "Lighting? [contextual suggestion]. Options: natural, studio, dramatic, neon, golden hour, moonlit, overcast, rembrandt, split, rim"
- Lighting direction — "Light direction? [suggestion]. Options: front, left, right, overhead, backlit, diffuse"
- Camera angle — "Camera angle? [suggestion]. Options: eye-level, low-angle, high-angle, bird's-eye, worm's-eye, dutch, overhead"
- Camera lens — "Lens? [suggestion]. Options: 24mm (wide), 35mm, 50mm (standard), 85mm (portrait), 135mm (telephoto), macro"
- Depth of field — "Depth of field? [suggestion]. Options: shallow (blurred background), moderate, deep (everything sharp)"
- Color palette — "Any specific colors? You can use names or hex codes (e.g., '#FF6B6B, deep navy'). Or say 'skip'."
- Composition — "Composition rule? [suggestion]. Options: rule-of-thirds, centered, symmetrical, golden-ratio, leading-lines, frame-within-frame"
- Framing — "Framing? [suggestion]. Options: extreme-close-up, close-up, medium-shot, full-body, wide-shot, establishing-shot"
At any point if the user says "that's enough" or "skip the rest," stop asking and fill remaining fields with context-appropriate defaults.
After Tier 2, ask:
"Want full control over details, textures, and effects? Or is this good to go?"
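Presenting valid options from the schema enums can be sketched as below. The nesting `schema["tier2"][field]["enum"]` is an assumed shape for references/schema.json; adapt to the real file's structure.

```python
def tier2_question(schema: dict, field: str, suggestion: str) -> str:
    """Build a one-at-a-time Tier 2 question with a contextual default
    and the schema's valid enum options."""
    options = ", ".join(schema["tier2"][field]["enum"])
    return f"{field.capitalize()}? I'd suggest {suggestion}. Options: {options}"
```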
Tier 3 — If user wants full control
Ask about each Tier 3 field one at a time:
- Clothing — only if subject is a character
- Weather — only if scene is outdoors
- Materials — relevant material descriptions
- Accessories — props and accessories
- Time of day — dawn, midday, golden hour, twilight, midnight
- Era — historical or futuristic period
- Textures — surface qualities (matte, glossy, rough, brushstrokes, etc.)
- Effects — post-processing (film grain, lens flare, bokeh, motion blur, vignette)
- Negative prompt — "Anything to explicitly EXCLUDE? Common: watermarks, text, extra limbs, blurry"
- Style references — "Any artistic references? e.g., 'in the style of Wes Anderson', 'Ghibli-inspired'"
Skip fields that don't apply to the current subject/scene. Don't ask about clothing for a landscape.
Phase 3: Preset Shortcut
If the user says "use [name]" or "start from [name]" at any point:
- Look for a matching .json file in the presets/ directory
- If found: load it, display its fields, and ask "What would you like to change?"
- If not found: respond "No preset found matching '[name]'. Did you mean one of these?" and list close matches by name (substring/prefix). If no close matches, list all categories.
Listing presets — if user says "show presets" or "list presets":
Group by _meta.category and display:
General: cinematic, photorealistic, anime, oil-painting, product-shot, editorial-fashion, fantasy-art
Web Design: web-hero-section, web-product-mockup, web-ui-illustration, web-avatar, web-blog-thumbnail, web-icon-asset, web-saas-dashboard-bg
Game Design: game-character-concept, game-environment-concept, game-item-icon, game-card-art, game-pixel-sprite, game-ui-mockup, game-splash-screen, game-texture-tile
App Design: app-onboarding-illustration, app-empty-state, app-store-screenshot, app-icon, app-notification-graphic, app-walkthrough-hero, app-feature-banner, app-avatar-pack, app-dark-mode-bg
Cross-over: logo-concept, mood-board
Your Presets: [list any user-saved presets with _meta.category = "user"]
Preset load precedence: If a user-saved preset shares a name with a curated preset, the user preset wins.
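The lookup, "did you mean" suggestion, and category grouping above can be sketched as follows. Here `presets` maps preset name to its loaded JSON; how it is populated from the presets/ directory is out of scope for this sketch.

```python
from collections import defaultdict

def resolve_preset(name: str, presets: dict) -> tuple:
    """Exact match loads; otherwise suggest close matches by substring."""
    if name in presets:
        return ("loaded", presets[name])
    n = name.lower()
    matches = [p for p in presets if n in p.lower()]
    return ("suggest", matches)

def group_by_category(presets: dict) -> dict:
    """Group preset names by _meta.category for the 'show presets' listing."""
    groups = defaultdict(list)
    for pname, data in presets.items():
        groups[data.get("_meta", {}).get("category", "general")].append(pname)
    return dict(groups)
```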
Phase 4: Context Inference
Before generating, infer API parameters by scanning the assembled JSON field VALUES using the rules in assets/prompt_config.yaml.
Aspect ratio inference:
- landscape, panorama, establishing-shot, wide-shot, hero-section, splash-screen, banner, dashboard → 16:9
- portrait, full-body, app-store, phone, onboarding, mobile, story → 9:16
- card-art, walkthrough → 3:4
- environment-concept, ultrawide, cinematic-wide → 21:9
- default → 1:1
Thinking level inference:
- multiple characters, complex scene, detailed environment, multi-reference, intricate → high
- simple, icon, sprite, single object, flat style, minimal, logo, thumbnail → minimal
- default → high
Resolution inference:
- icon, thumbnail, sprite, notification, small, avatar → 1K
- product-mockup, editorial, portrait, card-art, app-store, character-concept → 2K
- 4K, print, poster, splash, hero-section, wallpaper, large → 4K
- default → 1K
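The inference step can be sketched as a keyword scan over the assembled JSON's string values. In the skill the rules come from assets/prompt_config.yaml; they are inlined here for illustration, showing only the aspect-ratio table (thinking level and resolution follow the same pattern).

```python
# Rules are checked in order; first match wins, else the default applies.
ASPECT_RULES = [
    ({"landscape", "panorama", "establishing-shot", "wide-shot",
      "hero-section", "splash-screen", "banner", "dashboard"}, "16:9"),
    ({"portrait", "full-body", "app-store", "phone", "onboarding",
      "mobile", "story"}, "9:16"),
    ({"card-art", "walkthrough"}, "3:4"),
    ({"environment-concept", "ultrawide", "cinematic-wide"}, "21:9"),
]

def infer_aspect_ratio(prompt_json: dict) -> str:
    # Scan the field VALUES of the assembled JSON, as specified above.
    text = " ".join(str(v).lower() for v in prompt_json.values() if v)
    for keywords, ratio in ASPECT_RULES:
        if any(k in text for k in keywords):
            return ratio
    return "1:1"  # default
```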
Show the inferred parameters for override:
Ready to generate:
- Resolution: [inferred] | Aspect Ratio: [inferred] | Thinking: [inferred] | Images: 1
Override anything? Or say 'go' to generate.
Phase 5: Prompt Assembly
Flatten the structured JSON into a single narrative prompt string optimized for Nano Banana 2.
Assembly template:
[Style]. [Subject] [action] in [environment]. [Composition framing], [composition rule].
[Camera lens], [aperture], [angle], [depth of field]. [Lighting type] lighting from [direction], [color temperature].
[Mood] mood. Colors: [color palette with hex codes]. [Details]. [Textures]. [Effects]. [Style references].
[Negative prompt in ALL CAPS: "Do NOT include..."]
Assembly rules:
- Omit any segment where ALL its fields are null. Do not leave orphaned labels or punctuation.
- Use hex color codes where the user provided them (e.g., #FF6B6B instead of "coral red")
- Write negative prompts in ALL CAPS (e.g., "Do NOT include watermarks, text, or blurry areas") — this improves adherence
- For photorealistic styles, include a camera model name (e.g., "Canon EOS R5", "Sony A7 IV") to push toward photorealism
- Include composition buzzwords like "Pulitzer-prize-winning" or "National Geographic cover" for professional quality when appropriate
- Each JSON field maps to a distinct prompt segment to prevent concept bleeding
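The assembly rules can be sketched as a flattening function that drops any segment whose fields are all null, so no orphaned labels or punctuation survive. The field names mirror the template above but are assumptions about the JSON shape.

```python
def assemble_prompt(p: dict) -> str:
    """Flatten the structured JSON into one narrative prompt string."""
    segments = [
        p.get("style"),
        " ".join(filter(None, [p.get("subject"), p.get("action")]))
        + (f" in {p['environment']}" if p.get("environment") else ""),
        ", ".join(filter(None, [p.get("framing"), p.get("composition")])),
        ", ".join(filter(None, [p.get("lens"), p.get("angle"),
                                p.get("depth_of_field")])),
        f"{p['lighting_type']} lighting from {p['lighting_direction']}"
        if p.get("lighting_type") and p.get("lighting_direction")
        else p.get("lighting_type"),
        f"{p['mood']} mood" if p.get("mood") else None,
        f"Colors: {p['palette']}" if p.get("palette") else None,
        p.get("negative", "").upper() or None,  # ALL CAPS negative directive
    ]
    # Omit empty segments entirely (assembly rule 1).
    return ". ".join(s for s in segments if s) + "."
```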
Call the MCP tool:
mcp__nano-banana-2__generate_image(
prompt: [assembled prompt string],
resolution: [from api params],
aspectRatio: [from api params],
thinking: [from api params],
numberOfImages: [from api params]
)
Phase 6: Refinement Loop
After generation, store the complete JSON as a numbered version and present options:
Version 1 generated. What next?
- Describe changes in plain English (I'll update the JSON and re-generate)
- "save as preset [name]" to save this config for future use
- "show json" to see the current prompt JSON
- "done" to finish
Version Tracking
Maintain a numbered list of complete JSON snapshots in context:
Version History:
├─ v1: [brief description] → generated
├─ v2: [changes from v1] → generated
└─ v3: branched from v1, [changes] → generated
Operations:
- Tweak (natural language): "make the lighting warmer" → update relevant JSON fields
  - If the change is localized (color, lighting adjustment, add/remove detail): use mcp__nano-banana-2__continue_editing with the edit description
  - If the change is fundamental (different subject, style, environment, composition): use mcp__nano-banana-2__generate_image with the full updated prompt
- Branch: "go back to v2 but change the mood to mysterious" → load v2's complete JSON snapshot, apply changes, and generate as a new version via generate_image (MCP image context is lost when branching)
- Compare: "compare v1 and v3" → show field-level differences between two version snapshots
- Show JSON: display the current version's complete JSON
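The version operations above can be sketched as a snapshot store. The 20-version cap and 1-based numbering follow the text; the class and method names are my own.

```python
import copy

class VersionHistory:
    def __init__(self, cap: int = 20):
        self.snapshots: list[dict] = []
        self.cap = cap

    def record(self, prompt_json: dict) -> int:
        """Store a deep copy and return its 1-based version number."""
        if len(self.snapshots) >= self.cap:
            raise RuntimeError("20-version cap — save presets, then clear history")
        self.snapshots.append(copy.deepcopy(prompt_json))
        return len(self.snapshots)

    def branch(self, version: int, changes: dict) -> dict:
        """Load an old snapshot and apply changes; the caller re-generates
        via generate_image and records the result as a new version."""
        base = copy.deepcopy(self.snapshots[version - 1])
        base.update(changes)
        return base

    def compare(self, a: int, b: int) -> dict:
        """Field-level differences between two version snapshots."""
        va, vb = self.snapshots[a - 1], self.snapshots[b - 1]
        return {k: (va.get(k), vb.get(k))
                for k in set(va) | set(vb) if va.get(k) != vb.get(k)}

    def clear(self):
        self.snapshots.clear()  # prior versions can no longer be branched
```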
Version Limits
Cap at 20 versions per session. At v20:
"You've hit 20 versions this session. Save your favorites as presets, then say 'clear history' to reset and keep going."
On "clear history": discard all in-memory JSON snapshots (branching to prior versions is no longer possible) and reset the counter to v1.
Preset Management
Saving Presets
When the user says "save as preset [name]":
- Take the current version's JSON
- Add a _meta block with "category": "user" and the user's chosen name
- Write to presets/[name].json in the skill directory
- If a curated preset with the same name exists, warn: "A built-in preset with this name exists — your version will override it on load. Proceed?"
- Confirm: "Saved preset '[name]'. Load it anytime with 'use [name]'."
Deleting Presets
When the user says "delete preset [name]":
- If it's a user preset (_meta.category: "user"): delete the file
- If it's a curated preset: respond "That's a built-in preset and can't be deleted. You can override it by saving a user preset with the same name."
Key Techniques (from research)
These are baked into the prompt assembly logic:
- Structured JSON prevents concept bleeding — each field maps to a distinct prompt segment, keeping colors, lighting, and subjects isolated
- Hex color codes for precise color control beyond natural language
- ALL CAPS negative directives significantly improve model adherence
- Camera model names (Canon EOS R5, Sony A7 IV) push toward photorealism
- Composition buzzwords ("Pulitzer-prize-winning", "National Geographic cover") improve professional quality
- Nano Banana 2's 32K+ token context enables rich, detailed structured prompts
- Narrative structure with embedded technical specs is the optimal prompt format for Nano Banana models