JSON Image Prompt Generator
Generate structured JSON prompts optimized for Nano Banana 2 image generation. This skill walks the user through an adaptive questioning flow, assembles a JSON prompt from a master schema, infers API parameters, and supports iterative refinement.
Setup
Before starting, read these files from the skill directory to understand the available fields and inference rules:
- Read references/schema.json — the master prompt schema with all fields across 3 tiers
- Read assets/prompt_config.yaml — context inference rules for aspect ratio, resolution, and thinking level
Phase 1: Entry Point
Ask the user what they want to create:
"What do you want to create? Describe it however you like — a sentence, a vibe, a reference. You can also say 'show presets' to browse options or 'use [preset name]' to start from a template."
Parse the user's freeform input and extract as many Tier 1 fields as possible:
- subject (required): The main subject/object/character
- action: What they're doing
- environment: Where the scene takes place
- style: Visual aesthetic (default to "photorealistic" if unclear)
Show what you extracted and ask for confirmation:
Here's what I picked up:
- Subject: [extracted]
- Environment: [extracted or "not specified"]
- Style: [extracted or "photorealistic"]
- Action: [extracted or "none"]
Anything to change? Or want to go deeper on lighting, camera, and composition?
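The Tier 1 defaulting and confirmation summary above can be sketched as a small helper. This is illustrative only: the freeform extraction itself is done by the model, and the function names here are my own.

```python
def apply_tier1_defaults(extracted: dict) -> dict:
    """Fill unspecified Tier 1 fields with the documented fallbacks."""
    fields = {
        "subject": extracted.get("subject"),          # required
        "action": extracted.get("action"),
        "environment": extracted.get("environment"),
        "style": extracted.get("style") or "photorealistic",  # documented default
    }
    if not fields["subject"]:
        raise ValueError("subject is required — ask the user again")
    return fields

def confirmation_summary(fields: dict) -> str:
    """Render the 'Here's what I picked up' confirmation block."""
    return "\n".join([
        "Here's what I picked up:",
        f"- Subject: {fields['subject']}",
        f"- Environment: {fields['environment'] or 'not specified'}",
        f"- Style: {fields['style']}",
        f"- Action: {fields['action'] or 'none'}",
    ])
```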
Phase 2: Adaptive Deepening
Tier 2 — If user says "go deeper" or "yes"
Ask ONE question at a time for each Tier 2 field. Offer sensible defaults based on the subject/style. Use the schema enums to present valid options:
- Mood — "What mood? I'd suggest [contextual default]. Options: serene, dramatic, playful, mysterious, epic, warm..."
- Lighting type — "Lighting? [contextual suggestion]. Options: natural, studio, dramatic, neon, golden hour, moonlit, overcast, rembrandt, split, rim"
- Lighting direction — "Light direction? [suggestion]. Options: front, left, right, overhead, backlit, diffuse"
- Camera angle — "Camera angle? [suggestion]. Options: eye-level, low-angle, high-angle, bird's-eye, worm's-eye, dutch, overhead"
- Camera lens — "Lens? [suggestion]. Options: 24mm (wide), 35mm, 50mm (standard), 85mm (portrait), 135mm (telephoto), macro"
- Depth of field — "Depth of field? [suggestion]. Options: shallow (blurred background), moderate, deep (everything sharp)"
- Color palette — "Any specific colors? You can use names or hex codes (e.g., '#FF6B6B, deep navy'). Or say 'skip'."
- Composition — "Composition rule? [suggestion]. Options: rule-of-thirds, centered, symmetrical, golden-ratio, leading-lines, frame-within-frame"
- Framing — "Framing? [suggestion]. Options: extreme-close-up, close-up, medium-shot, full-body, wide-shot, establishing-shot"
At any point if the user says "that's enough" or "skip the rest," stop asking and fill remaining fields with context-appropriate defaults.
After Tier 2, ask:
"Want full control over details, textures, and effects? Or is this good to go?"
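Presenting valid options from the schema enums can be sketched as below. The nesting `schema["tier2"][field]["enum"]` is an assumed shape for references/schema.json; adapt to the real file's structure.

```python
def tier2_question(schema: dict, field: str, suggestion: str) -> str:
    """Build a one-at-a-time Tier 2 question with a contextual default
    and the schema's valid enum options."""
    options = ", ".join(schema["tier2"][field]["enum"])
    return f"{field.capitalize()}? I'd suggest {suggestion}. Options: {options}"
```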
Tier 3 — If user wants full control
Ask about each Tier 3 field one at a time:
- Clothing — only if subject is a character
- Weather — only if scene is outdoors
- Materials — relevant material descriptions
- Accessories — props and accessories
- Time of day — dawn, midday, golden hour, twilight, midnight
- Era — historical or futuristic period
- Textures — surface qualities (matte, glossy, rough, brushstrokes, etc.)
- Effects — post-processing (film grain, lens flare, bokeh, motion blur, vignette)
- Negative prompt — "Anything to explicitly EXCLUDE? Common: watermarks, text, extra limbs, blurry"
- Style references — "Any artistic references? e.g., 'in the style of Wes Anderson', 'Ghibli-inspired'"
Skip fields that don't apply to the current subject/scene. Don't ask about clothing for a landscape.
Phase 3: Preset Shortcut
If the user says "use [name]" or "start from [name]" at any point:
- Look for a matching .json file in the presets/ directory
- If found: load it, display its fields, and ask "What would you like to change?"
- If not found: respond "No preset found matching '[name]'. Did you mean one of these?" and list close matches by name (substring/prefix). If no close matches, list all categories.
Listing presets — if user says "show presets" or "list presets":
Group by _meta.category and display:
General: cinematic, photorealistic, anime, oil-painting, product-shot, editorial-fashion, fantasy-art
Web Design: web-hero-section, web-product-mockup, web-ui-illustration, web-avatar, web-blog-thumbnail, web-icon-asset, web-saas-dashboard-bg
Game Design: game-character-concept, game-environment-concept, game-item-icon, game-card-art, game-pixel-sprite, game-ui-mockup, game-splash-screen, game-texture-tile
App Design: app-onboarding-illustration, app-empty-state, app-store-screenshot, app-icon, app-notification-graphic, app-walkthrough-hero, app-feature-banner, app-avatar-pack, app-dark-mode-bg
Cross-over: logo-concept, mood-board
Your Presets: [list any user-saved presets with _meta.category = "user"]
Preset load precedence: If a user-saved preset shares a name with a curated preset, the user preset wins.
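The lookup, "did you mean" suggestion, and category grouping above can be sketched as follows. Here `presets` maps preset name to its loaded JSON; how it is populated from the presets/ directory is out of scope for this sketch.

```python
from collections import defaultdict

def resolve_preset(name: str, presets: dict) -> tuple:
    """Exact match loads; otherwise suggest close matches by substring."""
    if name in presets:
        return ("loaded", presets[name])
    n = name.lower()
    matches = [p for p in presets if n in p.lower()]
    return ("suggest", matches)

def group_by_category(presets: dict) -> dict:
    """Group preset names by _meta.category for the 'show presets' listing."""
    groups = defaultdict(list)
    for pname, data in presets.items():
        groups[data.get("_meta", {}).get("category", "general")].append(pname)
    return dict(groups)
```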
Phase 4: Context Inference
Before generating, infer API parameters by scanning the assembled JSON field VALUES using the rules in assets/prompt_config.yaml.
Aspect ratio inference:
- landscape, panorama, establishing-shot, wide-shot, hero-section, splash-screen, banner, dashboard → 16:9
- portrait, full-body, app-store, phone, onboarding, mobile, story → 9:16
- card-art, walkthrough → 3:4
- environment-concept, ultrawide, cinematic-wide → 21:9
- default → 1:1
Thinking level inference:
- multiple characters, complex scene, detailed environment, multi-reference, intricate → high
- simple, icon, sprite, single object, flat style, minimal, logo, thumbnail → minimal
- default → high
Resolution inference:
- icon, thumbnail, sprite, notification, small, avatar → 1K
- product-mockup, editorial, portrait, card-art, app-store, character-concept → 2K
- 4K, print, poster, splash, hero-section, wallpaper, large → 4K
- default → 1K
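The inference step can be sketched as a keyword scan over the assembled JSON's string values. In the skill the rules come from assets/prompt_config.yaml; they are inlined here for illustration, showing only the aspect-ratio table (thinking level and resolution follow the same pattern).

```python
# Rules are checked in order; first match wins, else the default applies.
ASPECT_RULES = [
    ({"landscape", "panorama", "establishing-shot", "wide-shot",
      "hero-section", "splash-screen", "banner", "dashboard"}, "16:9"),
    ({"portrait", "full-body", "app-store", "phone", "onboarding",
      "mobile", "story"}, "9:16"),
    ({"card-art", "walkthrough"}, "3:4"),
    ({"environment-concept", "ultrawide", "cinematic-wide"}, "21:9"),
]

def infer_aspect_ratio(prompt_json: dict) -> str:
    # Scan the field VALUES of the assembled JSON, as specified above.
    text = " ".join(str(v).lower() for v in prompt_json.values() if v)
    for keywords, ratio in ASPECT_RULES:
        if any(k in text for k in keywords):
            return ratio
    return "1:1"  # default
```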
Show the inferred parameters for override:
Ready to generate:
- Resolution: [inferred] | Aspect Ratio: [inferred] | Thinking: [inferred] | Images: 1
Override anything? Or say 'go' to generate.
Phase 5: Prompt Assembly
Flatten the structured JSON into a single narrative prompt string optimized for Nano Banana 2.
Assembly template:
[Style]. [Subject] [action] in [environment]. [Composition framing], [composition rule].
[Camera lens], [aperture], [angle], [depth of field]. [Lighting type] lighting from [direction], [color temperature].
[Mood] mood. Colors: [color palette with hex codes]. [Details]. [Textures]. [Effects]. [Style references].
[Negative prompt in ALL CAPS: "Do NOT include..."]
Assembly rules:
- Omit any segment where ALL its fields are null. Do not leave orphaned labels or punctuation.
- Use hex color codes where the user provided them (e.g., #FF6B6B instead of "coral red")
- Write negative prompts in ALL CAPS (e.g., "Do NOT include watermarks, text, or blurry areas") — this improves adherence
- For photorealistic styles, include a camera model name (e.g., "Canon EOS R5", "Sony A7 IV") to push toward photorealism
- Include composition buzzwords like "Pulitzer-prize-winning" or "National Geographic cover" for professional quality when appropriate
- Each JSON field maps to a distinct prompt segment to prevent concept bleeding
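The assembly rules can be sketched as a flattening function that drops any segment whose fields are all null, so no orphaned labels or punctuation survive. The field names mirror the template above but are assumptions about the JSON shape.

```python
def assemble_prompt(p: dict) -> str:
    """Flatten the structured JSON into one narrative prompt string."""
    segments = [
        p.get("style"),
        " ".join(filter(None, [p.get("subject"), p.get("action")]))
        + (f" in {p['environment']}" if p.get("environment") else ""),
        ", ".join(filter(None, [p.get("framing"), p.get("composition")])),
        ", ".join(filter(None, [p.get("lens"), p.get("angle"),
                                p.get("depth_of_field")])),
        f"{p['lighting_type']} lighting from {p['lighting_direction']}"
        if p.get("lighting_type") and p.get("lighting_direction")
        else p.get("lighting_type"),
        f"{p['mood']} mood" if p.get("mood") else None,
        f"Colors: {p['palette']}" if p.get("palette") else None,
        p.get("negative", "").upper() or None,  # ALL CAPS negative directive
    ]
    # Omit empty segments entirely (assembly rule 1).
    return ". ".join(s for s in segments if s) + "."
```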
Call the MCP tool:
mcp__nano-banana-2__generate_image(
prompt: [assembled prompt string],
resolution: [from api params],
aspectRatio: [from api params],
thinking: [from api params],
numberOfImages: [from api params]
)
Phase 6: Refinement Loop
After generation, store the complete JSON as a numbered version and present options:
Version 1 generated. What next?
- Describe changes in plain English (I'll update the JSON and re-generate)
- "save as preset [name]" to save this config for future use
- "show json" to see the current prompt JSON
- "done" to finish
Version Tracking
Maintain a numbered list of complete JSON snapshots in context:
Version History:
├─ v1: [brief description] → generated
├─ v2: [changes from v1] → generated
└─ v3: branched from v1, [changes] → generated
Operations:
- Tweak (natural language): "make the lighting warmer" → update relevant JSON fields
  - If the change is localized (color, lighting adjustment, add/remove detail): use mcp__nano-banana-2__continue_editing with the edit description
  - If the change is fundamental (different subject, style, environment, composition): use mcp__nano-banana-2__generate_image with the full updated prompt
- Branch: "go back to v2 but change the mood to mysterious" → load v2's complete JSON snapshot, apply changes, and generate as a new version via generate_image (MCP image context is lost when branching)
- Compare: "compare v1 and v3" → show field-level differences between two version snapshots
- Show JSON: display the current version's complete JSON
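The version operations above can be sketched as a snapshot store. The 20-version cap and 1-based numbering follow the text; the class and method names are my own.

```python
import copy

class VersionHistory:
    def __init__(self, cap: int = 20):
        self.snapshots: list[dict] = []
        self.cap = cap

    def record(self, prompt_json: dict) -> int:
        """Store a deep copy and return its 1-based version number."""
        if len(self.snapshots) >= self.cap:
            raise RuntimeError("20-version cap — save presets, then clear history")
        self.snapshots.append(copy.deepcopy(prompt_json))
        return len(self.snapshots)

    def branch(self, version: int, changes: dict) -> dict:
        """Load an old snapshot and apply changes; the caller re-generates
        via generate_image and records the result as a new version."""
        base = copy.deepcopy(self.snapshots[version - 1])
        base.update(changes)
        return base

    def compare(self, a: int, b: int) -> dict:
        """Field-level differences between two version snapshots."""
        va, vb = self.snapshots[a - 1], self.snapshots[b - 1]
        return {k: (va.get(k), vb.get(k))
                for k in set(va) | set(vb) if va.get(k) != vb.get(k)}

    def clear(self):
        self.snapshots.clear()  # prior versions can no longer be branched
```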
Version Limits
Cap at 20 versions per session. At v20:
"You've hit 20 versions this session. Save your favorites as presets, then say 'clear history' to reset and keep going."
On "clear history": discard all in-memory JSON snapshots (branching to prior versions is no longer possible) and reset the counter to v1.
Preset Management
Saving Presets
When the user says "save as preset [name]":
- Take the current version's JSON
- Add a _meta block with "category": "user" and the user's chosen name
- Write to presets/[name].json in the skill directory
- If a curated preset with the same name exists, warn: "A built-in preset with this name exists — your version will override it on load. Proceed?"
- Confirm: "Saved preset '[name]'. Load it anytime with 'use [name]'."
Deleting Presets
When the user says "delete preset [name]":
- If it's a user preset (_meta.category: "user"): delete the file
- If it's a curated preset: respond "That's a built-in preset and can't be deleted. You can override it by saving a user preset with the same name."
Key Techniques (from research)
These are baked into the prompt assembly logic:
- Structured JSON prevents concept bleeding — each field maps to a distinct prompt segment, keeping colors, lighting, and subjects isolated
- Hex color codes for precise color control beyond natural language
- ALL CAPS negative directives significantly improve model adherence
- Camera model names (Canon EOS R5, Sony A7 IV) push toward photorealism
- Composition buzzwords ("Pulitzer-prize-winning", "National Geographic cover") improve professional quality
- Nano Banana 2's 32K+ token context enables rich, detailed structured prompts
- Narrative structure with embedded technical specs is the optimal prompt format for Nano Banana models