π Nano-Banana Expert Skill (Gemini 3 Style)
A specialized skill for AI Agents to leverage "Reasoning-Driven" image generation. Based on the advanced prompting architecture of Google's Gemini 3 (Nano Banana Pro), this skill moves beyond keyword stuffing to structured, logic-based creative briefs.
Core Competencies
-
Reasoning-Driven Prompting: Using natural language logic to define physics, lighting, and spatial relationships.
-
Structured Creative Briefs: Implementing the "Perfect Prompt" formula: Subject + Action + Context + Composition + Lighting .
-
Text Rendering Precision: Explicitly defining typography and signifiers for legible text integration.
-
Contextual Grounding: Using "Search Grounding" logic (simulated) to anchor generations in real-world accuracy.
ποΈ Technical Specification
- The "Perfect Prompt" Formula
Component Description Example
Subject Detailed entity description "A stoic robot barista with exposed copper wiring"
Action Dynamic interaction "Pouring a latte art leaf with mechanical precision"
Context Environment & Atmosphere "Inside a neon-lit cyberpunk cafe at midnight"
Composition Camera & Lens choice "Close-up, 85mm lens, f/1.8 aperture"
Lighting Mood & Direction "Volumetric blue rim light, warm cafe glow"
Style Aesthetic anchor "Cinematic, photorealistic, 4K production value"
- Advanced Features
-
Negative Constraint Logic: Instead of "no blurry," use "Ensure sharp focus on the subject's eyes."
-
Identity Consistency: (Simulated) "Maintain consistent facial structure across variations."
-
Text Integration: Use double quotes for specific text: The sign reads "OPEN 24/7" .
π§ Prompt Optimization Protocol (Agent Instruction)
Before calling the script, the Agent MUST rewrite the user's prompt into a logic-driven Reasoning Brief:
-
NO KEYWORD SOUP: Remove "8k, masterpiece, ultra-detailed." Use full, descriptive sentences.
-
PHYSICAL CONSISTENCY: Describe how elements interact (e.g., "The light from the crystal shards casts caustic patterns across the obsidian floor").
-
TEXT PRECISION: If the user wants text, define it precisely: featuring a sign that says "STORE NAME" in a weathered serif font .
-
OPTICAL DIRECTIVES: Specify lens behavior: Shallow Depth of Field (f/1.8), Macro Lens, Anamorphic Flare.
π Protocol: Using Nano-Banana
Step 1: Define the Creative Logic
Provide the agent with a subject and a specific scenario.
Step 2: Invoke the Script
The generate-nano-art.sh script translates the logic into a structured Gemini 3-style prompt.
Generating a reasoning-driven image
bash scripts/generate-nano-art.sh
--subject "a glass chess piece"
--action "shattering into liquid shards"
--context "on a obsidian table"
--style "macro photography"
β οΈ Constraints & Guardrails
-
No Keyword Soup: MANDATORY - Do not use "trending on artstation, masterpiece, 8k". Use natural language descriptions.
-
Physics Logic: Ensure the prompt describes physically possible lighting and reflection interactions.
-
Full Sentences: The model parses relationships; use "light reflecting off the water" instead of "water, reflection".
βοΈ Implementation Details
This skill applies a "Logic Wrapper" around the core/media/generate-image.sh primitive, converting fragmented inputs into a coherent, reasoning-ready narrative prompt.