# EmojiGen Nano Banana
Use this skill to reproduce the EmojiGen Pro workflow as a reusable agent workflow instead of a browser app.
Read this skill end to end before you start work. Do not jump straight to writing a config, building a prompt, or calling a model until you have read the SOP and decided how you will satisfy every step.
## What to collect before doing work
Do not start generation until you have either explicit answers or safe defaults for:
- Reference image path.
- Output mode: `animated` or `static`.
- Emotion list, or a category prompt that can be expanded into emotions.
- Style target, such as `皮克斯 3D` (Pixar 3D), `吉卜力` (Ghibli), or `Q版 LINE` (chibi LINE).
- Optional custom text and color.
- Output directory.
- Backend choice:
  - Gemini Developer API via `GEMINI_API_KEY`, `GOOGLE_API_KEY`, or `API_KEY`
  - Vertex AI via `GOOGLE_GENAI_USE_VERTEXAI=true` plus `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION`
  - Another image tool chosen by the agent when Gemini access is unavailable
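The backend choice above can be sketched as a small resolver. The environment variable names come from this document; the precedence order (explicit Vertex flag first, then any Developer API key) is an assumption, not the authoritative logic inside `emojigen.mjs`:

```javascript
// Sketch of the backend choice described above. Precedence is assumed:
// the explicit Vertex flag wins, then any Developer API key, else fallback.
function detectBackend(env = process.env) {
  if (env.GOOGLE_GENAI_USE_VERTEXAI === "true" &&
      env.GOOGLE_CLOUD_PROJECT && env.GOOGLE_CLOUD_LOCATION) {
    return "vertex-ai";
  }
  if (env.GEMINI_API_KEY || env.GOOGLE_API_KEY || env.API_KEY) {
    return "gemini-developer-api";
  }
  return "fallback-image-tool"; // another image-capable tool must be used
}

console.log(detectBackend());
```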
Before generation, inspect the current reference image and rewrite `characterNotes` for this exact subject. Never reuse stale `characterNotes`, `propNotes`, or style notes from a previous run on a different person.
If the user is adapting the original EmojiGen Pro repository, first reconstruct the workflow from the codebase before you rewrite anything. Preserve the original sequence:
- Collect or generate emotion labels.
- Assemble one long prompt for a strict 4x6 sticker sheet.
- Generate the sheet image from the reference image.
- Slice the sheet into frames or stickers.
- Encode GIFs for animated mode.
## Default decisions
- Only use these image models:
  - Nano Banana Pro -> `gemini-3-pro-image-preview`
  - Nano Banana 2 -> `gemini-3.1-flash-image-preview`
- Default to Nano Banana Pro unless the user explicitly asks for Nano Banana 2.
- Default style: `皮克斯 3D` (Pixar 3D).
- Default `removeBackground`: `false`.
- Random emotions should be generated by the agent locally by default. Do not depend on a Gemini text model unless the user explicitly wants model-generated wording.
- Keep count constraints hard:
  - Static mode always resolves to exactly `24` stickers.
  - Animated mode only allows `1`, `2`, or `4` GIFs.
- Force image generation settings to:
  - Aspect ratio `3:2`
  - Image size `2K`
- Keep the output contract stable even if image generation uses a fallback tool:
  - `prompt.txt`
  - `resolved-config.json`
  - `grid.*`
  - extracted `stickers/`
  - `manifest.json`
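The hard count constraints above can be expressed as a small resolver. The function name and error wording are assumptions for illustration; the real enforcement lives inside `emojigen.mjs`:

```javascript
// Sketch of the hard count constraints: static is always 24 stickers,
// animated allows only 1, 2, or 4 GIFs. Names here are illustrative.
function resolveCount(mode, requested) {
  if (mode === "static") return 24; // requested count is ignored
  if (mode === "animated") {
    if (![1, 2, 4].includes(requested)) {
      throw new Error("animated mode allows only 1, 2, or 4 GIFs");
    }
    return requested;
  }
  throw new Error("unknown output mode: " + mode);
}

console.log(resolveCount("static", 10)); // → 24
```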
## Working sequence
### 0. Stage the source image when the path is unstable
If the image came from the clipboard, a pasted chat image, or any source whose original path is unreliable, save it into /tmp first:
```shell
node skills/emojigen-nano-banana/scripts/emojigen.mjs stage-image \
  --from-clipboard
```
Or copy a known file into /tmp so later steps use a stable path:
```shell
node skills/emojigen-nano-banana/scripts/emojigen.mjs stage-image \
  --input /abs/path/to/source.png
```
Use the staged path for all later steps.
### 1. Prepare config
Start from `assets/example-config.json`. Fill only the fields needed for the current task.
If the user did not give an emotion list, leave `emotions` empty and provide `categoryPrompt`.
Then:
- infer a category prompt from the request and let the agent produce the random emotions directly, or
- only if the user explicitly wants model-generated wording, run:
  ```shell
  node skills/emojigen-nano-banana/scripts/emojigen.mjs suggest-emotions \
    --category "职场打工人, 加班, 摸鱼, 收到, 崩溃, 阴阳怪气" \
    --count 4
  ```
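For the default local path, picking random emotions needs no model call at all. A minimal sketch, assuming a plain string pool (the labels below are illustrative; derive a real pool from the user's category prompt):

```javascript
// Sketch: pick random emotion labels locally, with no text model.
// The pool is illustrative; tailor it to the user's request.
function pickEmotions(pool, count) {
  const arr = [...pool];
  for (let i = arr.length - 1; i > 0; i--) { // Fisher-Yates shuffle
    const j = Math.floor(Math.random() * (i + 1));
    [arr[i], arr[j]] = [arr[j], arr[i]];
  }
  return arr.slice(0, count);
}

const pool = ["开心", "崩溃", "摸鱼", "收到", "加班", "阴阳怪气"];
console.log(pickEmotions(pool, 4));
```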
### 2. Run preflight before generation
Preflight checks the backend, confirms the staged reference path, and resolves missing random emotions without starting image generation:
```shell
node skills/emojigen-nano-banana/scripts/emojigen.mjs preflight \
  --config path/to/config.json \
  --reference /tmp/emojigen-input-123.png
```
### 3. Build the prompt
Always build the prompt through the script so the wording stays consistent:
```shell
node skills/emojigen-nano-banana/scripts/emojigen.mjs build-prompt \
  --config path/to/config.json \
  --out path/to/output/prompt.txt
```
Do not stop here. `build-prompt` is not the delivery workflow.
### 4. Generate the 4x6 grid
If Gemini or Vertex AI is available, prefer the built-in generator:
```shell
node skills/emojigen-nano-banana/scripts/emojigen.mjs generate-grid \
  --config path/to/config.json \
  --reference path/to/reference.png \
  --out path/to/output/grid.png
```
The script rejects image models other than Nano Banana Pro and Nano Banana 2, and always sends the `3:2` aspect ratio and `2K` image size.
Do not take `prompt.txt` and call a raw image model yourself when the built-in workflow is available. That bypasses the skill's staging, preflight, slicing, background-removal, and quality gates.
If another image tool is a better fit, still use this skill. Build the prompt with this skill, generate the grid elsewhere, then continue with `make-assets`.
### 5. Produce GIFs or static stickers
If you already have a grid image, run:
```shell
node skills/emojigen-nano-banana/scripts/emojigen.mjs make-assets \
  --config path/to/config.json \
  --grid path/to/output/grid.png \
  --out-dir path/to/output
```
This produces square crops, optionally removes backgrounds, and encodes GIF outputs for animated mode.
Keep `removeBackground: false` by default. Only enable background removal when the user explicitly wants transparent stickers and the generated sheet clearly uses a flat, separable background.
Read `manifest.json` after `make-assets` or `run`. If `manifest.quality.status` is `warn`, do not deliver the result yet. Rerun with stricter `characterNotes`, stronger square-safe composition constraints, or `removeBackground: false`.
Background removal uses a corner-connected flood-fill strategy. This is safer than making every near-background color transparent, and avoids punching holes in faces or clothing when skin tones are similar to the background.
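The corner-connected idea can be illustrated with a minimal mask builder. This is a sketch under simplified assumptions (one value per pixel, row-major layout, 4-connected fill), not the actual implementation in the skill:

```javascript
// Sketch of corner-connected flood fill: only background-colored pixels
// reachable from the image corners are marked transparent, so background-
// colored regions enclosed by the subject (e.g. skin tones matching the
// backdrop) are left intact. Pixel layout is a simplified assumption.
function backgroundMask(pixels, width, height, isBackground) {
  const mask = new Uint8Array(width * height); // 1 = make transparent
  const stack = [0, width - 1, (height - 1) * width, width * height - 1];
  while (stack.length > 0) {
    const idx = stack.pop();
    if (idx < 0 || idx >= width * height) continue;      // off the image
    if (mask[idx] || !isBackground(pixels[idx])) continue;
    mask[idx] = 1;
    const x = idx % width;
    if (x > 0) stack.push(idx - 1);                      // left neighbour
    if (x < width - 1) stack.push(idx + 1);              // right neighbour
    stack.push(idx - width, idx + width);                // up and down
  }
  return mask;
}
```

A naive "make every near-background color transparent" pass would also punch out the enclosed region; the corner-seeded fill never reaches it.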
Treat square-safe composition as a hard requirement, not a style preference. The final assets are cropped to square cells, so the subject must stay centered and stable across frames or the GIF will jitter after slicing.
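The geometry behind this requirement: a `3:2` sheet cut into 6 columns by 4 rows yields square cells, so anything off-center gets clipped. A sketch, assuming `2016x1344` as an example pixel size for a 2K 3:2 output (the actual size is not guaranteed):

```javascript
// Sketch: a 3:2 sheet sliced into 6 columns x 4 rows gives square cells.
// 2016x1344 is an assumed example of a 2K 3:2 output, not a guaranteed size.
function gridCells(width, height, cols = 6, rows = 4) {
  const cw = Math.floor(width / cols);
  const ch = Math.floor(height / rows);
  const cells = [];
  for (let row = 0; row < rows; row++) {
    for (let col = 0; col < cols; col++) {
      cells.push({ x: col * cw, y: row * ch, w: cw, h: ch });
    }
  }
  return cells;
}

const cells = gridCells(2016, 1344);
console.log(cells.length, cells[0]); // 24 cells, each 336x336
```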
### 6. Full end-to-end run
When no step needs manual intervention, use the orchestration command:
```shell
node skills/emojigen-nano-banana/scripts/emojigen.mjs run \
  --config path/to/config.json \
  --reference path/to/reference.png \
  --out-dir /tmp/emojigen-run \
  --deliver-dir path/to/workspace-output \
  --cleanup-temp
```
Use `--deliver-dir` to copy the finished assets into the working directory or a client delivery folder.
Use `--cleanup-temp` after delivery when the outputs were generated under `/tmp/emojigen-*`. macOS may eventually clear `/tmp`, but not immediately enough for agent workflows.
Treat this as the preferred path. The default expectation is:

- `stage-image`
- `preflight`
- `run`
- Inspect `manifest.quality`.
- Deliver only if quality is acceptable.
Do not skip any of these steps unless the user explicitly narrows the task and you can still preserve output quality.
## Fallback rules
- If no Gemini credentials are present, say that explicitly and either ask for credentials or use another image-capable tool.
- If another tool generated the grid, say that the final GIF packaging still came from this skill.
- If background removal damages line art or text, rerun with `removeBackground: false` and keep the pure solid background from prompt-time constraints.
- Do not proactively enable background removal just because the script supports it.
- If the input image arrived as a pasted or clipboard image, stage it to `/tmp` before any prompt or generation step.
- If the user only asked for random emotions, do not call a text model by default. Generate them directly unless the user explicitly wants a model to brainstorm them.
## References
- Read `references/workflow.md` for CLI usage, environment variable precedence, and output layout.
- Read `references/model-backends.md` when choosing between Gemini API and Vertex AI.