# EmojiGen Nano Banana
Use this skill to reproduce the EmojiGen Pro workflow as a reusable agent workflow instead of a browser app.
Read this skill end to end before you start work. Do not jump straight to writing a config, building a prompt, or calling a model until you have read the SOP and decided how you will satisfy every step.
## What to collect before doing work
Do not start generation until you have either explicit answers or safe defaults for:
- Reference image path.
- Output mode: `animated` or `static`.
- Emotion list, or a category prompt that can be expanded into emotions.
- Style target, such as `皮克斯 3D` (Pixar 3D), `吉卜力` (Ghibli), or `Q版 LINE` (chibi LINE).
- Optional custom text and color.
- Output directory.
- Backend choice:
  - Gemini Developer API via `GEMINI_API_KEY`, `GOOGLE_API_KEY`, or `API_KEY`
  - Vertex AI via `GOOGLE_GENAI_USE_VERTEXAI=true` plus `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION`
  - Another image tool chosen by the agent when Gemini access is unavailable
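The backend choice above can be sketched as a small resolver. The environment variable names come from this document; the precedence order (explicit Vertex flag first, then any Developer API key) is an assumption, not the authoritative logic inside `emojigen.mjs`:

```javascript
// Sketch of the backend choice described above. Precedence is assumed:
// the explicit Vertex flag wins, then any Developer API key, else fallback.
function detectBackend(env = process.env) {
  if (env.GOOGLE_GENAI_USE_VERTEXAI === "true" &&
      env.GOOGLE_CLOUD_PROJECT && env.GOOGLE_CLOUD_LOCATION) {
    return "vertex-ai";
  }
  if (env.GEMINI_API_KEY || env.GOOGLE_API_KEY || env.API_KEY) {
    return "gemini-developer-api";
  }
  return "fallback-image-tool"; // another image-capable tool must be used
}

console.log(detectBackend());
```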
Before generation, inspect the current reference image and rewrite `characterNotes` for this exact subject. Never reuse stale `characterNotes`, `propNotes`, or style notes from a previous run on a different person.
If the user is adapting the original EmojiGen Pro repository, first reconstruct the workflow from the codebase before you rewrite anything. Preserve the original sequence:
- Collect or generate emotion labels.
- Assemble one long prompt for a strict 4x6 sticker sheet.
- Generate the sheet image from the reference image.
- Slice the sheet into frames or stickers.
- Encode GIFs for animated mode.
## Default decisions
- Only use these image models:
  - Nano Banana Pro -> `gemini-3-pro-image-preview`
  - Nano Banana 2 -> `gemini-3.1-flash-image-preview`
- Default to Nano Banana Pro unless the user explicitly asks for Nano Banana 2.
- Default style: `皮克斯 3D` (Pixar 3D).
- Default `removeBackground`: `false`.
- Random emotions should be generated by the agent locally by default. Do not depend on a Gemini text model unless the user explicitly wants model-generated wording.
- Keep count constraints hard:
  - Static mode always resolves to exactly `24` stickers.
  - Animated mode only allows `1`, `2`, or `4` GIFs.
- Force image generation settings to:
  - Aspect ratio `3:2`
  - Image size `2K`
- Keep the output contract stable even if image generation uses a fallback tool:
  - `prompt.txt`
  - `resolved-config.json`
  - `grid.*`
  - extracted `stickers/`
  - `manifest.json`
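The hard count constraints above can be expressed as a small resolver. The function name and error wording are assumptions for illustration; the real enforcement lives inside `emojigen.mjs`:

```javascript
// Sketch of the hard count constraints: static is always 24 stickers,
// animated allows only 1, 2, or 4 GIFs. Names here are illustrative.
function resolveCount(mode, requested) {
  if (mode === "static") return 24; // requested count is ignored
  if (mode === "animated") {
    if (![1, 2, 4].includes(requested)) {
      throw new Error("animated mode allows only 1, 2, or 4 GIFs");
    }
    return requested;
  }
  throw new Error("unknown output mode: " + mode);
}

console.log(resolveCount("static", 10)); // → 24
```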
## Working sequence
### 0. Stage the source image when the path is unstable
If the image came from the clipboard, a pasted chat image, or any source whose original path is unreliable, save it into /tmp first:
```shell
node skills/emojigen-nano-banana/scripts/emojigen.mjs stage-image \
  --from-clipboard
```
Or copy a known file into /tmp so later steps use a stable path:
```shell
node skills/emojigen-nano-banana/scripts/emojigen.mjs stage-image \
  --input /abs/path/to/source.png
```
Use the staged path for all later steps.
### 1. Prepare config
Start from `assets/example-config.json`. Fill only the fields needed for the current task.
If the user did not give an emotion list, leave `emotions` empty and provide `categoryPrompt`.
Then:
- infer a category prompt from the request and let the agent produce the random emotions directly, or
- only if the user explicitly wants model-generated wording, run:
  ```shell
  node skills/emojigen-nano-banana/scripts/emojigen.mjs suggest-emotions \
    --category "职场打工人, 加班, 摸鱼, 收到, 崩溃, 阴阳怪气" \
    --count 4
  ```
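For the default local path, picking random emotions needs no model call at all. A minimal sketch, assuming a plain string pool (the labels below are illustrative; derive a real pool from the user's category prompt):

```javascript
// Sketch: pick random emotion labels locally, with no text model.
// The pool is illustrative; tailor it to the user's request.
function pickEmotions(pool, count) {
  const arr = [...pool];
  for (let i = arr.length - 1; i > 0; i--) { // Fisher-Yates shuffle
    const j = Math.floor(Math.random() * (i + 1));
    [arr[i], arr[j]] = [arr[j], arr[i]];
  }
  return arr.slice(0, count);
}

const pool = ["开心", "崩溃", "摸鱼", "收到", "加班", "阴阳怪气"];
console.log(pickEmotions(pool, 4));
```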
### 2. Run preflight before generation
Preflight checks the backend, confirms the staged reference path, and resolves missing random emotions without starting image generation:
```shell
node skills/emojigen-nano-banana/scripts/emojigen.mjs preflight \
  --config path/to/config.json \
  --reference /tmp/emojigen-input-123.png
```
### 3. Build the prompt
Always build the prompt through the script so the wording stays consistent:
```shell
node skills/emojigen-nano-banana/scripts/emojigen.mjs build-prompt \
  --config path/to/config.json \
  --out path/to/output/prompt.txt
```
Do not stop here. `build-prompt` is not the delivery workflow.
### 4. Generate the 4x6 grid
If Gemini or Vertex AI is available, prefer the built-in generator:
```shell
node skills/emojigen-nano-banana/scripts/emojigen.mjs generate-grid \
  --config path/to/config.json \
  --reference path/to/reference.png \
  --out path/to/output/grid.png
```
The script rejects image models other than Nano Banana Pro and Nano Banana 2, and always sends the `3:2` aspect ratio and `2K` image size.
Do not take `prompt.txt` and call a raw image model yourself when the built-in workflow is available. That bypasses the skill's staging, preflight, slicing, background-removal, and quality gates.
If another image tool is a better fit, still use this skill. Build the prompt with this skill, generate the grid elsewhere, then continue with `make-assets`.
### 5. Produce GIFs or static stickers
If you already have a grid image, run:
```shell
node skills/emojigen-nano-banana/scripts/emojigen.mjs make-assets \
  --config path/to/config.json \
  --grid path/to/output/grid.png \
  --out-dir path/to/output
```
This produces square crops, optionally removes backgrounds, and encodes GIF outputs for animated mode.
Keep `removeBackground: false` by default. Only enable background removal when the user explicitly wants transparent stickers and the generated sheet clearly uses a flat, separable background.
Read `manifest.json` after `make-assets` or `run`. If `manifest.quality.status` is `warn`, do not deliver the result yet. Rerun with stricter `characterNotes`, stronger square-safe composition constraints, or `removeBackground: false`.
Background removal uses a corner-connected flood-fill strategy. This is safer than making every near-background color transparent, and avoids punching holes in faces or clothing when skin tones are similar to the background.
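The corner-connected idea can be illustrated with a minimal mask builder. This is a sketch under simplified assumptions (one value per pixel, row-major layout, 4-connected fill), not the actual implementation in the skill:

```javascript
// Sketch of corner-connected flood fill: only background-colored pixels
// reachable from the image corners are marked transparent, so background-
// colored regions enclosed by the subject (e.g. skin tones matching the
// backdrop) are left intact. Pixel layout is a simplified assumption.
function backgroundMask(pixels, width, height, isBackground) {
  const mask = new Uint8Array(width * height); // 1 = make transparent
  const stack = [0, width - 1, (height - 1) * width, width * height - 1];
  while (stack.length > 0) {
    const idx = stack.pop();
    if (idx < 0 || idx >= width * height) continue;      // off the image
    if (mask[idx] || !isBackground(pixels[idx])) continue;
    mask[idx] = 1;
    const x = idx % width;
    if (x > 0) stack.push(idx - 1);                      // left neighbour
    if (x < width - 1) stack.push(idx + 1);              // right neighbour
    stack.push(idx - width, idx + width);                // up and down
  }
  return mask;
}
```

A naive "make every near-background color transparent" pass would also punch out the enclosed region; the corner-seeded fill never reaches it.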
Treat square-safe composition as a hard requirement, not a style preference. The final assets are cropped to square cells, so the subject must stay centered and stable across frames or the GIF will jitter after slicing.
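The geometry behind this requirement: a `3:2` sheet cut into 6 columns by 4 rows yields square cells, so anything off-center gets clipped. A sketch, assuming `2016x1344` as an example pixel size for a 2K 3:2 output (the actual size is not guaranteed):

```javascript
// Sketch: a 3:2 sheet sliced into 6 columns x 4 rows gives square cells.
// 2016x1344 is an assumed example of a 2K 3:2 output, not a guaranteed size.
function gridCells(width, height, cols = 6, rows = 4) {
  const cw = Math.floor(width / cols);
  const ch = Math.floor(height / rows);
  const cells = [];
  for (let row = 0; row < rows; row++) {
    for (let col = 0; col < cols; col++) {
      cells.push({ x: col * cw, y: row * ch, w: cw, h: ch });
    }
  }
  return cells;
}

const cells = gridCells(2016, 1344);
console.log(cells.length, cells[0]); // 24 cells, each 336x336
```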
### 6. Full end-to-end run
When no step needs manual intervention, use the orchestration command:
```shell
node skills/emojigen-nano-banana/scripts/emojigen.mjs run \
  --config path/to/config.json \
  --reference path/to/reference.png \
  --out-dir /tmp/emojigen-run \
  --deliver-dir path/to/workspace-output \
  --cleanup-temp
```
Use `--deliver-dir` to copy the finished assets into the working directory or a client delivery folder.
Use `--cleanup-temp` after delivery when the outputs were generated under `/tmp/emojigen-*`. macOS may eventually clear `/tmp`, but not immediately enough for agent workflows.
Treat this as the preferred path. The default expectation is:

- `stage-image`
- `preflight`
- `run`
- Inspect `manifest.quality`.
- Deliver only if quality is acceptable.
Do not skip any of these steps unless the user explicitly narrows the task and you can still preserve output quality.
## Fallback rules
- If no Gemini credentials are present, say that explicitly and either ask for credentials or use another image-capable tool.
- If another tool generated the grid, say that the final GIF packaging still came from this skill.
- If background removal damages line art or text, rerun with `removeBackground: false` and keep the pure solid background from prompt-time constraints.
- Do not proactively enable background removal just because the script supports it.
- If the input image arrived as a pasted or clipboard image, stage it to `/tmp` before any prompt or generation step.
- If the user only asked for random emotions, do not call a text model by default. Generate them directly unless the user explicitly wants a model to brainstorm them.
## References
- Read `references/workflow.md` for CLI usage, environment variable precedence, and output layout.
- Read `references/model-backends.md` when choosing between Gemini API and Vertex AI.