GIF Sticker Maker

Overview

Convert a user-uploaded photo (person, pet, object, or Icon/Logo) into a set of 4 high-quality animated GIF stickers, each showing a classic action with a caption. The pipeline generates static cartoon images, animates them into short videos, and converts the videos to GIFs for delivery.

Interaction Rules

Language: Detect the user's conversation language. All outputs and captions use that language. Do not display bilingual text.
Tone: Keep content concise, with restrained politeness. Forbidden phrases: "Hello", "Okay", "Let me help you".
File output: Display generated files in the conversation using the following format:
```
<deliver_assets>
<item>
<path>file path</path>
</item>
</deliver_assets>
```
Use one <item> block per file, and put all files in the same <deliver_assets> block.
CRITICAL: <deliver_assets> must be the LAST thing in your response. NO text after deliver_assets: no summary, no "ready to use", no closing remarks.
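The block format above is mechanical enough to generate from a list of paths. The following is a minimal Python sketch. `build_deliver_assets` is a hypothetical helper name, and the sketch assumes file paths contain no characters that would need XML escaping (such as `<` or `&`).

```python
def build_deliver_assets(paths: list[str]) -> str:
    """Render file paths as a <deliver_assets> block, one <item> per file."""
    lines = ["<deliver_assets>"]
    for path in paths:
        # Each file gets its own <item> wrapper around a single <path>
        lines += ["<item>", f"<path>{path}</path>", "</item>"]
    lines.append("</deliver_assets>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_deliver_assets(["gifs/sticker_01_hi.gif", "gifs/sticker_02_laugh.gif"]))
```

Because the returned string ends with `</deliver_assets>`, appending it as the final piece of a response satisfies the "nothing after" rule.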

Workflow

Step 0: Collect User Preferences

Goal: Determine caption language and customization preference.

Ask user (in their language):

"Would you like to customize the captions for your stickers, or use the defaults?"

If user wants custom captions:

Collect 4 short captions from the user
Keep captions short (1-3 words work best)
Use the captions in whatever language the user provides
Infer each sticker's action from its caption's meaning (e.g., "Sleepy" → yawning action)

If user wants defaults:

Use the default captions in the user's conversation language (see Default Captions by Language below)
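The custom-caption rules above can be checked before generation starts. The following is a sketch with a hypothetical `check_custom_captions` helper. It treats exactly 4 non-empty captions as required and the 1-3 word guideline as a soft warning. Whitespace splitting does not separate words in Chinese or Japanese, so for those languages the length check only approximates the guideline.

```python
def check_custom_captions(captions: list[str]) -> list[str]:
    """Return a list of problems; an empty list means the captions are usable."""
    problems = []
    if len(captions) != 4:
        problems.append(f"expected 4 captions, got {len(captions)}")
    for i, caption in enumerate(captions, start=1):
        words = caption.split()
        if not words:
            problems.append(f"caption {i} is empty")
        elif len(words) > 3:
            # Soft limit from the guideline: 1-3 words work best.
            # Whitespace splitting is English-centric (no spaces in zh/ja text).
            problems.append(f"caption {i} has {len(words)} words; 1-3 work best")
    return problems
```

An empty result means the captions can go straight into Step 1. Otherwise, the problems can be relayed to the user in their language.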

Step 1: Static Image Generation with Captions (edit_images)

Goal: Generate 4 static cartoon images with different actions and captions.

Use images_understand to analyze the main subject in detail and classify it as a person, animal, object, or Icon/Logo.
Use edit_images to generate all 4 images in a single batch call.

Key Points:

Subject Consistency (CRITICAL): Base every image on the images_understand analysis so the subject stays recognizable. If the subject is an Icon/Logo, transform it into a 3D toy/figurine while preserving its original shape and color scheme.
Style: Funko Pop / Pop Mart blind box style, chibi character, big head, 3D rendering (C4D/Octane), premium quality finish.
Background: Clean white background, minimalist.
Text: Black text with thick white outline, bold cute font, floating in front of the character. No spelling errors, no blur, sharp details.
Prompts must be in English for best results. Replace each {CAPTION_*} placeholder (e.g., {CAPTION_WAVING}) with the matching caption in the user's language.
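All four Step 1 prompts share one template, and only the action and the caption change. The following is a minimal sketch with a hypothetical `build_image_prompt` helper that reproduces the default prompt wording. The switch to double quotes when the caption contains an apostrophe (as in the French default "Je t'aime") is an added assumption and does not appear in the original prompts.

```python
STYLE = ("3D cute cartoon style, Funko Pop / Pop Mart blind box style, "
         "chibi character, big head")
SCENE = ("Clean white background, minimalist, high quality 3D render, "
         "C4D, octane render.")
TEXT_STYLE = ("Text style: Black text with thick white outline, bold cute font, "
              "floating in front of the character. "
              "No spelling errors, no blur, sharp details.")


def build_image_prompt(action: str, caption: str) -> str:
    """Fill the Step 1 template: English action text, caption in the user's language."""
    # Assumption: use double quotes if the caption itself contains an apostrophe,
    # so the caption boundaries stay unambiguous for the image model.
    quoted = f'"{caption}"' if "'" in caption else f"'{caption}'"
    return (f"{STYLE}, {action}. {SCENE} "
            f"Text rendering: Clear, legible text {quoted} written at the bottom. "
            f"{TEXT_STYLE}")
```

For example, `build_image_prompt("waving happily, greeting warmly", "Hi~")` yields the first default prompt with `'Hi~'` in place of `'{CAPTION_WAVING}'`.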

Default Captions Generation (edit_images)

```python
edit_images(
    image_edit_items=[
        {
            "prompt": "3D cute cartoon style, Funko Pop / Pop Mart blind box style, chibi character, big head, waving happily, greeting warmly. Clean white background, minimalist, high quality 3D render, C4D, octane render. Text rendering: Clear, legible text '{CAPTION_WAVING}' written at the bottom. Text style: Black text with thick white outline, bold cute font, floating in front of the character. No spelling errors, no blur, sharp details.",
            "base_image_file": "<user uploaded image>",
            "output_file": "imgs/sticker_01_hi.png"
        },
        {
            "prompt": "3D cute cartoon style, Funko Pop / Pop Mart blind box style, chibi character, big head, laughing out loud, holding belly. Clean white background, minimalist, high quality 3D render, C4D, octane render. Text rendering: Clear, legible text '{CAPTION_LAUGHING}' written at the bottom. Text style: Black text with thick white outline, bold cute font, floating in front of the character. No spelling errors, no blur, sharp details.",
            "base_image_file": "<user uploaded image>",
            "output_file": "imgs/sticker_02_laugh.png"
        },
        {
            "prompt": "3D cute cartoon style, Funko Pop / Pop Mart blind box style, chibi character, big head, crying, tears flowing. Clean white background, minimalist, high quality 3D render, C4D, octane render. Text rendering: Clear, legible text '{CAPTION_CRYING}' written at the bottom. Text style: Black text with thick white outline, bold cute font, floating in front of the character. No spelling errors, no blur, sharp details.",
            "base_image_file": "<user uploaded image>",
            "output_file": "imgs/sticker_03_cry.png"
        },
        {
            "prompt": "3D cute cartoon style, Funko Pop / Pop Mart blind box style, chibi character, big head, making heart shape with hands, love hearts around. Clean white background, minimalist, high quality 3D render, C4D, octane render. Text rendering: Clear, legible text '{CAPTION_HEART}' written at the bottom. Text style: Black text with thick white outline, bold cute font, floating in front of the character. No spelling errors, no blur, sharp details.",
            "base_image_file": "<user uploaded image>",
            "output_file": "imgs/sticker_04_love.png"
        }
    ]
)
```

Custom Captions Generation

When the user provides custom captions, infer an action from each caption's meaning and use it in place of the default action in the prompt, keeping the rest of the prompt unchanged.

Action Inference Examples:

Custom Caption	Inferred Action
"Good job!"	giving thumbs up, cheering
"Sleepy..."	yawning, rubbing eyes, drowsy
"Fighting!"	pumping fist, determined pose
"Oops"	covering mouth, embarrassed
"Hungry~"	drooling, looking at food
"Bye bye"	waving goodbye
"Angry!"	pouting, arms crossed
"Shocked"	jaw dropped, eyes wide
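The examples above amount to a mapping from caption meaning to action. In the workflow, the model itself performs this inference. The sketch below is a hypothetical keyword-matching stand-in for `infer_action_from_caption` (the name used in the custom animation prompt code), useful mainly for testing the pipeline end to end. It handles English keywords only, and the fallback action is an assumption.

```python
# Hypothetical keyword table derived from the Action Inference Examples.
ACTION_KEYWORDS = {
    "good job": "giving thumbs up, cheering",
    "sleepy": "yawning, rubbing eyes, drowsy",
    "fighting": "pumping fist, determined pose",
    "oops": "covering mouth, embarrassed",
    "hungry": "drooling, looking at food",
    "bye": "waving goodbye",
    "angry": "pouting, arms crossed",
    "shocked": "jaw dropped, eyes wide",
}
FALLBACK_ACTION = "smiling happily"  # assumption: the workflow does not specify one


def infer_action_from_caption(caption: str) -> str:
    """Return an English action phrase for a caption via naive keyword matching."""
    text = caption.lower()
    for keyword, action in ACTION_KEYWORDS.items():
        if keyword in text:
            return action
    return FALLBACK_ACTION
```

Substring matching is deliberately crude. Non-English captions (e.g. "晚安") always hit the fallback, which is why real inference should stay with the model.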

Output: 4 PNG images (imgs/sticker_01_hi.png, imgs/sticker_02_laugh.png, imgs/sticker_03_cry.png, imgs/sticker_04_love.png).

Step 2: Animated Video Generation (batch_image_to_video)

Goal: Animate the 4 static images into 1-second videos.

MUST use batch_image_to_video for concurrent generation. Serial calls are forbidden.
Duration: Fixed at 1 second per video for speed.
Resolution: 768P.
CRITICAL: Prompts must be in English. Always include "keep text clear and stable" to prevent text distortion during animation.
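These Step 2 constraints can be enforced while the batch call is assembled. The following is a sketch with a hypothetical `build_video_batch` helper that returns the keyword arguments used in the batch_image_to_video call in this step. It rejects any prompt missing "keep text clear and stable" and derives each video path from its image path.

```python
from pathlib import PurePosixPath

REQUIRED_PHRASE = "keep text clear and stable"


def build_video_batch(image_files: list[str], prompts: list[str]) -> dict:
    """Assemble keyword arguments for one batch_image_to_video call."""
    if len(image_files) != len(prompts):
        raise ValueError("need exactly one prompt per image")
    for prompt in prompts:
        if REQUIRED_PHRASE not in prompt:
            raise ValueError(f"prompt is missing {REQUIRED_PHRASE!r}: {prompt}")
    # imgs/sticker_01_hi.png -> videos/sticker_01_hi.mp4
    outputs = [str(PurePosixPath("videos") / PurePosixPath(p).with_suffix(".mp4").name)
               for p in image_files]
    n = len(image_files)
    return {
        "count": n,
        "image_file_list": list(image_files),
        "output_file_list": outputs,
        "prompt_list": list(prompts),
        "duration_list": [1] * n,      # fixed 1-second clips
        "resolution_list": ["768P"] * n,
    }
```

The returned dictionary maps to a single batch call covering all four stickers, which matches the ban on serial, one-image-at-a-time calls.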

Default Animation Prompts

```python
batch_image_to_video(
    count=4,
    image_file_list=[
        "imgs/sticker_01_hi.png",
        "imgs/sticker_02_laugh.png",
        "imgs/sticker_03_cry.png",
        "imgs/sticker_04_love.png"
    ],
    output_file_list=[
        "videos/sticker_01_hi.mp4",
        "videos/sticker_02_laugh.mp4",
        "videos/sticker_03_cry.mp4",
        "videos/sticker_04_love.mp4"
    ],
    prompt_list=[
        "Cute cartoon character happily waving hand, enthusiastic greeting, exaggerated adorable motion, keep text clear and stable, High Quality, 1s loop",
        "Cute cartoon character laughing out loud, holding belly, shaking with laughter, keep text clear and stable, High Quality, 1s loop",
        "Cute cartoon character crying with tears flowing, wiping eyes, keep text clear and stable, High Quality, 1s loop",
        "Cute cartoon character making heart shape with hands, shooting love hearts, keep text clear and stable, High Quality, 1s loop"
    ],
    duration_list=[1, 1, 1, 1],
    resolution_list=["768P"] * 4
)
```

Custom Animation Prompts

For custom captions, generate prompts dynamically based on caption meaning:

```python
# Build one animation prompt per custom caption.
# infer_action_from_caption stands for the agent inferring an action from the
# caption's meaning (e.g., "Sleepy..." -> "yawning and rubbing eyes"); it is not a tool.
custom_prompts = []
for caption in user_captions:
    action = infer_action_from_caption(caption)
    prompt = f"Cute cartoon character {action}, expressive adorable motion, keep text clear and stable, High Quality, 1s loop"
    custom_prompts.append(prompt)

batch_image_to_video(
    count=4,
    image_file_list=[...],   # the 4 PNGs from Step 1
    output_file_list=[...],  # the 4 MP4 paths in videos/
    prompt_list=custom_prompts,
    duration_list=[1, 1, 1, 1],
    resolution_list=["768P"] * 4
)
```

Prompt Template for Custom Captions:

Cute cartoon character [ACTION_MATCHING_CAPTION], expressive adorable motion, keep text clear and stable, High Quality, 1s loop

Examples:

Custom Caption	Generated Action Prompt
"Good job!"	"Cute cartoon character giving thumbs up, cheering happily, expressive adorable motion, keep text clear and stable, High Quality, 1s loop"
"Sleepy..."	"Cute cartoon character yawning and rubbing eyes, drowsy expression, expressive adorable motion, keep text clear and stable, High Quality, 1s loop"
"Fighting!"	"Cute cartoon character pumping fist in the air, determined expression, expressive adorable motion, keep text clear and stable, High Quality, 1s loop"
"Oops"	"Cute cartoon character covering mouth in surprise, embarrassed expression, expressive adorable motion, keep text clear and stable, High Quality, 1s loop"

Output: 4 one-second MP4 video files (videos/sticker_01_hi.mp4, etc.).

Step 3: Format Conversion & Delivery (bash)

Goal: Convert videos to GIF animations and deliver to user.

1. Batch Format Conversion

Call the dedicated Python conversion script:

```bash
python3 cookbook/script/convert_mp4_to_gif.py -i videos -o gifs --fps 10 --width 240
```
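The source of `convert_mp4_to_gif.py` is not shown here. If the script is unavailable, ffmpeg can perform a roughly equivalent conversion. The sketch below assumes `ffmpeg` is on the PATH and does not reproduce the script's actual implementation. It uses ffmpeg's `palettegen`/`paletteuse` filters, which usually give cleaner GIF colors than a direct conversion, and `-loop 0` so each GIF loops indefinitely. `ffmpeg_gif_command` and `convert_dir` are hypothetical names.

```python
import subprocess
from pathlib import Path


def ffmpeg_gif_command(src: Path, dst: Path, fps: int = 10, width: int = 240) -> list[str]:
    """Build an ffmpeg command that converts one MP4 to a palette-optimized GIF."""
    # Resample to `fps`, scale to `width` px (height keeps the aspect ratio),
    # then generate and apply a 256-color palette in a single filter graph.
    vf = (f"fps={fps},scale={width}:-1:flags=lanczos,"
          "split[s0][s1];[s0]palettegen[p];[s1][p]paletteuse")
    return ["ffmpeg", "-y", "-i", str(src), "-vf", vf, "-loop", "0", str(dst)]


def convert_dir(in_dir: str = "videos", out_dir: str = "gifs",
                fps: int = 10, width: int = 240) -> list[Path]:
    """Convert every *.mp4 in in_dir to a same-named *.gif in out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    results = []
    for src in sorted(Path(in_dir).glob("*.mp4")):
        dst = out / src.with_suffix(".gif").name
        subprocess.run(ffmpeg_gif_command(src, dst, fps, width), check=True)
        results.append(dst)
    return results
```

Calling `convert_dir()` with its defaults mirrors the `-i videos -o gifs --fps 10 --width 240` arguments of the script call and keeps the `sticker_0N_<id>` file stems.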

2. Deliver GIF Files

Output format (strictly in this order, nothing after):

Brief status line (e.g., "4 stickers created:")
<deliver_assets> block with all GIF files — this MUST be the last thing in the response

```
<deliver_assets>
<item>
<path>gifs/sticker_01_hi.gif</path>
</item>
<item>
<path>gifs/sticker_02_laugh.gif</path>
</item>
<item>
<path>gifs/sticker_03_cry.gif</path>
</item>
<item>
<path>gifs/sticker_04_love.gif</path>
</item>
</deliver_assets>
```

NO summary table, NO closing text after deliver_assets.
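The delivery rules are strict enough to check automatically. The following is a sketch with a hypothetical `check_final_response` helper. It takes the full response text and returns a list of violations, where an empty list means the response follows the Step 3 format (status line, then the block, then nothing).

```python
import re


def check_final_response(text: str, expected_items: int = 4) -> list[str]:
    """Check the Step 3 delivery rules; return a list of violations."""
    body = text.rstrip()
    start = body.find("<deliver_assets>")
    if start == -1:
        return ["missing <deliver_assets> block"]
    problems = []
    if not body.endswith("</deliver_assets>"):
        problems.append("<deliver_assets> is not the last thing in the response")
    if not body[:start].strip():
        problems.append("missing status line before <deliver_assets>")
    # Non-greedy match collects each delivered path inside the block
    paths = re.findall(r"<path>(.*?)</path>", body[start:])
    if len(paths) != expected_items:
        problems.append(f"expected {expected_items} items, found {len(paths)}")
    if any(not p.endswith(".gif") for p in paths):
        problems.append("non-GIF file delivered")
    return problems
```

Trailing whitespace is ignored, so a final newline after `</deliver_assets>` does not count as extra text.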

Default Sticker Configuration

#	Action	Filename ID
1	Happy waving	hi
2	Laughing hard	laugh
3	Crying tears	cry
4	Heart gesture	love

Default Captions by Language

Action	English	Spanish	French	German	Chinese	Japanese	Korean
Waving	Hi~	¡Hola!	Salut~	Hallo~	嗨~	やあ~	안녕~
Laughing	LOL	Jajaja	MDR	Haha	哈哈哈	笑	ㅋㅋㅋ
Crying	Boo-hoo	Buaaa	Snif	Heul	呜呜呜	えーん	흑흑
Heart	Love ya	Te quiero	Je t'aime	Liebe	爱你哦	大好き	사랑해

Select captions based on user's conversation language. Users can also provide custom captions in any language.
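The table above can be encoded as a lookup. The following sketch keys the table by ISO 639-1 language code (the codes do not appear in the original table) and uses a hypothetical `default_captions` helper. For a language the table does not cover, it returns None rather than guessing. The agent would then write equivalent captions in the user's language, which is an assumption about how to handle that case.

```python
from typing import Optional

# Values copied from the Default Captions by Language table.
DEFAULT_CAPTIONS = {
    "en": ("Hi~", "LOL", "Boo-hoo", "Love ya"),
    "es": ("¡Hola!", "Jajaja", "Buaaa", "Te quiero"),
    "fr": ("Salut~", "MDR", "Snif", "Je t'aime"),
    "de": ("Hallo~", "Haha", "Heul", "Liebe"),
    "zh": ("嗨~", "哈哈哈", "呜呜呜", "爱你哦"),
    "ja": ("やあ~", "笑", "えーん", "大好き"),
    "ko": ("안녕~", "ㅋㅋㅋ", "흑흑", "사랑해"),
}
STICKER_IDS = ("hi", "laugh", "cry", "love")


def default_captions(lang: str) -> Optional[dict[str, str]]:
    """Map each sticker ID to its default caption, or None if lang is not covered."""
    # Accept region-tagged codes such as "fr-CA" by keeping only the base language
    captions = DEFAULT_CAPTIONS.get(lang.split("-")[0].lower())
    if captions is None:
        return None
    return dict(zip(STICKER_IDS, captions))
```

Returning None for uncovered languages avoids silently falling back to English, which would violate the rule against captions in the wrong language.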

Tool Reference

Tool	Step	Usage	Required
`images_understand`	Step 1	Analyze uploaded photo subject	Yes
`edit_images`	Step 1	Generate 4 static cartoon images with captions	Yes
`batch_image_to_video`	Step 2	Concurrently animate 4 images into 1s videos	Yes
`bash`	Step 3	Run `convert_mp4_to_gif.py` for GIF conversion	Yes

File & Output Conventions

Static images: imgs/sticker_01_hi.png, imgs/sticker_02_laugh.png, imgs/sticker_03_cry.png, imgs/sticker_04_love.png
Videos: videos/sticker_01_hi.mp4, videos/sticker_02_laugh.mp4, videos/sticker_03_cry.mp4, videos/sticker_04_love.mp4
GIFs: gifs/sticker_01_hi.gif, gifs/sticker_02_laugh.gif, gifs/sticker_03_cry.gif, gifs/sticker_04_love.gif
GIF settings: 10 fps, 240px width
Video settings: 1 second duration, 768P resolution
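The naming scheme follows directly from the Default Sticker Configuration: a two-digit index plus the filename ID, shared across all three directories. The following is a minimal sketch with a hypothetical `sticker_paths` helper.

```python
# (index, filename ID) pairs from the Default Sticker Configuration table
STICKERS = [(1, "hi"), (2, "laugh"), (3, "cry"), (4, "love")]


def sticker_paths() -> list[dict[str, str]]:
    """Return the PNG, MP4 and GIF paths for each default sticker."""
    paths = []
    for index, sticker_id in STICKERS:
        stem = f"sticker_{index:02d}_{sticker_id}"  # e.g. sticker_01_hi
        paths.append({
            "png": f"imgs/{stem}.png",
            "mp4": f"videos/{stem}.mp4",
            "gif": f"gifs/{stem}.gif",
        })
    return paths
```

Generating every path from one stem keeps the image, video and GIF for a given sticker aligned through all three steps.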

Common Mistakes to Avoid

Never generate captions in the wrong language. Always detect the user's conversation language and use matching captions.
Never use serial video generation. Always use batch_image_to_video for concurrent execution.
Never omit "keep text clear and stable" from animation prompts; the phrase prevents text distortion.
Never add text after <deliver_assets>. It must be the absolute last element in the response.
Never skip the user preference step. Always ask whether the user wants custom or default captions before generating.
Never use non-English prompts for image/video generation. Prompts must be in English for best results; only the caption text itself is in the user's language.
Never forget subject consistency. Icon/Logo subjects must keep their original shape and color scheme when transformed into 3D toy/figurine style.