GIF Sticker Maker
Overview
Convert user-uploaded photos (people, pets, objects, Icon/Logo) into a set of 4 high-quality animated GIF stickers with classic actions and captions. The pipeline generates static cartoon images, animates them into short videos, and converts the videos to GIF format for delivery.
Interaction Rules
- Language: Detect user's conversation language. All outputs and captions follow user's language. No bilingual display.
- Tone: Keep content concise. Restrained politeness. Forbidden phrases: "Hello", "Okay", "Let me help you".
- File output: Generated files must use the following format to display in conversation:
<deliver_assets> <item> <path>file path</path> </item> </deliver_assets>
One <item> block per file; multiple files go in the same <deliver_assets>.
- CRITICAL: <deliver_assets> must be the LAST thing in your response. NO text after deliver_assets: no summary, no "ready to use", no closing remarks.
Workflow
Step 0: Collect User Preferences
Goal: Determine caption language and customization preference.
Ask user (in their language):
- "Would you like to customize the captions for your stickers, or use the defaults?"
If user wants custom captions:
- Collect 4 short captions from user
- Captions should be short (1-3 words work best)
- Use captions in whatever language user provides
- Actions will be auto-generated to match caption meaning (e.g., "Sleepy" → yawning action)
If user wants defaults:
- Use default captions in user's conversation language (see Default Captions table below)
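The custom-caption rules above can be sketched as a small check, assuming the captions arrive as a list of strings. The helper name `normalize_custom_captions` is hypothetical, and the whitespace-based word count is only a rough signal: it does not apply to languages written without spaces.

```python
def normalize_custom_captions(raw_captions):
    """Return (captions, warnings) for a list of user-supplied captions.

    Hypothetical helper: enforces the 4-caption requirement and flags
    captions longer than the recommended 1-3 words.
    """
    captions = [c.strip() for c in raw_captions if c and c.strip()]
    if len(captions) != 4:
        raise ValueError(f"expected 4 captions, got {len(captions)}")

    warnings = []
    for caption in captions:
        # Heuristic only: a caption without spaces (e.g. CJK) counts as one word.
        if len(caption.split()) > 3:
            warnings.append(f"'{caption}' is longer than 3 words")
    return captions, warnings
```

A warning here is a prompt to ask the user for something shorter, not a reason to reject the caption.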
Step 1: Static Image Generation with Captions (edit_images)
Goal: Generate 4 static cartoon images with different actions and captions.
- Use images_understand to deeply analyze the main subject (categories: person, animal, object, Icon/Logo).
- Use edit_images for batch processing of 4 images.
Key Points:
- Subject Consistency (CRITICAL): Must generate based on analysis results. If it's an Icon/Logo, transform it into a 3D toy/figurine while preserving original shape and color scheme.
- Style: Funko Pop / Pop Mart blind box style, chibi character, big head, 3D rendering (C4D/Octane), premium quality finish.
- Background: Clean white background, minimalist.
- Text: Black text with thick white outline, bold cute font, floating in front of the character. No spelling errors, no blur, sharp details.
- Prompts must be in English for best AI results. Replace each {CAPTION_*} placeholder (for example {CAPTION_WAVING}) with the matching caption in the user's language.
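As an illustration of the placeholder substitution, the sketch below fills an English prompt with an action and a caption. The template text is copied from the default prompts in this document; the names `STATIC_PROMPT_TEMPLATE` and `build_static_prompt` are assumptions made for this example.

```python
STATIC_PROMPT_TEMPLATE = (
    "3D cute cartoon style, Funko Pop / Pop Mart blind box style, "
    "chibi character, big head, {action}. "
    "Clean white background, minimalist, high quality 3D render, C4D, octane render. "
    "Text rendering: Clear, legible text '{caption}' written at the bottom. "
    "Text style: Black text with thick white outline, bold cute font, "
    "floating in front of the character. "
    "No spelling errors, no blur, sharp details."
)


def build_static_prompt(action, caption):
    """Fill the English prompt with an action and a caption in the user's language."""
    return STATIC_PROMPT_TEMPLATE.format(action=action, caption=caption)


# The prompt stays in English; only the quoted caption is localized.
print(build_static_prompt("waving happily, greeting warmly", "안녕~"))
```

Captions that contain an apostrophe (such as "Je t'aime") end up nested inside the single quotes of the prompt, so it may be worth checking how the image model renders them.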
Default Captions Generation (edit_images)
edit_images(
    image_edit_items=[
        {
            "prompt": "3D cute cartoon style, Funko Pop / Pop Mart blind box style, chibi character, big head, waving happily, greeting warmly. Clean white background, minimalist, high quality 3D render, C4D, octane render. Text rendering: Clear, legible text '{CAPTION_WAVING}' written at the bottom. Text style: Black text with thick white outline, bold cute font, floating in front of the character. No spelling errors, no blur, sharp details.",
            "base_image_file": "<user uploaded image>",
            "output_file": "imgs/sticker_01_hi.png"
        },
        {
            "prompt": "3D cute cartoon style, Funko Pop / Pop Mart blind box style, chibi character, big head, laughing out loud, holding belly. Clean white background, minimalist, high quality 3D render, C4D, octane render. Text rendering: Clear, legible text '{CAPTION_LAUGHING}' written at the bottom. Text style: Black text with thick white outline, bold cute font, floating in front of the character. No spelling errors, no blur, sharp details.",
            "base_image_file": "<user uploaded image>",
            "output_file": "imgs/sticker_02_laugh.png"
        },
        {
            "prompt": "3D cute cartoon style, Funko Pop / Pop Mart blind box style, chibi character, big head, crying, tears flowing. Clean white background, minimalist, high quality 3D render, C4D, octane render. Text rendering: Clear, legible text '{CAPTION_CRYING}' written at the bottom. Text style: Black text with thick white outline, bold cute font, floating in front of the character. No spelling errors, no blur, sharp details.",
            "base_image_file": "<user uploaded image>",
            "output_file": "imgs/sticker_03_cry.png"
        },
        {
            "prompt": "3D cute cartoon style, Funko Pop / Pop Mart blind box style, chibi character, big head, making heart shape with hands, love hearts around. Clean white background, minimalist, high quality 3D render, C4D, octane render. Text rendering: Clear, legible text '{CAPTION_HEART}' written at the bottom. Text style: Black text with thick white outline, bold cute font, floating in front of the character. No spelling errors, no blur, sharp details.",
            "base_image_file": "<user uploaded image>",
            "output_file": "imgs/sticker_04_love.png"
        }
    ]
)
Custom Captions Generation
When user provides custom captions, infer appropriate actions from caption meaning and adjust the prompt accordingly.
Action Inference Examples:
| Custom Caption | Inferred Action |
|---|---|
| "Good job!" | giving thumbs up, cheering |
| "Sleepy..." | yawning, rubbing eyes, drowsy |
| "Fighting!" | pumping fist, determined pose |
| "Oops" | covering mouth, embarrassed |
| "Hungry~" | drooling, looking at food |
| "Bye bye" | waving goodbye |
| "Angry!" | pouting, arms crossed |
| "Shocked" | jaw dropped, eyes wide |
Output: 4 PNG images (imgs/sticker_01_hi.png, imgs/sticker_02_laugh.png, imgs/sticker_03_cry.png, imgs/sticker_04_love.png).
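The action inference table can be approximated with a lookup, shown here as a minimal sketch. In the workflow itself the model infers the action, so this dictionary-based `infer_action_from_caption` is a hypothetical stand-in that only covers the example captions listed in the table.

```python
# Caption-to-action pairs copied from the Action Inference Examples table.
ACTION_EXAMPLES = {
    "good job!": "giving thumbs up, cheering",
    "sleepy...": "yawning, rubbing eyes, drowsy",
    "fighting!": "pumping fist, determined pose",
    "oops": "covering mouth, embarrassed",
    "hungry~": "drooling, looking at food",
    "bye bye": "waving goodbye",
    "angry!": "pouting, arms crossed",
    "shocked": "jaw dropped, eyes wide",
}


def infer_action_from_caption(caption, fallback=None):
    """Return a known action for the caption, or `fallback` if none matches.

    Hypothetical stand-in: a miss means "let the model infer the action",
    not "no action exists".
    """
    return ACTION_EXAMPLES.get(caption.strip().lower(), fallback)
```

Matching is case-insensitive and ignores surrounding whitespace; anything outside the table returns the fallback so the caller can defer to the model.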
Step 2: Animated Video Generation (batch_image_to_video)
Goal: Animate the 4 static images into 1-second videos.
- MUST use batch_image_to_video for concurrent generation. Serial calls are forbidden.
- Duration: Fixed at 1 second per video for speed.
- Resolution: 768P.
- CRITICAL: Prompts must be in English. Always include "keep text clear and stable" to prevent text distortion during animation.
Default Animation Prompts
batch_image_to_video(
    count=4,
    image_file_list=[
        "imgs/sticker_01_hi.png",
        "imgs/sticker_02_laugh.png",
        "imgs/sticker_03_cry.png",
        "imgs/sticker_04_love.png"
    ],
    output_file_list=[
        "videos/sticker_01_hi.mp4",
        "videos/sticker_02_laugh.mp4",
        "videos/sticker_03_cry.mp4",
        "videos/sticker_04_love.mp4"
    ],
    prompt_list=[
        "Cute cartoon character happily waving hand, enthusiastic greeting, exaggerated adorable motion, keep text clear and stable, High Quality, 1s loop",
        "Cute cartoon character laughing out loud, holding belly, shaking with laughter, keep text clear and stable, High Quality, 1s loop",
        "Cute cartoon character crying with tears flowing, wiping eyes, keep text clear and stable, High Quality, 1s loop",
        "Cute cartoon character making heart shape with hands, shooting love hearts, keep text clear and stable, High Quality, 1s loop"
    ],
    duration_list=[1, 1, 1, 1],
    resolution_list=["768P"] * 4
)
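Because the call above takes several parallel lists, a length check before submitting can catch a mismatched entry early. This is a sketch under the assumption that every list must have exactly `count` entries; whether batch_image_to_video validates this itself is not stated in this document, and `check_batch_args` is a hypothetical helper.

```python
def check_batch_args(count, **lists):
    """Raise ValueError if any parallel list does not have `count` entries."""
    for name, values in lists.items():
        if len(values) != count:
            raise ValueError(f"{name} has {len(values)} entries, expected {count}")


args = {
    "image_file_list": ["a.png", "b.png", "c.png", "d.png"],
    "output_file_list": ["a.mp4", "b.mp4", "c.mp4", "d.mp4"],
    "duration_list": [1, 1, 1, 1],
    "resolution_list": ["768P"] * 4,
}
check_batch_args(4, **args)  # passes silently when every list has 4 entries
```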
Custom Animation Prompts
For custom captions, generate prompts dynamically based on caption meaning:
# Generate prompts dynamically based on caption content.
# infer_action_from_caption stands for the model's own inference step.
custom_prompts = []
for caption in user_captions:
    action = infer_action_from_caption(caption)  # AI infers appropriate action
    prompt = f"Cute cartoon character {action}, expressive adorable motion, keep text clear and stable, High Quality, 1s loop"
    custom_prompts.append(prompt)

batch_image_to_video(
    count=4,
    image_file_list=[...],
    output_file_list=[...],
    prompt_list=custom_prompts,
    duration_list=[1, 1, 1, 1],
    resolution_list=["768P"] * 4
)
Prompt Template for Custom Captions:
Cute cartoon character [ACTION_MATCHING_CAPTION], expressive adorable motion, keep text clear and stable, High Quality, 1s loop
Examples:
| Custom Caption | Generated Action Prompt |
|---|---|
| "Good job!" | "Cute cartoon character giving thumbs up, cheering happily, expressive adorable motion, keep text clear and stable, High Quality, 1s loop" |
| "Sleepy..." | "Cute cartoon character yawning and rubbing eyes, drowsy expression, expressive adorable motion, keep text clear and stable, High Quality, 1s loop" |
| "Fighting!" | "Cute cartoon character pumping fist in the air, determined expression, expressive adorable motion, keep text clear and stable, High Quality, 1s loop" |
| "Oops" | "Cute cartoon character covering mouth in surprise, embarrassed expression, expressive adorable motion, keep text clear and stable, High Quality, 1s loop" |
Output: 4 one-second MP4 video files (videos/sticker_01_hi.mp4, etc.).
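The prompt template for custom captions can be written as a small function so that the required stability phrase is never dropped. This is a minimal sketch; `build_animation_prompt` and `STABLE_TEXT_PHRASE` are names chosen for this example.

```python
STABLE_TEXT_PHRASE = "keep text clear and stable"


def build_animation_prompt(action):
    """Fill the custom-caption animation template with an English action phrase."""
    return (
        f"Cute cartoon character {action}, expressive adorable motion, "
        f"{STABLE_TEXT_PHRASE}, High Quality, 1s loop"
    )


prompt = build_animation_prompt("giving thumbs up, cheering happily")
assert STABLE_TEXT_PHRASE in prompt
```

Passing the action from the "Good job!" row reproduces that row's generated prompt exactly.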
Step 3: Format Conversion & Delivery (bash)
Goal: Convert videos to GIF animations and deliver to user.
1. Batch Format Conversion
Call the dedicated Python conversion script:
python3 cookbook/script/convert_mp4_to_gif.py -i videos -o gifs --fps 10 --width 240
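The internals of convert_mp4_to_gif.py are not shown in this document. As a rough idea of what an equivalent conversion could look like, the sketch below shells out to ffmpeg with a palette-based filter chain. It assumes ffmpeg is installed and on PATH, and the function names are hypothetical; it is not the script's actual implementation.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(src, dst, fps=10, width=240):
    """Return an ffmpeg argv list that converts one MP4 into a palette-optimized GIF."""
    filters = (
        f"fps={fps},scale={width}:-1:flags=lanczos,"
        "split[s0][s1];[s0]palettegen[p];[s1][p]paletteuse"
    )
    return ["ffmpeg", "-y", "-i", str(src), "-vf", filters, "-loop", "0", str(dst)]


def convert_directory(input_dir, output_dir, fps=10, width=240):
    """Convert every .mp4 in input_dir to a same-named .gif in output_dir."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    results = []
    for src in sorted(Path(input_dir).glob("*.mp4")):
        dst = out / (src.stem + ".gif")
        # check=True raises CalledProcessError if ffmpeg exits with an error.
        subprocess.run(build_ffmpeg_command(src, dst, fps, width), check=True)
        results.append(dst)
    return results
```

Calling `convert_directory("videos", "gifs")` would mirror the defaults of the command above (10 fps, 240px width); a height of -1 in the scale filter keeps the aspect ratio.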
2. Deliver GIF Files
Output format (strictly in this order, nothing after):
- Brief status line (e.g., "4 stickers created:")
- <deliver_assets> block with all GIF files; this MUST be the last thing in the response
<deliver_assets>
<item>
<path>gifs/sticker_01_hi.gif</path>
</item>
<item>
<path>gifs/sticker_02_laugh.gif</path>
</item>
<item>
<path>gifs/sticker_03_cry.gif</path>
</item>
<item>
<path>gifs/sticker_04_love.gif</path>
</item>
</deliver_assets>
NO summary table, NO closing text after deliver_assets.
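The delivery format can be assembled programmatically, which makes the "nothing after the block" rule hard to violate. The sketch below is illustrative; `render_deliver_assets` and `build_response` are hypothetical helpers, and paths are inserted as-is on the assumption that they need no escaping.

```python
def render_deliver_assets(paths):
    """Return a <deliver_assets> block with one <item> per file path."""
    lines = ["<deliver_assets>"]
    for path in paths:
        lines += ["<item>", f"<path>{path}</path>", "</item>"]
    lines.append("</deliver_assets>")
    return "\n".join(lines)


def build_response(status_line, paths):
    """Status line first, asset block last, nothing after it."""
    return f"{status_line}\n{render_deliver_assets(paths)}"
```

For example, `build_response("4 stickers created:", gif_paths)` yields a response whose final line is the closing `</deliver_assets>` tag.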
Default Sticker Configuration
| # | Action | Filename ID |
|---|---|---|
| 1 | Happy waving | hi |
| 2 | Laughing hard | laugh |
| 3 | Crying tears | cry |
| 4 | Heart gesture | love |
Default Captions by Language
| Action | English | Spanish | French | German | Chinese | Japanese | Korean |
|---|---|---|---|---|---|---|---|
| Waving | Hi~ | ¡Hola! | Salut~ | Hallo~ | 嗨~ | やあ~ | 안녕~ |
| Laughing | LOL | Jajaja | MDR | Haha | 哈哈哈 | 笑 | ㅋㅋㅋ |
| Crying | Boo-hoo | Buaaa | Snif | Heul | 呜呜呜 | えーん | 흑흑 |
| Heart | Love ya | Te quiero | Je t'aime | Liebe | 爱你哦 | 大好き | 사랑해 |
Select captions based on user's conversation language. Users can also provide custom captions in any language.
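The table lends itself to a simple lookup. In this sketch the two-letter language codes and the English fallback for unlisted languages are assumptions; the document does not specify how language is encoded or what to do for a language missing from the table.

```python
# Captions copied from the Default Captions by Language table, keyed by filename ID.
DEFAULT_CAPTIONS = {
    "en": {"hi": "Hi~", "laugh": "LOL", "cry": "Boo-hoo", "love": "Love ya"},
    "es": {"hi": "¡Hola!", "laugh": "Jajaja", "cry": "Buaaa", "love": "Te quiero"},
    "fr": {"hi": "Salut~", "laugh": "MDR", "cry": "Snif", "love": "Je t'aime"},
    "de": {"hi": "Hallo~", "laugh": "Haha", "cry": "Heul", "love": "Liebe"},
    "zh": {"hi": "嗨~", "laugh": "哈哈哈", "cry": "呜呜呜", "love": "爱你哦"},
    "ja": {"hi": "やあ~", "laugh": "笑", "cry": "えーん", "love": "大好き"},
    "ko": {"hi": "안녕~", "laugh": "ㅋㅋㅋ", "cry": "흑흑", "love": "사랑해"},
}


def default_captions(language_code):
    """Return the four default captions for a language, keyed by filename ID."""
    # Assumption: fall back to English when the language is not in the table.
    return DEFAULT_CAPTIONS.get(language_code.lower(), DEFAULT_CAPTIONS["en"])
```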
Tool Reference
| Tool | Step | Usage | Required |
|---|---|---|---|
| images_understand | Step 1 | Analyze uploaded photo subject | Yes |
| edit_images | Step 1 | Generate 4 static cartoon images with captions | Yes |
| batch_image_to_video | Step 2 | Concurrently animate 4 images into 1s videos | Yes |
| bash | Step 3 | Run convert_mp4_to_gif.py for GIF conversion | Yes |
File & Output Conventions
- Static images: imgs/sticker_01_hi.png, imgs/sticker_02_laugh.png, imgs/sticker_03_cry.png, imgs/sticker_04_love.png
- Videos: videos/sticker_01_hi.mp4, videos/sticker_02_laugh.mp4, videos/sticker_03_cry.mp4, videos/sticker_04_love.mp4
- GIFs: gifs/sticker_01_hi.gif, gifs/sticker_02_laugh.gif, gifs/sticker_03_cry.gif, gifs/sticker_04_love.gif
- GIF settings: 10 fps, 240px width
- Video settings: 1 second duration, 768P resolution
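Since the three path families share one stem per sticker, they can be derived from the index and filename ID rather than typed out. A minimal sketch, with `sticker_paths` as a hypothetical helper:

```python
# (index, filename ID) pairs from the Default Sticker Configuration table.
STICKERS = [(1, "hi"), (2, "laugh"), (3, "cry"), (4, "love")]


def sticker_paths(index, name):
    """Return the image, video, and GIF paths for one sticker."""
    stem = f"sticker_{index:02d}_{name}"
    return {
        "image": f"imgs/{stem}.png",
        "video": f"videos/{stem}.mp4",
        "gif": f"gifs/{stem}.gif",
    }


all_paths = [sticker_paths(i, name) for i, name in STICKERS]
gif_files = [p["gif"] for p in all_paths]
```

Deriving the names in one place keeps the image, video, and GIF steps from drifting apart when a filename ID changes.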
Common Mistakes to Avoid
- Never generate captions in the wrong language: always detect user's conversation language and use matching captions.
- Never use serial video generation: must use batch_image_to_video for concurrent execution.
- Never omit "keep text clear and stable" from animation prompts: this prevents text distortion.
- Never add text after <deliver_assets>: it must be the absolute last element in the response.
- Never skip the user preference step: always ask if user wants custom or default captions before generating.
- Never use non-English prompts for image/video generation: prompts must be in English for best AI results; only the caption text itself is in user's language.
- Never forget subject consistency: Icon/Logo subjects must preserve original shape and color scheme when transformed into 3D toy/figurine style.