Image Generation (AI SDK)

Official API-based image generation. Supports the OpenAI, Google, DashScope (Alibaba Tongyi Wanxiang), and Replicate providers.

Script Directory

Agent Execution:

SKILL_DIR = this SKILL.md file's directory
Script path = ${SKILL_DIR}/scripts/main.ts

Step 0: Load Preferences ⛔ BLOCKING

CRITICAL: This step MUST complete BEFORE any image generation. Do NOT skip or defer.

Check EXTEND.md existence (priority: project → user):

test -f .baoyu-skills/baoyu-image-gen/EXTEND.md && echo "project"
test -f "$HOME/.baoyu-skills/baoyu-image-gen/EXTEND.md" && echo "user"

Result     Action
Found      Load, parse, and apply settings. If default_model.[provider] is null → ask for the model only (Flow 2)
Not found  ⛔ Run first-time setup (references/config/first-time-setup.md) → save EXTEND.md → then continue

CRITICAL: If not found, complete the full setup (provider + model + quality + save location) using AskUserQuestion BEFORE generating any images. Generation is BLOCKED until EXTEND.md is created.

Path                                           Location
.baoyu-skills/baoyu-image-gen/EXTEND.md        Project directory
$HOME/.baoyu-skills/baoyu-image-gen/EXTEND.md  User home

EXTEND.md Supports: Default provider | Default quality | Default aspect ratio | Default image size | Default models

Schema: references/config/preferences-schema.md
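
The project → user lookup order above can be sketched in TypeScript. This is a minimal illustration rather than the skill's actual code: `findExtendMd`, its parameters, and the injected `exists` predicate are hypothetical names chosen for this example.

```typescript
// Relative location of the preferences file, as listed in the path table above.
const EXTEND_REL = ".baoyu-skills/baoyu-image-gen/EXTEND.md";

type ExtendLocation = { scope: "project" | "user"; path: string } | null;

// Hypothetical helper: returns the first EXTEND.md found, project before user.
// `exists` is injected so the lookup can be exercised without touching the disk.
function findExtendMd(
  cwd: string,
  home: string,
  exists: (path: string) => boolean,
): ExtendLocation {
  const projectPath = `${cwd}/${EXTEND_REL}`;
  if (exists(projectPath)) return { scope: "project", path: projectPath };

  const userPath = `${home}/${EXTEND_REL}`;
  if (exists(userPath)) return { scope: "user", path: userPath };

  // Not found: first-time setup has to run before any image is generated.
  return null;
}
```

In a real run the arguments would typically be `process.cwd()`, `os.homedir()`, and `fs.existsSync`. A `null` result corresponds to the "Not found" case, where generation stays blocked until setup has saved EXTEND.md.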

Usage

Basic

npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image cat.png

With aspect ratio

npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A landscape" --image out.png --ar 16:9

High quality

npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image out.png --quality 2k

From prompt files

npx -y bun ${SKILL_DIR}/scripts/main.ts --promptfiles system.md content.md --image out.png

With reference images (Google multimodal or OpenAI edits)

npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "Make blue" --image out.png --ref source.png

With reference images (explicit provider/model)

npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "Make blue" --image out.png --provider google --model gemini-3-pro-image-preview --ref source.png

Specific provider

npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image out.png --provider openai

DashScope (Alibaba Tongyi Wanxiang)

npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "一只可爱的猫" --image out.png --provider dashscope

Replicate (google/nano-banana-pro)

npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image out.png --provider replicate

Replicate with specific model

npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image out.png --provider replicate --model google/nano-banana

Options

Option                                        Description
--prompt <text>, -p                           Prompt text
--promptfiles <files...>                      Read prompt from files (concatenated)
--image <path>                                Output image path (required)
--provider google|openai|dashscope|replicate  Force provider (default: google)
--model <id>, -m                              Model ID (Google: gemini-3-pro-image-preview, gemini-3.1-flash-image-preview; OpenAI: gpt-image-1.5)
--ar <ratio>                                  Aspect ratio (e.g., 16:9, 1:1, 4:3)
--size <WxH>                                  Size (e.g., 1024x1024)
--quality normal|2k                           Quality preset (default: 2k)
--imageSize 1K|2K|4K                          Image size for Google (default: from quality)
--ref <files...>                              Reference images. Supported by Google multimodal (gemini-3-pro-image-preview, gemini-3-flash-preview, gemini-3.1-flash-image-preview) and OpenAI edits (GPT Image models). If provider omitted: Google first, then OpenAI
--n <count>                                   Number of images
--json                                        JSON output
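
When the script is driven from code rather than typed by hand, the documented flags can be assembled into an argument list. The sketch below uses only flags from the option list above. The `GenOptions` interface and the `buildArgs` name are assumptions made for this example and are not part of the skill.

```typescript
// Subset of the documented options; the field names are this sketch's own choice.
interface GenOptions {
  prompt?: string;
  promptFiles?: string[];
  image: string; // --image is required
  provider?: "google" | "openai" | "dashscope" | "replicate";
  model?: string;
  ar?: string;
  quality?: "normal" | "2k";
  ref?: string[];
  n?: number;
  json?: boolean;
}

// Hypothetical helper: builds the arguments that follow `npx` on the command line.
function buildArgs(scriptPath: string, o: GenOptions): string[] {
  const args: string[] = ["-y", "bun", scriptPath];
  if (o.prompt) args.push("--prompt", o.prompt);
  if (o.promptFiles?.length) args.push("--promptfiles", ...o.promptFiles);
  args.push("--image", o.image);
  if (o.provider) args.push("--provider", o.provider);
  if (o.model) args.push("--model", o.model);
  if (o.ar) args.push("--ar", o.ar);
  if (o.quality) args.push("--quality", o.quality);
  if (o.ref?.length) args.push("--ref", ...o.ref);
  if (o.n !== undefined) args.push("--n", String(o.n));
  if (o.json) args.push("--json");
  return args;
}
```

The array can be handed to something like Node's `child_process.spawn("npx", args)`. Passing arguments as an array avoids shell quoting problems for prompts that contain spaces or quotes.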

Environment Variables

Variable               Description
OPENAI_API_KEY         OpenAI API key
GOOGLE_API_KEY         Google API key
DASHSCOPE_API_KEY      DashScope API key (Alibaba Cloud)
REPLICATE_API_TOKEN    Replicate API token
OPENAI_IMAGE_MODEL     OpenAI model override
GOOGLE_IMAGE_MODEL     Google model override
DASHSCOPE_IMAGE_MODEL  DashScope model override (default: z-image-turbo)
REPLICATE_IMAGE_MODEL  Replicate model override (default: google/nano-banana-pro)
OPENAI_BASE_URL        Custom OpenAI endpoint
GOOGLE_BASE_URL        Custom Google endpoint
DASHSCOPE_BASE_URL     Custom DashScope endpoint
REPLICATE_BASE_URL     Custom Replicate endpoint

Load Priority: CLI args > EXTEND.md > env vars > <cwd>/.baoyu-skills/.env > ~/.baoyu-skills/.env

Model Resolution

Model priority (highest → lowest), applies to all providers:

CLI flag: --model <id>
EXTEND.md: default_model.[provider]
Env var: <PROVIDER>_IMAGE_MODEL (e.g., GOOGLE_IMAGE_MODEL)
Built-in default

EXTEND.md overrides env vars. If both EXTEND.md default_model.google: "gemini-3-pro-image-preview" and env var GOOGLE_IMAGE_MODEL=gemini-3.1-flash-image-preview exist, EXTEND.md wins.
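
The priority chain above can be written as a short fall-through. This is a sketch under stated assumptions: `resolveModel` and `ModelSources` are hypothetical names, and the built-in default is passed in by the caller because this document only states defaults for DashScope and Replicate.

```typescript
type Provider = "google" | "openai" | "dashscope" | "replicate";

interface ModelSources {
  cliModel?: string; // value of --model <id>, if given
  extendDefaults?: Partial<Record<Provider, string | null>>; // EXTEND.md default_model
  env?: Record<string, string | undefined>; // e.g. process.env
  builtinDefault: string; // the provider's built-in default model
}

// Hypothetical helper: walks the sources from highest to lowest priority.
function resolveModel(
  provider: Provider,
  s: ModelSources,
): { model: string; source: string } {
  if (s.cliModel) return { model: s.cliModel, source: "cli" };

  const fromExtend = s.extendDefaults?.[provider];
  if (fromExtend) return { model: fromExtend, source: "EXTEND.md" };

  const envName = `${provider.toUpperCase()}_IMAGE_MODEL`;
  const fromEnv = s.env?.[envName];
  if (fromEnv) return { model: fromEnv, source: `env ${envName}` };

  return { model: s.builtinDefault, source: "built-in" };
}
```

Returning the source alongside the model makes it straightforward to print where the choice came from. A null or missing `default_model.[provider]` simply falls through to the environment variable.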

Agent MUST display model info before each generation:

Show: Using [provider] / [model]
Show switch hint: Switch model: --model <id> | EXTEND.md default_model.[provider] | env <PROVIDER>_IMAGE_MODEL

Replicate Models

Supported model formats:

owner/name (recommended for official models), e.g. google/nano-banana-pro
owner/name:version (community models by version), e.g. stability-ai/sdxl:<version>

Examples:

Use Replicate default model

npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image out.png --provider replicate

Override model explicitly

npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image out.png --provider replicate --model google/nano-banana
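
The two model formats above can be told apart with a small parser. This is an illustrative sketch: `parseReplicateModel` is a hypothetical name, and the character rules in the regular expression are an assumption (no slashes, colons, or whitespace inside a part), not something taken from Replicate's documentation.

```typescript
interface ReplicateModelRef {
  owner: string;
  name: string;
  version?: string; // present only for the owner/name:version form
}

// Hypothetical helper: accepts "owner/name" or "owner/name:version".
function parseReplicateModel(id: string): ReplicateModelRef {
  const m = /^([^\/:\s]+)\/([^\/:\s]+)(?::([^\/:\s]+))?$/.exec(id);
  if (!m) {
    throw new Error(`Invalid Replicate model id: ${id}`);
  }
  const owner = m[1] as string;
  const name = m[2] as string;
  const version = m[3];
  return version ? { owner, name, version } : { owner, name };
}
```

An ID without an owner, such as `nano-banana`, is rejected. That lets a caller show a fix hint before any request is sent.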

Provider Selection

--ref provided + no --provider → auto-select Google first, then OpenAI, then Replicate
--provider specified → use it (if --ref, must be google, openai, or replicate)
Only one API key available → use that provider
Multiple available → default to Google
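
The selection rules above can be sketched as one function. The names here are hypothetical. `available` is assumed to list the providers whose API key is set, and the fallback used when several keys exist but Google's is missing is this sketch's own assumption, since the rules above do not cover that case.

```typescript
type Provider = "google" | "openai" | "dashscope" | "replicate";

// Providers accepted together with --ref, in auto-select order.
const REF_CAPABLE: Provider[] = ["google", "openai", "replicate"];

function selectProvider(opts: {
  provider?: Provider; // value of --provider, if given
  hasRef: boolean; // true when --ref was passed
  available: Provider[]; // providers with an API key configured (assumption)
}): Provider {
  const { provider, hasRef, available } = opts;

  if (provider) {
    if (hasRef && !REF_CAPABLE.includes(provider)) {
      throw new Error(`--ref is not supported with provider "${provider}"`);
    }
    return provider;
  }

  if (hasRef) {
    const pick = REF_CAPABLE.find((p) => available.includes(p));
    if (!pick) throw new Error("No provider that supports --ref is available");
    return pick;
  }

  const first = available[0];
  if (first === undefined) throw new Error("Missing API key: no provider configured");
  if (available.length === 1) return first;

  // Multiple keys: default to Google. Falling back to the first available
  // provider when Google is absent is an assumption of this sketch.
  return available.includes("google") ? "google" : first;
}
```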

Quality Presets

Preset        Google imageSize  OpenAI Size  Use Case
normal        1K                1024px       Quick previews
2k (default)  2K                2048px       Covers, illustrations, infographics

Google imageSize: Can be overridden with --imageSize 1K|2K|4K

Aspect Ratios

Supported: 1:1, 16:9, 9:16, 4:3, 3:4, 2.35:1

Google multimodal: uses imageConfig.aspectRatio
Google Imagen: uses aspectRatio parameter
OpenAI: maps to closest supported size
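
One plausible way to "map to the closest supported size" is to compare aspect ratios in log space and pick the nearest candidate. The sketch below is an assumption about how such a mapping could work, not the script's actual logic. The function names are hypothetical, and the candidate size list is supplied by the caller.

```typescript
// Parses "16:9" or "2.35:1" into a width/height ratio.
function parseAspectRatio(ar: string): number {
  const m = /^(\d+(?:\.\d+)?):(\d+(?:\.\d+)?)$/.exec(ar.trim());
  if (!m) throw new Error(`Invalid aspect ratio: ${ar}`);
  const w = Number(m[1]);
  const h = Number(m[2]);
  if (w <= 0 || h <= 0) throw new Error(`Invalid aspect ratio: ${ar}`);
  return w / h;
}

// Returns the candidate "WxH" size whose ratio is nearest to the request.
// Distances are taken on log(ratio) so that 2:1 and 1:2 are treated symmetrically.
function closestSize(ar: string, sizes: string[]): string {
  const target = Math.log(parseAspectRatio(ar));
  let best: string | undefined;
  let bestDist = Infinity;
  for (const size of sizes) {
    const [w, h] = size.split("x").map(Number);
    if (!w || !h) continue; // skip malformed entries
    const dist = Math.abs(Math.log(w / h) - target);
    if (dist < bestDist) {
      best = size;
      bestDist = dist;
    }
  }
  if (best === undefined) throw new Error("No valid candidate sizes");
  return best;
}

// Illustrative candidates only; the sizes a given model accepts may differ.
const exampleSizes = ["1024x1024", "1536x1024", "1024x1536"];
```

With these example candidates, `closestSize("16:9", exampleSizes)` picks the landscape size and `closestSize("9:16", exampleSizes)` picks the portrait size. `parseAspectRatio` throws on malformed input, so a caller can catch the error, emit a warning, and proceed with a default.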

Generation Mode

Default: Sequential generation (one image at a time). This ensures stable output and easier debugging.

Parallel Generation: Only use when user explicitly requests parallel/concurrent generation.

Mode                  When to Use
Sequential (default)  Normal usage, single images, small batches
Parallel              User explicitly requests, large batches (10+)

Parallel Settings (when requested):

Setting                  Value
Recommended concurrency  4 subagents
Max concurrency          8 subagents
Use case                 Large batch generation when user requests parallel

Agent Implementation (parallel mode only):

Launch multiple generations in parallel using Task tool

Each Task runs as background subagent with run_in_background=true

Collect results via TaskOutput when all complete
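
The concurrency limits above can be respected by splitting a batch into waves before dispatch. The sketch below only shows the batching arithmetic. It does not call the Task tool, and `planWaves` is a hypothetical name. Working in fixed waves is a simplification: a real pool would start the next job as soon as any slot frees up.

```typescript
const RECOMMENDED_CONCURRENCY = 4; // recommended number of subagents
const MAX_CONCURRENCY = 8; // upper bound on subagents

// Splits jobs into consecutive waves, each no larger than the clamped concurrency.
function planWaves<T>(jobs: T[], requested: number = RECOMMENDED_CONCURRENCY): T[][] {
  const size = Math.min(MAX_CONCURRENCY, Math.max(1, Math.floor(requested)));
  const waves: T[][] = [];
  for (let i = 0; i < jobs.length; i += size) {
    waves.push(jobs.slice(i, i + size));
  }
  return waves;
}
```

Each wave would then be launched as background subagents, with results collected before the next wave starts.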

Error Handling

Missing API key → error with setup instructions
Generation failure → auto-retry once
Invalid aspect ratio → warning, proceed with default
Reference images with unsupported provider/model → error with fix hint (switch to Google multimodal: gemini-3-pro-image-preview, gemini-3.1-flash-image-preview; or OpenAI GPT Image edits)

Extension Support

Custom configurations via EXTEND.md. See Preferences section for paths and supported options.
