Media Gen 🎬
Generate images and videos with a single AIsa API key. Full support for every image and video model AIsa routes through its Unified LLM Gateway, across four endpoint paths (three for images, one for video).
Compatibility
Works with any agentskills.io-compatible harness, including:
- Claude Code and Claude (Anthropic)
- OpenAI Codex
- Cursor
- Gemini CLI (Google)
- OpenCode, Goose, OpenClaw, Hermes
- and any other harness that implements the Agent Skills specification
Requires Python 3, a POSIX shell, and `AISA_API_KEY` (get one at aisa.one).
🔥 What You Can Do
Image — Gemini (base64 inline)
"Generate a cyberpunk-style city nightscape, neon lights, rainy night, cinematic feel"
Image — Wan 2.7 (URL in chat response)
"Generate an ultra-detailed product shot of a red panda, studio lighting, sharp focus"
Image — Seedream (OpenAI-compatible, large format)
"Generate a 2048×2048 magazine cover: neo-noir detective portrait, film grain"
Video — text-to-video (Wan t2v)
"Sweeping establishing shot of a neon cyberpunk skyline at dusk, 5 seconds"
Video — image-to-video (Wan i2v)
"Starting from this reference image, gentle camera push-in with parallax"
Supported Models
Image generation — 4 models, 3 endpoints
| Model | Developer | Endpoint | Notes |
|---|---|---|---|
| gemini-3-pro-image-preview | Google | POST /v1/models/{model}:generateContent | Images returned as base64 in candidates[].parts[].inline_data |
| wan2.7-image | Alibaba | POST /v1/chat/completions | Images returned as URL parts in choices[].message.content[] (type=image). $0.030/image |
| wan2.7-image-pro | Alibaba | POST /v1/chat/completions | Higher fidelity. $0.075/image |
| seedream-4-5-251128 | ByteDance | POST /v1/images/generations | OpenAI-compatible. Minimum 3,686,400 pixels (e.g. 1920×1920). $0.040/image |
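The model-to-endpoint mapping in the table can be sketched as a small routing table (illustrative only; the bundled client does this internally — the helper name here is hypothetical):

```python
# Illustrative routing table: image model -> AIsa endpoint path.
IMAGE_ENDPOINTS = {
    "gemini-3-pro-image-preview": "/v1/models/{model}:generateContent",
    "wan2.7-image": "/v1/chat/completions",
    "wan2.7-image-pro": "/v1/chat/completions",
    "seedream-4-5-251128": "/v1/images/generations",
}

def image_endpoint(model: str) -> str:
    """Return the endpoint path for a supported image model."""
    path = IMAGE_ENDPOINTS[model]  # raises KeyError for unsupported models
    return path.format(model=model)
```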
Video generation — 4 Wan variants, 1 endpoint
| Model | Kind | Image field | Output resolution |
|---|---|---|---|
| wan2.6-t2v | text-to-video | none | 1080P |
| wan2.6-i2v | image-to-video | input.img_url (string) | 720P |
| wan2.7-t2v | text-to-video | none | 720P |
| wan2.7-i2v | image-to-video | input.media (array) ⚠ | 720P |
⚠ Schema trap on `wan2.7-i2v`: it takes the reference image in `input.media` (an array of URLs), not `input.img_url` like `wan2.6-i2v`. Submissions without `media` return HTTP 200 with a `task_id`, then fail downstream with `InvalidParameter: Field required: input.media`. The bundled client routes this automatically — just pass `--img-url` and pick the model.
Quick Start
export AISA_API_KEY="your-key"
# Any image model — client routes to the right endpoint
python3 scripts/media_gen_client.py image \
--model gemini-3-pro-image-preview \
--prompt "A cute red panda, cinematic lighting" \
--out out.png
python3 scripts/media_gen_client.py image \
--model wan2.7-image-pro \
--prompt "Ultra-detailed product shot of a red panda" \
--out out.png
python3 scripts/media_gen_client.py image \
--model seedream-4-5-251128 \
--prompt "Neo-noir detective portrait, film grain" \
--size 2048x2048 \
--out out.png
# Video — text-to-video (no image needed)
python3 scripts/media_gen_client.py video-create \
--model wan2.7-t2v \
--prompt "Sweeping shot of a neon cyberpunk skyline"
# Video — image-to-video on wan2.7-i2v (client routes to input.media[])
python3 scripts/media_gen_client.py video-create \
--model wan2.7-i2v \
--prompt "gentle zoom with parallax" \
--img-url "https://example.com/reference.jpg" \
--duration 5
# Wait and download
python3 scripts/media_gen_client.py video-wait \
--task-id <task_id> --download --out out.mp4
🖼️ Image Generation — endpoint reference
Gemini family → POST /v1/models/{model}:generateContent
Documentation: Google Gemini Chat.
curl -X POST "https://api.aisa.one/v1/models/gemini-3-pro-image-preview:generateContent" \
-H "Authorization: Bearer $AISA_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"contents":[
{"role":"user","parts":[{"text":"A cute red panda, cinematic lighting"}]}
]
}'
Response contains candidates[].parts[].inline_data with {mime_type, data}
where data is a base64 PNG.
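Extracting that image is a short walk over the response, assuming the shape documented above (`candidates[].parts[].inline_data` with `{mime_type, data}`):

```python
import base64

def first_inline_image(response: dict) -> tuple[str, bytes]:
    """Return (mime_type, decoded image bytes) from a generateContent response."""
    for candidate in response.get("candidates", []):
        for part in candidate.get("parts", []):
            inline = part.get("inline_data")
            if inline:
                return inline["mime_type"], base64.b64decode(inline["data"])
    raise ValueError("no inline_data part in response")
```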
Wan 2.7 family → POST /v1/chat/completions
Documentation: Image Generation via Chat.
Critical rule: messages[].content must be an array of typed parts.
A plain string returns HTTP 400 invalid_parameter_error.
curl -X POST "https://api.aisa.one/v1/chat/completions" \
-H "Authorization: Bearer $AISA_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "wan2.7-image",
"messages": [
{"role":"user","content":[
{"type":"text","text":"A cute red panda, ultra-detailed, cinematic lighting"}
]}
],
"n": 1
}'
Images come back as {type: "image", image: "<url>"} parts inside
choices[].message.content[].
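Pulling the URLs out is a matter of filtering the typed parts, assuming the documented `{type: "image", image: "<url>"}` shape:

```python
def extract_image_urls(completion: dict) -> list[str]:
    """Collect image URLs from a Wan 2.7 chat completion response."""
    urls = []
    for choice in completion.get("choices", []):
        for part in choice.get("message", {}).get("content", []):
            if part.get("type") == "image":
                urls.append(part["image"])
    return urls
```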
Seedream → POST /v1/images/generations
Documentation: OpenAI-Compatible Image Generations.
curl -X POST "https://api.aisa.one/v1/images/generations" \
-H "Authorization: Bearer $AISA_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "seedream-4-5-251128",
"prompt": "A cute red panda, ultra-detailed, cinematic lighting",
"n": 1,
"size": "2048x2048"
}'
Response: data[].url or data[].b64_json. Upstream enforces a
minimum of 3,686,400 pixels. 1024×1024 and 1536×1536 get rejected.
Any aspect ratio works as long as width × height ≥ 3,686,400.
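That constraint is cheap to check client-side before spending a request (a sketch; the function name is illustrative):

```python
# Seedream's documented floor: width * height >= 3,686,400 (1920x1920).
SEEDREAM_MIN_PIXELS = 3_686_400

def seedream_size_ok(size: str) -> bool:
    """True if a 'WxH' size string meets the minimum pixel count."""
    width, height = (int(v) for v in size.lower().split("x"))
    return width * height >= SEEDREAM_MIN_PIXELS
```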
🎞️ Video Generation — endpoint reference
Create task → POST /apis/v1/services/aigc/video-generation/video-synthesis
Documentation: Create video generation task.
Header X-DashScope-Async: enable is required.
# wan2.6-t2v — text-to-video
curl -X POST "https://api.aisa.one/apis/v1/services/aigc/video-generation/video-synthesis" \
-H "Authorization: Bearer $AISA_API_KEY" \
-H "Content-Type: application/json" \
-H "X-DashScope-Async: enable" \
-d '{
"model":"wan2.6-t2v",
"input":{"prompt":"cinematic close-up, slow push-in"},
"parameters":{"resolution":"720P","duration":5}
}'
# wan2.7-i2v — image-to-video (⚠ input.media not input.img_url)
curl -X POST "https://api.aisa.one/apis/v1/services/aigc/video-generation/video-synthesis" \
-H "Authorization: Bearer $AISA_API_KEY" \
-H "Content-Type: application/json" \
-H "X-DashScope-Async: enable" \
-d '{
"model":"wan2.7-i2v",
"input":{
"prompt":"gentle zoom with parallax",
"media":["https://example.com/reference.jpg"]
},
"parameters":{"resolution":"720P","duration":5}
}'
Poll task → GET /apis/v1/services/aigc/tasks/{task_id}
Documentation: Get video generation task result.
`task_id` is a path parameter. The query-string form `?task_id=...` returns HTTP 500 `unsupported uri`.
curl "https://api.aisa.one/apis/v1/services/aigc/tasks/YOUR_TASK_ID" \
-H "Authorization: Bearer $AISA_API_KEY"
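A minimal poll loop looks like the following (a sketch using only the stdlib; the `output.task_status` field name and SUCCEEDED/FAILED terminal states are assumptions from the DashScope-style task schema — the bundled `video-wait` subcommand handles this for you):

```python
import json
import time
import urllib.request

def task_url(task_id: str) -> str:
    # task_id is a PATH parameter; the ?task_id=... query form fails.
    return f"https://api.aisa.one/apis/v1/services/aigc/tasks/{task_id}"

def wait_for_task(api_key: str, task_id: str, poll: int = 10, timeout: int = 600) -> dict:
    """Poll until the task reaches a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        req = urllib.request.Request(
            task_url(task_id),
            headers={"Authorization": f"Bearer {api_key}"},
        )
        with urllib.request.urlopen(req) as resp:
            data = json.loads(resp.read())
        status = data.get("output", {}).get("task_status")
        if status in ("SUCCEEDED", "FAILED"):
            return data
        time.sleep(poll)
    raise TimeoutError(f"task {task_id} still pending after {timeout}s")
```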
Python Client
The bundled client at scripts/media_gen_client.py auto-routes each
image model to the correct endpoint and normalizes the response to a
saved file.
# Image — model picks the endpoint
python3 scripts/media_gen_client.py image \
--model <gemini-3-pro-image-preview | wan2.7-image | wan2.7-image-pro | seedream-4-5-251128> \
--prompt "..." \
--out out.png
# Video — create task
python3 scripts/media_gen_client.py video-create \
--model <wan2.6-t2v | wan2.6-i2v | wan2.7-t2v | wan2.7-i2v> \
--prompt "..." \
[--img-url https://... (required for -i2v models)] \
[--duration 5|10] \
[--resolution 720P|1080P]
# Video — poll / wait / download
python3 scripts/media_gen_client.py video-status --task-id <id>
python3 scripts/media_gen_client.py video-wait --task-id <id> --poll 10 --timeout 600
python3 scripts/media_gen_client.py video-wait --task-id <id> --download --out out.mp4
API Reference
This skill calls the following AIsa endpoints directly:
- Google Gemini Chat — `generateContent` — Gemini image models
- Image Generation via Chat — Wan 2.7 image family
- OpenAI-Compatible Image Generations — Seedream
- Create video generation task — all 4 Wan video variants
- Get video generation task result — async polling
See the full AIsa API Reference for the complete catalog.
License
MIT — see LICENSE at the repo root.