Codex PPT
Overview
This skill creates image-based PPT decks. Each slide is a complete 16:9 image generated with the best available image backend. The image contains the slide title, key points, and visual composition. The generated images are then assembled into a .pptx file with scripts/assemble_ppt.py.
Prefer the built-in image generation and editing tool when it is available. If it is unavailable, or if the user explicitly requests API/CLI mode, use this skill's local fallback CLI at scripts/image_gen.py.
Use When
Use this skill when the user asks to:
- Turn an article, report, paper, document, course note, or rough outline into a PPT.
- Create a visually consistent presentation deck.
- Generate slides as full-page images.
- Produce supporting outline.md and speech.md files.
- Assemble generated slide images into a .pptx.
Do not use this skill for ordinary editable PowerPoint layouts where each textbox, chart, or shape must remain separately editable. This workflow prioritizes visual quality and consistency over editability.
Image Generation Backends
This skill supports two image backends:
- Built-in image tool, preferred when available. Example tool names: Codex image_gen; OpenClaw image_generate.
- Local API/CLI fallback, using scripts/image_gen.py.
Backend selection rules:
- Prefer the built-in image tool when available. In Codex, this usually means the built-in image_gen tool. In OpenClaw, this may be image_generate. Resolution, quality, aspect ratio, or slide-edit requests alone do not require CLI/API fallback.
- Use CLI/API fallback only when the built-in tool is unavailable, the user explicitly asks for API/CLI or a third-party OpenAI-compatible proxy, or the requested capability is unavailable in the built-in tool.
- Before generating the first image, tell the user which backend you plan to use and why, then ask for confirmation. Do not treat being in a specific agent environment as proof that the built-in image tool is available.
- CLI/API fallback loads ~/.codex-ppt-skill/.env automatically. Run the CLI normally; do not manually parse .env or ask for configuration before an error.
- Ask for configuration only after the CLI reports missing OPENAI_API_KEY, after authentication/base URL/model errors, or when the user explicitly wants to change API settings. Configure provided values with scripts/codex_ppt_runtime.py config --api-key.
- For detailed fallback setup after an error, read docs/image-model-configuration.md.
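The selection rules above can be read as a small decision function. The following is only a sketch: BackendContext, its fields, and choose_backend are hypothetical names introduced for illustration and are not part of the skill's scripts.

```python
from dataclasses import dataclass


@dataclass
class BackendContext:
    """Facts gathered before the first image is generated (hypothetical fields)."""
    builtin_tool_available: bool
    user_requested_cli: bool = False
    capability_missing_in_builtin: bool = False


def choose_backend(ctx):
    """Return (backend, reason), following the selection rules listed above."""
    if ctx.user_requested_cli:
        return "cli", "user explicitly asked for API/CLI mode or a proxy"
    if not ctx.builtin_tool_available:
        return "cli", "built-in image tool is unavailable"
    if ctx.capability_missing_in_builtin:
        return "cli", "requested capability is unavailable in the built-in tool"
    # Resolution, quality, aspect ratio, or edit requests alone never reach a CLI branch.
    return "builtin", "built-in image tool is available and sufficient"
```

The returned reason is meant to be repeated to the user when asking for backend confirmation.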
CLI/API fallback commands use the shared runtime environment. Let {skill_root} mean the directory containing this SKILL.md.
~/.codex-ppt-skill/.venv/bin/python {skill_root}/scripts/image_gen.py generate \
--model gpt-image-2 \
--prompt-file {prompt_file} \
--size 2560x1440 \
--quality medium \
--out {base_dir}/{deck_name}/origin_image/slide_01.png
For CLI/API fallback, first make sure dependencies are installed:
python3 {skill_root}/scripts/codex_ppt_runtime.py bootstrap
Use the shared runtime config for real API calls. The fallback CLI loads existing config automatically; only load docs/image-model-configuration.md after the CLI reports missing config, when the user explicitly wants to change API key, base URL, or model, or when a real API call reports authentication, permission, base URL, or model availability failure. The fallback CLI accepts model names containing gpt-image-, such as gpt-image-2 or openai/gpt-image-2.
The fallback CLI supports:
- generate: create one or more images from a prompt.
- edit: edit one or more existing images, optionally with a mask.
- generate-batch: generate many slide images from a JSONL prompt file.
The fallback CLI defaults to 2K 16:9 landscape output, 2560x1440, because it keeps slide text clearer while staying below the gpt-image-2 pixel limit. For 4K landscape slides, use --size 3840x2160 --quality high only when the user asks for 4K, text-heavy slides need sharper output, or the default result is blurry. For portrait assets, use --size 2160x3840 only if the user requests portrait output.
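As a rough illustration of the size and model rules above, the sketch below builds the argument list for one generate call without running it. The helper names are hypothetical, the flags are the ones shown in the command earlier in this section, and the quality used for portrait output is an assumption because the text does not state one.

```python
from pathlib import Path

# Size/quality pairs taken from the text above; the portrait quality is an assumption.
LANDSCAPE_2K = ("2560x1440", "medium")
LANDSCAPE_4K = ("3840x2160", "high")
PORTRAIT_4K = ("2160x3840", "medium")


def pick_size_and_quality(want_4k=False, want_portrait=False):
    """Choose --size and --quality; default to 2K landscape unless asked otherwise."""
    if want_portrait:
        return PORTRAIT_4K
    if want_4k:
        return LANDSCAPE_4K
    return LANDSCAPE_2K


def build_generate_command(skill_root, prompt_file, out_path,
                           model="gpt-image-2", want_4k=False, want_portrait=False):
    """Return the argv list for one image_gen.py generate call (nothing is executed)."""
    # Mirrors the stated rule that model names must contain "gpt-image-".
    if "gpt-image-" not in model:
        raise ValueError(f"unsupported model name: {model!r}")
    size, quality = pick_size_and_quality(want_4k, want_portrait)
    python = Path.home() / ".codex-ppt-skill" / ".venv" / "bin" / "python"
    script = Path(skill_root) / "scripts" / "image_gen.py"
    return [
        str(python), str(script), "generate",
        "--model", model,
        "--prompt-file", str(prompt_file),
        "--size", size,
        "--quality", quality,
        "--out", str(out_path),
    ]
```

The list form can be passed to subprocess.run as-is, which avoids shell quoting problems with paths that contain spaces.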
Transparent-background requests:
- Built-in mode should use a flat chroma-key background and local removal when appropriate.
- CLI/API fallback should also prefer chroma-key generation plus scripts/remove_chroma_key.py for simple opaque subjects. gpt-image-2 does not support --background transparent. If the user needs true model-native transparency, ask before switching to --model gpt-image-1.5 --background transparent --output-format png.
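To make the chroma-key idea concrete, here is a minimal sketch of the technique on plain RGB tuples. It is not the implementation of scripts/remove_chroma_key.py, whose internals are not shown here; the function name, the green key color, and the tolerance value are all assumptions.

```python
def remove_chroma_key(pixels, key=(0, 255, 0), tolerance=40):
    """Return RGBA pixels where colors close to `key` become fully transparent.

    `pixels` is an iterable of (r, g, b) tuples. A pixel counts as background
    when every channel is within `tolerance` of the key color.
    """
    result = []
    for r, g, b in pixels:
        distance = max(abs(r - key[0]), abs(g - key[1]), abs(b - key[2]))
        alpha = 0 if distance <= tolerance else 255
        result.append((r, g, b, alpha))
    return result
```

A hard threshold like this works for simple opaque subjects on a flat background, which is the case the bullets above describe; soft edges, hair, or glass would need a more careful matte.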
Workflow
1. Understand Source Content
Read the user-provided content fully enough to identify:
- Main topic and intended audience
- Presentation goal
- Required or implied page count
- Required style or brand constraints
- Any sections that must be included or excluded
If the user did not specify a page count, choose a practical count based on content length. Typical decks are 8-12 slides.
2. Plan The Deck Outline
Create a concise outline.md draft before generating images. For each slide, define:
- Slide number
- Slide title
- 3-5 key points
- Optional visual idea
- Layout role and intent, such as cover, agenda, section divider, concept explanation, process, comparison, timeline, data evidence, architecture, case study, summary, or Q&A
Save the draft to {base_dir}/{deck_name}/outline.md once the project directory is known. If the output directory is not known yet, show the outline in chat first and write it to outline.md immediately after creating the project directory.
Show the outline to the user for confirmation and wait for approval before moving to visual style selection or image generation, unless the user explicitly asked you to skip confirmation. If the user requests changes, update outline.md and ask for confirmation again.
Recommended structure:
Slide 1: Cover
Slide 2: Context / problem
Slide 3-7: Main argument or sections
Slide 8: Summary / recommendation / closing
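The per-slide fields listed in this step could be held in a small record and rendered to outline.md. This is a sketch only: SlidePlan, render_outline, and the exact Markdown layout are assumptions, since the skill does not prescribe a fixed outline.md format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SlidePlan:
    """One outline entry; field names are hypothetical."""
    number: int
    title: str
    key_points: List[str]
    layout_role: str
    visual_idea: str = ""  # optional, as in the list above


def render_outline(slides):
    """Render slide plans as Markdown text suitable for an outline.md draft."""
    lines = []
    for slide in slides:
        # The step above asks for 3-5 key points per slide.
        if not 3 <= len(slide.key_points) <= 5:
            raise ValueError(f"slide {slide.number} needs 3-5 key points")
        lines.append(f"## Slide {slide.number}: {slide.title}")
        lines.append(f"- Layout role: {slide.layout_role}")
        if slide.visual_idea:
            lines.append(f"- Visual idea: {slide.visual_idea}")
        lines.extend(f"- {point}" for point in slide.key_points)
        lines.append("")
    return "\n".join(lines)
```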
3. Confirm A Unified Visual Style
Before generating slide images, discuss the visual style with the user. Prefer a multiple-choice question: offer 2-3 concrete style directions and mark one as your recommendation.
Each style option should briefly specify:
- Color palette
- Layout system
- Typography direction
- Illustration or image treatment
- Decorative elements
- Density and whitespace rules
After the user chooses a style, create one final style direction and keep the visual identity consistent across all slide prompts. Keep color palette, typography, texture, icon/illustration language, and overall mood stable. Do not reuse the same layout on every page.
The references/ directory contains optional style references. Use them as inspiration, not as rigid templates. Adapt the style to the topic and audience.
Important: a deck should have one coherent visual identity, not one repeated composition. Treat each reference as a style system: stable palette, typography, icon language, texture, and visual mood; variable page layout chosen from the slide's content role. layout_blueprints are candidate starting points only. Do not apply the same blueprint to every slide.
Available references:
- references/清爽专业风.md
- references/创意杂志风.md
- references/电子墨水杂志风.md
- references/数据仪表盘风.md
- references/科研答辩风.md
- references/复古扁平插画风.md
- references/手绘技术解释风.md
- references/手绘白板风.md
- references/温暖手工风.md
Example style confirmation:
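I recommend A because it best fits this content's audience and communication goals.
A. 清爽专业风 (Clean Professional, recommended): light background, blue-green accent colors, and a clear structure; suited to reports, thesis defenses, and technical talks.
B. 创意杂志风 (Creative Magazine): large headlines, strong imagery, and bolder whitespace; suited to sharing and outreach.
C. 数据仪表盘风 (Data Dashboard): metric cards and chart-like layouts; suited to data-heavy reports.
Which one do you prefer? You can also specify adjustments to the color palette, layout, or illustration direction, or upload an image of a PPT style you like for me to reference.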
4. Confirm Image Backend Before Generation
Before generating any slide image, ask the user to confirm the image backend. Keep the confirmation short and concrete:
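I plan to generate the sample with the built-in image generation tool: usually image_gen in Codex and image_generate in OpenClaw. The tool can be called directly in the current environment, so no third-party API configuration is required. May I start generating one sample slide?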
If using CLI/API fallback, say that explicitly and name the configured target:
I plan to generate the sample with the local API/CLI fallback, which reads the OPENAI_BASE_URL / CODEX_PPT_IMAGE_MODEL settings from ~/.codex-ppt-skill/.env. May I start generating one sample slide?
Wait for confirmation before generating the sample slide. If the user questions the backend, resolve that before continuing.
5. Generate One Sample Slide For Approval
After the outline, style, and image backend are confirmed, generate exactly one sample slide image before full production.
Sample slide requirements:
- Use the confirmed style description.
- Prefer a representative content slide over the cover when possible.
- Demonstrate the intended deck rhythm: the sample should show how the chosen style adapts to a real content page, not just a generic fixed template.
- Save it directly as the intended final slide filename, such as {base_dir}/{deck_name}/origin_image/slide_08.png. In CLI/API fallback mode, use scripts/image_gen.py generate --out for that exact path.
- Show the sample image to the user.
- Ask the user to confirm the visual style, typography, layout density, and Chinese text quality.
Do not generate the full deck until the user approves the sample slide. If the user requests changes, revise the style description and regenerate that same slide_XX.png file first. Once approved, keep that file as the final slide for its page. Do not create sample_slide.png in origin_image/, because the assembly step is designed around final slide_XX filenames.
6. Create The Project Directory
Use this output structure:
{base_dir}/{deck_name}/
├── origin_image/
│ ├── slide_01.png
│ ├── slide_02.png
│ └── ...
├── outline.md
├── speech.md
└── {deck_name}.pptx
If the user did not specify a destination, use the current working directory or the directory that contains the source file.
You may initialize the directory structure with:
~/.codex-ppt-skill/.venv/bin/python {skill_root}/scripts/assemble_ppt.py {base_dir} {deck_name}.pptx --init
7. Generate All Slide Images
Generate one image per slide with the selected image backend. Every final slide_XX.png must be produced by the built-in image tool or by scripts/image_gen.py; programmatic rendering or hybrid text overlay is not acceptable for slide image creation.
Use a structured visual brief for each slide. Image generation works best when the prompt separates canvas, style, layout, text, visual elements, and constraints instead of relying only on a long style paragraph.
Keep the deck visually coherent but vary slide layouts according to page semantics. Treat style references and layout_blueprints as candidate patterns, not fixed templates. Across a normal deck, deliberately mix suitable page types such as:
- cover / section divider
- context or problem framing
- process or timeline
- comparison or tradeoff
- data / evidence / KPI
- architecture or workflow diagram
- summary / conclusion / next steps
Avoid generating every slide as the same three-card layout. For each slide, choose a layout that fits its content and explain that choice in the layout.intent field.
{
"type": "16:9 full-slide PowerPoint image",
"language": "Chinese",
"canvas": {
"aspect_ratio": "16:9",
"use_full_canvas": true,
"slide_number": "do not render a slide number"
},
"style": {
"name": "{confirmed style name}",
"visual_direction": "{same final style description for every slide}",
"color_palette": "{main colors and accent colors}",
"typography": "{font personality, hierarchy, weight, text alignment}",
"texture_and_finish": "{flat, paper, dashboard, editorial, whiteboard, etc.}",
"deck_consistency": "same palette, typography, icon language, texture, and mood across all slides"
},
"layout": {
"role": "{cover, agenda, section divider, concept, process, comparison, timeline, data evidence, architecture, case study, summary, Q&A, etc.}",
"intent": "{why this page uses this layout: cover, comparison, timeline, data evidence, workflow, summary, etc.}",
"composition": "{specific layout for this slide}",
"content_zones": "{title zone, body zone, visual zone, footer or callout zones}",
"variation_rule": "same style identity as the deck, but vary composition by slide role; do not repeat the same blueprint on adjacent slides unless the content is part of a deliberate repeated sequence",
"relationship_to_previous_slide": "{new layout, continuation layout, mirrored layout, or deliberate repeated sequence}",
"spacing": "clear hierarchy, coherent alignment, no overlapping elements"
},
"text": {
"title": "{slide title}",
"key_points": ["{point 1}", "{point 2}", "{point 3}"],
"text_quality": "render all Chinese text exactly, clearly, and without garbled characters"
},
"visual_elements": {
"main_visual": "{icons, diagram, chart, illustration, dashboard cards, collage, or other content-specific visual idea}",
"supporting_elements": "{arrows, cards, callouts, decorative elements, labels}"
},
"constraints": [
"The final image itself must contain the title and key points.",
"All text must be readable and correctly spelled.",
"Keep the confirmed style consistent with the rest of the deck.",
"No watermark, no unrelated logo, no extra slide number."
]
}
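One way to keep the style block identical across slides while varying the layout is to build each brief from a shared style dictionary. The sketch below assumes hypothetical helper names (build_slide_brief, brief_to_prompt) and fills only a subset of the template fields shown above.

```python
import json


def build_slide_brief(style, role, intent, composition, title, key_points,
                      main_visual, language="Chinese"):
    """Assemble one slide's visual brief; `style` is the same dict for every slide."""
    return {
        "type": "16:9 full-slide PowerPoint image",
        "language": language,
        "canvas": {"aspect_ratio": "16:9", "use_full_canvas": True,
                   "slide_number": "do not render a slide number"},
        "style": dict(style),  # copy so per-slide edits cannot drift the shared style
        "layout": {"role": role, "intent": intent, "composition": composition,
                   "spacing": "clear hierarchy, coherent alignment, no overlapping elements"},
        "text": {"title": title, "key_points": list(key_points),
                 "text_quality": f"render all {language} text exactly, clearly, "
                                 "and without garbled characters"},
        "visual_elements": {"main_visual": main_visual},
        "constraints": [
            "The final image itself must contain the title and key points.",
            "All text must be readable and correctly spelled.",
            "Keep the confirmed style consistent with the rest of the deck.",
            "No watermark, no unrelated logo, no extra slide number.",
        ],
    }


def brief_to_prompt(brief):
    """Serialize the brief as JSON text, keeping non-ASCII characters readable."""
    return json.dumps(brief, ensure_ascii=False, indent=2)
```

Passing ensure_ascii=False keeps Chinese titles and key points as literal characters in the prompt instead of \u escapes.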
Save images as:
{base_dir}/{deck_name}/origin_image/slide_01.png
{base_dir}/{deck_name}/origin_image/slide_02.png
...
After each image is generated, copy or move it into {base_dir}/{deck_name}/origin_image/ immediately. Do not leave final slide images only in a temporary or default generated-images directory.
In CLI/API fallback mode, you may generate slides one at a time or use generate-batch. For batch generation, create a JSONL file where each job has a distinct prompt and an out value such as slide_01.png, then run:
~/.codex-ppt-skill/.venv/bin/python {skill_root}/scripts/image_gen.py generate-batch \
--input {base_dir}/{deck_name}/image_prompts.jsonl \
--out-dir {base_dir}/{deck_name}/origin_image \
--size 2560x1440 \
--quality medium \
--concurrency 5
Remove the temporary JSONL prompt file before final delivery unless the user asks to keep it.
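A batch prompt file of the kind described above could be produced as follows. This is a sketch under the assumption that each JSONL job uses the keys prompt and out, as the text suggests; the helper name is hypothetical, and any other fields the CLI may accept are not covered.

```python
import json


def build_batch_jsonl(prompts):
    """Return JSONL text with one job per slide, numbered in slide order."""
    lines = []
    for index, prompt in enumerate(prompts, start=1):
        # Zero-padded two-digit names match the final slide naming rules.
        job = {"prompt": prompt, "out": f"slide_{index:02d}.png"}
        lines.append(json.dumps(job, ensure_ascii=False))
    return "\n".join(lines) + "\n" if lines else ""
```

The returned text can be written to {base_dir}/{deck_name}/image_prompts.jsonl with UTF-8 encoding before running generate-batch.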
Final slide image naming rules:
- Rename final slide images strictly by slide order: slide_01.png, slide_02.png, slide_03.png, ...
- Use zero-padded two-digit numbers for normal decks.
- The approved sample slide should already have the correct slide_XX.png filename and should be reused directly.
- Keep rejected variants, drafts, or reference images out of origin_image/. If you need to preserve them, place them in the project root or a separate drafts/ directory.
- Before assembling, verify every expected slide_XX.png exists in origin_image/ and that there are no missing or extra final slide images.
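The pre-assembly check in the last naming rule can be sketched as a comparison between expected and found filenames. The function name is hypothetical, and the sketch assumes the two-digit pattern used for normal decks.

```python
import re

# Final slide images use zero-padded two-digit numbers, e.g. slide_01.png.
FINAL_SLIDE = re.compile(r"^slide_\d{2}\.png$")


def check_slide_names(names, expected_count):
    """Return (missing, extra) final slide filenames for a deck of expected_count slides."""
    expected = {f"slide_{n:02d}.png" for n in range(1, expected_count + 1)}
    finals = {name for name in names if FINAL_SLIDE.match(name)}
    missing = sorted(expected - finals)
    extra = sorted(finals - expected)
    return missing, extra
```

In practice, names would be the filenames found in origin_image/, and assembly should proceed only when both returned lists are empty.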
For Chinese decks, explicitly ask the image backend to render Chinese text accurately and avoid garbled characters.
8. Quality Check And Repair
Before assembling the PPT, inspect every slide image. Check:
- Text is readable and not garbled.
- Slide content matches the outline.
- Title and key points are not truncated.
- Visual style is consistent across slides.
- No page number appears unless the user requested one.
- Important elements do not overlap.
If a slide has severe text or layout issues, regenerate it with a more constrained prompt. If a slide is mostly correct but has a localized issue, use the selected backend's edit capability when available. In CLI/API fallback mode, use scripts/image_gen.py edit --image {slide_path} --prompt ... --out {new_slide_path} and replace the final slide only after validating the edited output.
9. Write Speaker Notes
Make sure outline.md reflects the final confirmed deck outline from step 2. Do not recreate it from scratch here.
Create speech.md with speaker notes. Keep it useful and concise: 1-3 short paragraphs per slide is usually enough.
Use headings that the assembly script can map back to slide numbers:
## Slide 1: {Title}
{Speaker notes for slide 1}
## Slide 2: {Title}
{Speaker notes for slide 2}
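To check that speech.md follows this heading convention before assembly, the notes can be split by slide number. This sketch is not the parser used by scripts/assemble_ppt.py, whose matching rules are not shown here; the regular expression and function name are assumptions.

```python
import re

# Matches headings such as "## Slide 3: Title".
HEADING = re.compile(r"^##\s*Slide\s+(\d+)\s*:?\s*(.*)$")


def parse_speech_notes(text):
    """Map slide number -> speaker notes text from speech.md content."""
    notes = {}
    current = None
    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            current = int(match.group(1))
            notes[current] = []
        elif current is not None:
            notes[current].append(line)
    return {number: "\n".join(lines).strip() for number, lines in notes.items()}
```

Comparing the parsed slide numbers with the slide count is a quick way to spot a slide with no notes.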
10. Assemble The PPT
Run:
~/.codex-ppt-skill/.venv/bin/python {skill_root}/scripts/assemble_ppt.py {base_dir} {deck_name}.pptx --aspect-ratio 16:9
Important:
- {base_dir} is the parent directory of {deck_name}/.
- {deck_name}.pptx must match the project folder name.
- The script reads images from {base_dir}/{deck_name}/origin_image/.
- The script only reads final images named like slide_01.png, slide_02.png, etc.; drafts and sample files are ignored.
- If {base_dir}/{deck_name}/speech.md exists and uses Slide N headings, the script writes those notes into the corresponding PPT speaker notes.
- The script writes {base_dir}/{deck_name}/{deck_name}.pptx.
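The path relationships in the notes above can be written down once and reused. The helper names below are hypothetical; the directory layout and the assemble arguments are the ones described in this document.

```python
from pathlib import Path


def project_paths(base_dir, deck_name):
    """Return the paths the assembly step reads and writes for one deck."""
    root = Path(base_dir) / deck_name
    return {
        "project": root,
        "images": root / "origin_image",
        "outline": root / "outline.md",
        "speech": root / "speech.md",
        "pptx": root / f"{deck_name}.pptx",
    }


def assemble_args(base_dir, deck_name, aspect_ratio="16:9"):
    """Arguments after the script path: base_dir is the parent of the deck folder."""
    # The .pptx name is derived from deck_name so it always matches the folder name.
    return [str(base_dir), f"{deck_name}.pptx", "--aspect-ratio", aspect_ratio]
```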
11. Final Report
Report:
- Project directory
- PPT file path
- Slide image directory
- outline.md path
- speech.md path
- Number of slides
- Confirm which image backend was used: built-in image tool or CLI/API fallback.
- Confirm that speaker notes from speech.md were written into the PPT, if applicable
- Any slides that were regenerated or still have known limitations
Local Script Dependencies
Before running scripts/assemble_ppt.py or the CLI/API fallback scripts, make sure the shared runtime exists. If ~/.codex-ppt-skill/.venv/bin/python is missing, or if importing script dependencies fails, create or refresh the environment:
python3 {skill_root}/scripts/codex_ppt_runtime.py bootstrap
This is an internal setup step for the skill. Do not ask the user to run these commands unless dependency installation fails and user approval or troubleshooting is required.
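The existence check described above might look like the sketch below. The function names are hypothetical, and the sketch covers only the missing-interpreter case; failed dependency imports would still need to be detected when a script actually runs.

```python
from pathlib import Path


def runtime_python(home=None):
    """Path of the shared runtime interpreter under the given home directory."""
    base = Path(home) if home is not None else Path.home()
    return base / ".codex-ppt-skill" / ".venv" / "bin" / "python"


def needs_bootstrap(home=None):
    """True when the shared runtime interpreter is missing."""
    return not runtime_python(home).exists()


def bootstrap_command(skill_root):
    """The bootstrap command from this section, as an argv list (not executed here)."""
    script = Path(skill_root) / "scripts" / "codex_ppt_runtime.py"
    return ["python3", str(script), "bootstrap"]
```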
assemble_ppt.py supports 16:9 and 4:3. Use 16:9 unless the user requests otherwise. image_gen.py loads ~/.codex-ppt-skill/.env automatically for OPENAI_API_KEY, OPENAI_BASE_URL, and CODEX_PPT_IMAGE_MODEL. Run python3 {skill_root}/scripts/codex_ppt_runtime.py doctor --check-api when troubleshooting API access.
Prompting Principles
- Keep one global visual style fixed across the deck.
- Vary slide composition by page role; style consistency does not mean repeating the same layout.
- Use layout_blueprints as candidate patterns, not mandatory templates.
- Generate one slide per image request.
- Prefer concrete visual direction over generic words like "beautiful" or "professional".
- For dense content, split across more slides instead of crowding one slide.
- Prioritize clarity over decoration.