image2pptx

Convert static images (slides, posters, infographics) to editable PowerPoint files. The pipeline runs in five stages: OCR detects text, a classical-CV textmask detects ink pixels, mask-clip preserves illustrations, LAMA inpaints a clean background, and python-pptx assembles editable text boxes with auto-scaled fonts and detected colors. Trigger on 'convert image to pptx', 'make slide editable', 'image to powerpoint', 'extract text from slide as editable', 'reconstruct slide', or when the user has a slide/poster image and wants an editable .pptx file.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "image2pptx" with this command: npx skills add Jade Liu/image2pptx

image2pptx: Image to Editable PowerPoint

What It Does

Converts a static image into an editable .pptx file where every text element is a selectable, editable text box over a clean inpainted background.

  1. OCR (PaddleOCR PP-OCRv5) — detects text regions with bounding boxes and content
  2. Textmask (classical CV) — finds text ink pixels via adaptive thresholding
  3. Mask-clip — ANDs textmask with OCR bboxes to preserve non-text elements
  4. Inpaint (LAMA) — reconstructs masked regions with neural inpainting
  5. Assemble — places editable text boxes with auto-scaled fonts and detected colors
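Step 3 (mask-clip) can be illustrated with a small dependency-free sketch. It assumes the textmask is a 2D grid of 0/1 values and that each OCR box is an `(x0, y0, x1, y1)` tuple in pixel coordinates with exclusive right and bottom edges. Both representations, and the name `clip_mask_to_boxes`, are assumptions made for illustration, not the package's internal format.

```python
def clip_mask_to_boxes(textmask, boxes):
    """Keep mask pixels only where they fall inside some OCR bounding box."""
    height = len(textmask)
    width = len(textmask[0]) if height else 0

    # Rasterize the OCR boxes into a second binary mask of the same size.
    box_mask = [[0] * width for _ in range(height)]
    for x0, y0, x1, y1 in boxes:
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                box_mask[y][x] = 1

    # Pixel-wise AND: ink outside every box (e.g. an illustration) is dropped,
    # so the inpainting step never erases it.
    return [
        [ink & inside for ink, inside in zip(mask_row, box_row)]
        for mask_row, box_row in zip(textmask, box_mask)
    ]


# Toy 4x6 mask: a text stroke in the top-left, an icon in the bottom-right.
textmask = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
]
clipped = clip_mask_to_boxes(textmask, boxes=[(0, 0, 3, 2)])
```

Plain lists keep the sketch free of dependencies; on real images the same AND would normally be done on NumPy arrays.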

When to Use

| Scenario | Recommendation |
| --- | --- |
| Slide with text on solid/flat background | Best results |
| Slide with photo background | Good: uses inpainting (warn about overlap areas) |
| Slide with solid background | Good: use --skip-inpaint for speed |
| Chinese/multilingual slide | Good: ch OCR handles both Chinese and English |
| Poster or infographic with text | Good: works well if text is separate from graphics |
| Dense chart with axis labels on bars | Caution: line grouping may over-merge crowded labels |
| Very thick/large decorative fonts | Caution: may exceed standard mask dilation range |
| Extract individual assets as PNGs | No: use px-asset-extract |
| Read text without creating PPTX | No: use OCR directly |
| Edit an existing .pptx file | No: use the pptx skill |

Installation

git clone https://github.com/JadeLiu-tech/px-image2pptx.git
cd px-image2pptx
pip install -e ".[all]"

Usage

CLI

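# Basic conversion
px-image2pptx slide.png -o output.pptx
# Chinese or mixed Chinese/English slide
px-image2pptx slide.png -o output.pptx --lang ch
# Skip LAMA inpainting (faster on solid backgrounds)
px-image2pptx slide.png -o output.pptx --skip-inpaint
# Reuse pre-computed OCR results instead of running OCR
px-image2pptx slide.png -o output.pptx --ocr-json text_regions.json
# Save intermediate files for inspection
px-image2pptx slide.png -o output.pptx --work-dir ./debug/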
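When the CLI is driven from a script, the argument list can be built programmatically. The helper below is a hypothetical sketch (`build_command` is not part of the package). It uses only the flags shown above and assumes they can be combined freely.

```python
import shlex


def build_command(image, output="output.pptx", lang=None, ocr_json=None,
                  skip_inpaint=False, work_dir=None):
    """Return an argv list for px-image2pptx using only the flags shown above."""
    argv = ["px-image2pptx", str(image), "-o", str(output)]
    if lang is not None:
        argv += ["--lang", lang]
    if ocr_json is not None:
        argv += ["--ocr-json", str(ocr_json)]
    if skip_inpaint:
        argv.append("--skip-inpaint")
    if work_dir is not None:
        argv += ["--work-dir", str(work_dir)]
    return argv


argv = build_command("slide.png", lang="ch", skip_inpaint=True)
print(shlex.join(argv))
# To execute it: subprocess.run(argv, check=True)
```

Passing a list to `subprocess.run` rather than a shell string avoids quoting problems with file names that contain spaces.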

Python API

from px_image2pptx import image_to_pptx

report = image_to_pptx("slide.png", "output.pptx")

# With options
report = image_to_pptx(
    "slide.png", "output.pptx",
    lang="ch",
    skip_inpaint=False,
    work_dir="./debug/",
)
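For a folder of images, the same call can be wrapped in a loop. This is a minimal sketch: `convert_folder` is a hypothetical helper, the converter is passed in as an argument so that the loop does not depend on the package being importable, and the accepted suffixes are an assumption (only PNG input appears in the examples here). The value returned by `image_to_pptx` is stored as-is because its structure is not described in this document.

```python
from pathlib import Path

# Assumed set of input suffixes; adjust to whatever the converter accepts.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def convert_folder(src_dir, out_dir, convert):
    """Convert every image in src_dir and return {image name: report}."""
    src_dir, out_dir = Path(src_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    reports = {}
    for image in sorted(src_dir.iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip non-image files
        target = out_dir / (image.stem + ".pptx")
        reports[image.name] = convert(str(image), str(target))
    return reports


# Usage, assuming the package is installed:
#   from px_image2pptx import image_to_pptx
#   reports = convert_folder("slides/", "decks/", image_to_pptx)
```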

CLI Options

| Option | Default | Description |
| --- | --- | --- |
| -o, --output | output.pptx | Output PPTX path |
| --ocr-json | | Pre-computed OCR JSON (skips OCR) |
| --lang | auto | OCR language: auto, en, ch |
| --sensitivity | 16 | Textmask sensitivity (lower = more) |
| --dilation | 12 | Textmask dilation in pixels |
| --min-font | 8 | Min font size in points |
| --max-font | 72 | Max font size in points |
| --skip-inpaint | | Skip LAMA inpainting |
| --work-dir | | Save intermediate files |
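The --min-font and --max-font options bound the auto-scaled font size. The sketch below shows one way such a size could be derived and clamped. The scaling formula, the `fill_ratio` factor, and the function name are assumptions for illustration and not the package's documented method. The 540 pt slide height matches PowerPoint's default 16:9 slide (7.5 in).

```python
def estimate_font_pt(box_height_px, image_height_px, slide_height_pt=540.0,
                     fill_ratio=0.8, min_font=8, max_font=72):
    """Map a text box height in pixels to a clamped font size in points."""
    if image_height_px <= 0:
        raise ValueError("image_height_px must be positive")
    # Convert pixels to points via the ratio of slide height to image height.
    box_height_pt = box_height_px * slide_height_pt / image_height_px
    # Glyphs rarely fill the whole box; fill_ratio is a hypothetical factor.
    size = box_height_pt * fill_ratio
    # Clamp to the limits given by --min-font and --max-font.
    return max(min_font, min(max_font, size))


body = estimate_font_pt(40, 1080)      # mid-sized line of text
banner = estimate_font_pt(400, 1080)   # oversized title, clamped to max_font
footnote = estimate_font_pt(5, 1080)   # tiny text, clamped to min_font
```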

Models

Downloaded automatically on first use (~370 MB total). All models are from official open-source repositories.

| Model | Size | License | Source |
| --- | --- | --- | --- |
| PP-OCRv5_server_det | 84 MB | Apache 2.0 | PaddlePaddle/PaddleOCR |
| PP-OCRv5_server_rec | 81 MB | Apache 2.0 | PaddlePaddle/PaddleOCR |
| big-lama | 196 MB | Apache 2.0 | advimman/lama |

Models are cached locally after first download (~/.paddlex/official_models/ for OCR, ~/.cache/torch/hub/checkpoints/ for LAMA). To skip model downloads entirely, use --ocr-json with pre-computed OCR and --skip-inpaint.
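To see whether a first run is likely to trigger downloads, the two cache directories can be inspected. This is a rough sketch: `cached_models` is a hypothetical helper, and it only checks that each directory exists and is non-empty, not that the specific model files listed above are present.

```python
from pathlib import Path

# Cache locations as stated above.
CACHE_DIRS = {
    "ocr": Path("~/.paddlex/official_models").expanduser(),
    "lama": Path("~/.cache/torch/hub/checkpoints").expanduser(),
}


def cached_models(cache_dirs=CACHE_DIRS):
    """Return {name: bool}: True if the cache directory exists and has entries."""
    status = {}
    for name, directory in cache_dirs.items():
        status[name] = directory.is_dir() and any(directory.iterdir())
    return status


if __name__ == "__main__":
    for name, ready in cached_models().items():
        print(f"{name}: {'cached' if ready else 'will download on first use'}")
```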

Limitations — When to Warn the User

| Input | Impact | What to tell the user |
| --- | --- | --- |
| Text on solid/flat background | Best results | No caveats needed |
| Text on textured background | Good results | LAMA handles repeating textures well |
| Text overlapping photos | Inpainting artifacts likely | "Areas where text covers photos may show blurring" |
| Dense chart with many labels | Over-merged labels | "Crowded labels may be grouped incorrectly" |
| Very thick/large fonts | Incomplete mask coverage | "Large fonts may exceed dilation range; try increasing --dilation" |
| Light text on dark background | Blockier inpainting | "White-on-dark text uses box masks instead of tight ink masks" |
| WebP image | OCR fails (0 regions) | Convert to PNG first: Image.open("in.webp").save("in.png") |
| Very large image (>4000px) | Slow inpainting | Suggest --skip-inpaint or downscaling |
| Decorative/handwritten fonts | Typeface won't match | "Fonts are reconstructed as Arial/Helvetica" |
| Centered/justified text | Left-aligned output | "Text alignment is not preserved" |
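Two of the limitations above (WebP input and very large images) can be checked mechanically before conversion. The function below is a hypothetical pre-flight sketch. It takes the image dimensions as arguments rather than reading the file, and it interprets ">4000px" as the longer side exceeding 4000 pixels, which is an assumption.

```python
from pathlib import Path


def preflight_warnings(image_path, width, height, max_side=4000):
    """Return a list of warnings for inputs known to cause trouble."""
    warnings = []
    if Path(image_path).suffix.lower() == ".webp":
        warnings.append("WebP input: convert to PNG first, OCR may find 0 regions")
    if max(width, height) > max_side:
        warnings.append(
            "Large image: inpainting will be slow; consider --skip-inpaint or downscaling"
        )
    return warnings


issues = preflight_warnings("poster.webp", width=6000, height=3000)
clean = preflight_warnings("slide.png", width=1920, height=1080)
```

The remaining rows describe visual properties (photo overlap, font style, alignment) that need a human or a vision model to judge, so they are left out of this check.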

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
