pymupdf-pdf

Fast local PDF parsing with PyMuPDF (fitz) for Markdown/JSON outputs and optional images/tables. Use when speed matters more than robustness, or as a fallback while heavier parsers are unavailable. Default to single-PDF parsing with per-document output folders.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "pymupdf-pdf" with this command: npx skills add kesslerio/pymupdf-pdf-parser-clawdbot-skill

PyMuPDF PDF

Overview

Parse PDFs locally using PyMuPDF for fast, lightweight extraction into Markdown by default, with optional JSON and image/table outputs in a per-document directory.

Prereqs / when to read references

If you hit import errors (PyMuPDF not installed) or Nix libstdc++ issues, read:

  • references/pymupdf-notes.md

Quick start (single PDF)

# Run from the skill directory
./scripts/pymupdf_parse.py /path/to/file.pdf \
  --format md \
  --outroot ./pymupdf-output

Options

  • --format md|json|both (default: md)
  • --images to extract images
  • --tables to extract a simple line-based table JSON (quick/rough)
  • --outroot DIR to change output root
  • --lang adds a language hint into JSON output metadata

Output conventions

  • Create ./pymupdf-output/<pdf-basename>/ by default.
  • Markdown output: output.md
  • JSON output: output.json (includes lang)
  • Images: images/ subdir
  • Tables: tables.json (rough line-based)

Notes

  • PyMuPDF is fast but less robust on complex PDFs.
  • For more robust parsing, use a heavy-duty OCR parser (e.g., MinerU) if installed.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

Hit Preview EN

Analyzes English short video scripts for TikTok, YouTube Shorts, and Instagram Reels to predict hit potential and suggest improvements with AI or local fallb...

Registry SourceRecently Updated
General

Local Document AI OpenVINO

Parse local PDFs and document images with PaddleOCR-VL or PaddleOCR-VL-1.5 on OpenVINO, then route the structured parse into downstream document-to-data or d...

Registry SourceRecently Updated
General

Auto Study

Use when performing study tasks on browser-based platforms such as Yuketang, Xuexitong, Zhihuishu, and Pintia, including answering quizzes and page actions.

Registry SourceRecently Updated
General

宠物体态健康分析技能

Identifies obesity, emaciation, external injuries, skin abnormalities, and abnormal mental states, helping pet owners detect health issues promptly. | 宠物体态健康...

Registry SourceRecently Updated