System Dependencies
uv must already be installed because this skill is executed with uv run, and uv installs the Python dependencies declared in src/main.py.
ffmpeg is needed for tts because the speech output is normalized and written as an .mp3 file through ffmpeg.
tesseract is needed for ocr because it performs the actual optical character recognition on scanned page images.
pdfimages is also needed for ocr because it extracts page images from PDFs before those images are passed to tesseract; pdfimages comes from poppler.
pandoc is optional; convert uses it to convert between many document formats when text-based conversion is possible.
libreoffice is an optional alternative to pandoc; convert can use it for document conversions that pandoc may not support well. convert needs at least one of the two.
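The requirements above can also be checked from the caller's side before invoking a command. The following is a minimal Python sketch, not the skill's own doctor implementation: REQUIRED_BY_COMMAND, CONVERT_ALTERNATIVES, missing_tools, and command_available are hypothetical names, and the sketch assumes each tool is on PATH under the exact name used in this section (on some hosts LibreOffice is installed as soffice instead).

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Binaries named in System Dependencies, keyed by the command that needs them.
# Hypothetical mapping for illustration; the skill's doctor command is authoritative.
REQUIRED_BY_COMMAND = {
    "tts": ["ffmpeg"],
    "ocr": ["tesseract", "pdfimages"],
}

# convert works when at least one of these is present.
CONVERT_ALTERNATIVES = ["pandoc", "libreoffice"]

Which = Callable[[str], Optional[str]]


def missing_tools(names: Iterable[str], which: Which = shutil.which) -> List[str]:
    """Return the tool names that cannot be found on PATH."""
    return [name for name in names if which(name) is None]


def command_available(command: str, which: Which = shutil.which) -> bool:
    """Report whether the system binaries for a command are present."""
    if command == "convert":
        return any(which(name) is not None for name in CONVERT_ALTERNATIVES)
    # Commands without an entry need no system binaries beyond uv.
    return not missing_tools(REQUIRED_BY_COMMAND.get(command, []), which)
```

The which parameter exists only so the lookup can be replaced in tests; normal callers can omit it.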
File Access And Network Behavior
- This skill operates on the file paths provided by the caller. It can read from and write to any host path the caller supplies; it is not limited to the OpenClaw workspace.
- The /root/.openclaw/workspace/... paths in the command examples show where the skill entrypoint lives. They do not restrict which files the skill can access.
- The tts command uses edge-tts, which sends the input text to an external text-to-speech service over the network to generate audio.
- Do not use tts with sensitive or private text unless you are comfortable sending that text off-host.
- All other commands run locally on the host and depend only on the optional local binaries documented in this file.
Skill: PDF Toolkit
When to use
- User wants to extract text, tables, or images from a PDF.
- User wants to get metadata or page count from a PDF.
- User wants to merge, split, or rotate a PDF.
- User wants to create a new PDF from plain text or Markdown.
- User wants to read or write a DOCX file.
- User wants to OCR a scanned PDF (requires tesseract on host).
- User wants to convert text or a document to an MP3 audio file (requires ffmpeg on host).
- User wants to convert between document formats (requires pandoc or libreoffice on host).
- User wants to check which optional system tools are available.
When NOT to use
- User wants to view or render a PDF visually — use a PDF viewer.
- User wants to fill in PDF form fields — this skill does not support AcroForms.
- User wants to edit an existing PDF's text in-place — use a dedicated PDF editor.
Commands
Check available tools
uv run /root/.openclaw/workspace/skills/pdf-toolkit/src/main.py doctor
Get PDF metadata and page count
uv run /root/.openclaw/workspace/skills/pdf-toolkit/src/main.py info <pdf_path>
Extract text from a PDF
# All pages
uv run /root/.openclaw/workspace/skills/pdf-toolkit/src/main.py extract-text <pdf_path>
# Specific pages (1-indexed, comma-separated or ranges)
uv run /root/.openclaw/workspace/skills/pdf-toolkit/src/main.py extract-text <pdf_path> --pages 1,3,5-8
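The --pages value is 1-indexed and accepts comma-separated pages and ranges. Below is a minimal sketch of how such a spec expands, assuming ranges are inclusive and that malformed input should be rejected. parse_pages is a hypothetical helper; how the skill itself treats duplicates, reversed ranges, or out-of-range pages is not documented here.

```python
def parse_pages(spec: str) -> list[int]:
    """Expand a page spec such as "1,3,5-8" into a list of 1-indexed pages."""
    pages: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            raise ValueError(f"empty entry in page spec: {spec!r}")
        if "-" in part:
            # A range such as "5-8"; both ends are assumed to be inclusive.
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if start < 1 or end < start:
            raise ValueError(f"invalid page or range: {part!r}")
        pages.extend(range(start, end + 1))
    # Order and duplicates are preserved exactly as written in the spec.
    return pages
```

For example, parse_pages("1,3,5-8") returns [1, 3, 5, 6, 7, 8].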
Extract tables from a PDF
uv run /root/.openclaw/workspace/skills/pdf-toolkit/src/main.py extract-tables <pdf_path>
uv run /root/.openclaw/workspace/skills/pdf-toolkit/src/main.py extract-tables <pdf_path> --pages 2-4
Extract images from a PDF
# Saves images to current directory by default
uv run /root/.openclaw/workspace/skills/pdf-toolkit/src/main.py extract-images <pdf_path>
uv run /root/.openclaw/workspace/skills/pdf-toolkit/src/main.py extract-images <pdf_path> --output-dir /path/to/output
Merge PDFs
uv run /root/.openclaw/workspace/skills/pdf-toolkit/src/main.py merge <pdf1> <pdf2> [<pdf3> ...] --output merged.pdf
Split a PDF
# Split into individual pages
uv run /root/.openclaw/workspace/skills/pdf-toolkit/src/main.py split <pdf_path> --output-dir /path/to/output
# Extract a page range into a new PDF
uv run /root/.openclaw/workspace/skills/pdf-toolkit/src/main.py split <pdf_path> --pages 2-5 --output extracted.pdf
Rotate pages in a PDF
# Rotate all pages 90 degrees clockwise
uv run /root/.openclaw/workspace/skills/pdf-toolkit/src/main.py rotate <pdf_path> --degrees 90 --output rotated.pdf
# Rotate specific pages
uv run /root/.openclaw/workspace/skills/pdf-toolkit/src/main.py rotate <pdf_path> --degrees 180 --pages 1,3 --output rotated.pdf
Create a PDF from text
uv run /root/.openclaw/workspace/skills/pdf-toolkit/src/main.py create-pdf --text "Hello, world!" --output hello.pdf
uv run /root/.openclaw/workspace/skills/pdf-toolkit/src/main.py create-pdf --file input.txt --output document.pdf
Read a DOCX file
uv run /root/.openclaw/workspace/skills/pdf-toolkit/src/main.py read-docx <docx_path>
Write a DOCX file
uv run /root/.openclaw/workspace/skills/pdf-toolkit/src/main.py write-docx --text "Content here" --output document.docx
uv run /root/.openclaw/workspace/skills/pdf-toolkit/src/main.py write-docx --file input.txt --output document.docx
OCR a scanned PDF (requires tesseract)
uv run /root/.openclaw/workspace/skills/pdf-toolkit/src/main.py ocr <pdf_path>
uv run /root/.openclaw/workspace/skills/pdf-toolkit/src/main.py ocr <pdf_path> --pages 1-3 --lang eng
Convert text or document to speech (requires ffmpeg)
uv run /root/.openclaw/workspace/skills/pdf-toolkit/src/main.py tts --text "Hello, world!" --output speech.mp3
uv run /root/.openclaw/workspace/skills/pdf-toolkit/src/main.py tts --file input.txt --output speech.mp3
uv run /root/.openclaw/workspace/skills/pdf-toolkit/src/main.py tts --file document.pdf --output speech.mp3
uv run /root/.openclaw/workspace/skills/pdf-toolkit/src/main.py tts --text "Hello" --voice en-GB-SoniaNeural --output speech.mp3
Convert document formats (requires pandoc or libreoffice)
uv run /root/.openclaw/workspace/skills/pdf-toolkit/src/main.py convert <input_path> --output <output_path>
Examples
# Inspect a PDF
uv run /root/.openclaw/workspace/skills/pdf-toolkit/src/main.py info report.pdf
# Pull text from pages 1–3
uv run /root/.openclaw/workspace/skills/pdf-toolkit/src/main.py extract-text report.pdf --pages 1-3
# Merge two PDFs
uv run /root/.openclaw/workspace/skills/pdf-toolkit/src/main.py merge a.pdf b.pdf --output combined.pdf
# OCR a scanned document
uv run /root/.openclaw/workspace/skills/pdf-toolkit/src/main.py ocr scan.pdf
# Read a Word document
uv run /root/.openclaw/workspace/skills/pdf-toolkit/src/main.py read-docx report.docx
# Text to MP3
uv run /root/.openclaw/workspace/skills/pdf-toolkit/src/main.py tts --text "Welcome to the future." --output welcome.mp3
# Check what is available on this host
uv run /root/.openclaw/workspace/skills/pdf-toolkit/src/main.py doctor
Chat Delivery
- When this skill is used in a chat interface that supports file attachments, such as Telegram, any generated output file should be sent back to the user as an attachment after successful creation or conversion.
- This applies to commands that create files, including create-pdf, write-docx, extract-images, merge, split, rotate, tts, and convert.
- If a temporary output file is created in the Claw runtime temporary folder for delivery, delete that temporary file immediately after the file has been sent successfully to the user.
- Do not delete files that were written to a user-requested destination outside the Claw temporary folder.
- If the chat environment cannot send file attachments, report the output path clearly instead of claiming the file was delivered.
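The cleanup rule above can be sketched as a single decision function. This is an illustration under stated assumptions, not part of the skill: should_delete_after_send is a hypothetical helper, and temp_root stands in for the Claw runtime temporary folder, whose actual location is not given in this document.

```python
from pathlib import Path


def should_delete_after_send(output_path: str, temp_root: str, sent_ok: bool) -> bool:
    """Decide whether a delivered output file may be removed.

    temp_root is a placeholder for the Claw runtime temporary folder;
    the caller has to supply the real location.
    """
    if not sent_ok:
        # Keep the file so its path can still be reported to the user.
        return False
    output = Path(output_path).resolve()
    root = Path(temp_root).resolve()
    # Only files strictly inside the temporary folder qualify. Resolving first
    # stops "../" segments from making an outside path look like a temp file.
    return root in output.parents
```

A caller would check this after a send attempt and remove the file with Path(output_path).unlink() only when it returns True. Files in user-requested destinations are never touched.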
Output
- Plain text with labeled sections separated by blank lines.
- Errors are prefixed with Error:.
- The doctor command shows a table of available and missing tools.
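Because errors carry the Error: prefix, a caller can detect failures by scanning the output. The sketch below is one way to do that. It assumes the prefix appears at the start of a line and, since this document does not say whether errors go to stdout or stderr, it checks both. find_error and run_skill are hypothetical helpers.

```python
import subprocess
from typing import Optional

# Entrypoint path as used in the command examples of this document.
ENTRYPOINT = "/root/.openclaw/workspace/skills/pdf-toolkit/src/main.py"


def find_error(output: str) -> Optional[str]:
    """Return the message of the first line starting with 'Error:', if any."""
    for line in output.splitlines():
        if line.startswith("Error:"):
            return line[len("Error:"):].strip()
    return None


def run_skill(*args: str) -> str:
    """Run one skill command and raise if its output reports an error."""
    result = subprocess.run(
        ["uv", "run", ENTRYPOINT, *args],
        capture_output=True,
        text=True,
    )
    # Assumption: the stream used for errors is not documented, so scan both.
    message = find_error(result.stdout + result.stderr)
    if message is not None:
        raise RuntimeError(message)
    return result.stdout
```

A call such as run_skill("info", "report.pdf") would return the command's text output. Text extracted from a document can itself contain a line beginning with Error:, so treat a match on extract-text or ocr output with some care.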
Notes
- uv run reads the inline # /// script dependency block in main.py and auto-installs Python packages in an isolated environment, so no pip install or venv setup is needed.
- Core features (info, extract-text, extract-tables, merge, split, rotate, create-pdf, read-docx, write-docx) work with uv alone; no system binaries are required.
- OCR requires tesseract installed on the host (brew install tesseract / apt install tesseract-ocr). It also needs pdfimages from poppler (brew install poppler / apt install poppler-utils).
- TTS requires ffmpeg installed on the host (brew install ffmpeg / apt install ffmpeg).
- Document conversion requires pandoc or libreoffice on the host.
- Run doctor first if you are unsure which features are available.
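For reference, the inline # /// script block that uv run reads follows the inline script metadata format (PEP 723). The sketch below shows an illustrative header and a small helper that pulls the TOML text out of it. The package name pypdf and the Python version are placeholders, since the actual dependencies declared in main.py are not listed in this document, and extract_script_block is a hypothetical helper.

```python
import re
from typing import Optional

# Illustrative script header only. The real dependency list in main.py is not
# shown in this document; "pypdf" and ">=3.11" are placeholder values.
EXAMPLE_SCRIPT = '''\
# /// script
# requires-python = ">=3.11"
# dependencies = [
#     "pypdf",
# ]
# ///
print("hello")
'''

# Pattern for an inline metadata block, as given in the PEP 723 reference
# implementation.
BLOCK_PATTERN = re.compile(
    r"(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$"
)


def extract_script_block(source: str) -> Optional[str]:
    """Return the TOML text of the first 'script' block, or None if absent."""
    for match in BLOCK_PATTERN.finditer(source):
        if match.group("type") == "script":
            lines = match.group("content").splitlines(keepends=True)
            # Drop the leading "# " (or bare "#") comment marker from each line.
            return "".join(
                line[2:] if line.startswith("# ") else line[1:] for line in lines
            )
    return None
```

Calling extract_script_block on the contents of a script returns the declarations uv would act on, which is a quick way to see what will be installed before the first run.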