PDF Conversion Skill
Overview
Converts PDF files to clean, readable Markdown format with proper formatting, image extraction, and table preservation.
When to Use
-
"Convert this PDF to markdown"
-
"Extract text from PDF"
-
"Transform PDF to MD format"
-
"Process PDFs in folder X"
Related Resources
- SOP: Operator Team OS/1. SOPs/convert_pdfs_to_markdown.md
Scripts
Script Purpose
pdf_to_md.py
Main PDF to Markdown converter
Usage
python3 scripts/pdf_to_md.py /path/to/input.pdf /path/to/output.md
Features
-
Text extraction with OCR fallback
-
Image extraction and embedding
-
Table detection and formatting
-
Heading structure preservation
-
Automatic cleanup of formatting artifacts
Edge Cases
-
Scanned PDFs: Falls back to OCR
-
Complex tables: Best-effort conversion
-
Encrypted PDFs: Will fail (requires password)