x-convert-pdf-to-markdown

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "x-convert-pdf-to-markdown" with this command: npx skills add arda-industries/agent-skills/arda-industries-agent-skills-x-convert-pdf-to-markdown

Two tools are available depending on your needs:

| Tool | Best For | Speed | Size |
|------|----------|-------|------|
| pymupdf | Simple text PDFs | Very fast (~12s for 7 files) | ~15MB |
| marker-pdf | Complex PDFs with tables, images, OCR | Slow | ~2GB models |

Setup

Both tools are installed in the agent-instructions poetry environment:

    cd ~/brain/git/personal/agent-instructions
    poetry install  # if not already done
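To confirm the setup worked, a quick check along these lines may help. It is only a sketch: `check_pdf_tools` and the `AGENT_INSTRUCTIONS_DIR` variable are hypothetical names introduced here, and the check assumes both converters are exposed as commands (`pymupdf`, `marker_single`) inside the poetry environment, as the commands in this guide imply.

```shell
# Assumption: the project lives at the path used in this guide.
# AGENT_INSTRUCTIONS_DIR is a hypothetical override, not part of either tool.
AGENT_INSTRUCTIONS_DIR="${AGENT_INSTRUCTIONS_DIR:-$HOME/brain/git/personal/agent-instructions}"

# Print "ok: <tool>" or "missing: <tool>" for each converter and
# return non-zero if any of them cannot be found in the environment.
check_pdf_tools() {
  local tool
  local status=0
  for tool in pymupdf marker_single; do
    # "command -v" runs inside the poetry environment, so it sees its PATH.
    if (cd "$AGENT_INSTRUCTIONS_DIR" && poetry run sh -c 'command -v "$1" >/dev/null' sh "$tool"); then
      echo "ok: $tool"
    else
      echo "missing: $tool"
      status=1
    fi
  done
  return "$status"
}
```

Running `check_pdf_tools` after `poetry install` should report both tools as ok; a "missing" line suggests the install did not complete.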

PyMuPDF (Recommended for text-only PDFs)

Fast and lightweight. Use this for most PDFs.

Single File

    cd ~/brain/git/personal/agent-instructions
    poetry run pymupdf gettext -mode layout -output "/path/to/output.md" "/path/to/file.pdf"

Batch Conversion

    cd ~/brain/git/personal/agent-instructions
    for pdf in /path/to/pdfs/*.pdf; do
      name=$(basename "$pdf" .pdf)
      poetry run pymupdf gettext -mode layout -output "/path/to/output/${name}.md" "$pdf"
    done
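The batch loop can be wrapped in a reusable function that also creates the output directory, skips the case where no PDFs match, and reports failures. This is a sketch under assumptions: `batch_pymupdf` and `AGENT_INSTRUCTIONS_DIR` are hypothetical names, and both directories should be given as absolute paths because the conversion runs from the project directory.

```shell
# Assumption: the project lives at the path used in this guide.
AGENT_INSTRUCTIONS_DIR="${AGENT_INSTRUCTIONS_DIR:-$HOME/brain/git/personal/agent-instructions}"

# Usage: batch_pymupdf /abs/path/to/pdfs /abs/path/to/output
# Converts every *.pdf in the first directory to <name>.md in the second.
batch_pymupdf() {
  local in_dir="$1"
  local out_dir="$2"
  mkdir -p "$out_dir" || return 1
  (
    # Subshell: the caller's working directory is left unchanged.
    cd "$AGENT_INSTRUCTIONS_DIR" || exit 1
    failed=0
    for pdf in "$in_dir"/*.pdf; do
      [ -e "$pdf" ] || continue   # the glob matched nothing
      name=$(basename "$pdf" .pdf)
      if ! poetry run pymupdf gettext -mode layout -output "$out_dir/$name.md" "$pdf"; then
        echo "failed: $pdf" >&2
        failed=1
      fi
    done
    exit "$failed"
  )
}
```

The function returns non-zero if any file fails, so it can be chained in scripts, for example `batch_pymupdf /path/to/pdfs /path/to/output && echo done`.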

Options

| Option | Description |
|--------|-------------|
| -mode | simple, blocks, or layout (default: layout, which preserves formatting) |
| -output | Output file path |
| -pages | Page range to extract |
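The three options can be combined in a single call. The wrapper below is a sketch, not part of PyMuPDF: `pymupdf_pages` and `AGENT_INSTRUCTIONS_DIR` are hypothetical names, and the page-range string is passed through untouched, so its exact syntax (assumed here to look like "1-3") should be checked against `pymupdf gettext -h`.

```shell
# Assumption: the project lives at the path used in this guide.
AGENT_INSTRUCTIONS_DIR="${AGENT_INSTRUCTIONS_DIR:-$HOME/brain/git/personal/agent-instructions}"

# Usage: pymupdf_pages /abs/file.pdf PAGES /abs/output.md [MODE]
# MODE is one of simple, blocks, layout (default: layout).
pymupdf_pages() {
  local pdf="$1"
  local pages="$2"
  local out="$3"
  local mode="${4:-layout}"
  # Reject modes the options table does not list.
  case "$mode" in
    simple|blocks|layout) ;;
    *) echo "unknown mode: $mode" >&2; return 2 ;;
  esac
  (cd "$AGENT_INSTRUCTIONS_DIR" &&
    poetry run pymupdf gettext -mode "$mode" -pages "$pages" -output "$out" "$pdf")
}
```

For example, `pymupdf_pages /path/to/file.pdf "1-3" /path/to/output.md blocks` would extract only the requested pages in blocks mode.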

marker-pdf (For complex PDFs)

Use when you need OCR, table extraction, or image handling.

Single File

    cd ~/brain/git/personal/agent-instructions
    poetry run marker_single "/path/to/file.pdf" --output_dir "/path/to/output"

Options

| Option | Description |
|--------|-------------|
| --output_dir | Directory to save output |
| --output_format | markdown, json, html, or chunks |
| --page_range | Process specific pages, e.g., "0,5-10,20" |
| --force_ocr | Force OCR on all text |
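As an illustration of combining these options, the sketch below forces OCR on a page range and writes Markdown. `marker_ocr_pages` and `AGENT_INSTRUCTIONS_DIR` are hypothetical names introduced here; only the flags listed in the table above are used.

```shell
# Assumption: the project lives at the path used in this guide.
AGENT_INSTRUCTIONS_DIR="${AGENT_INSTRUCTIONS_DIR:-$HOME/brain/git/personal/agent-instructions}"

# Usage: marker_ocr_pages /abs/file.pdf /abs/output_dir PAGE_RANGE
# PAGE_RANGE uses the form shown in the table, e.g. "0,5-10,20".
marker_ocr_pages() {
  local pdf="$1"
  local out_dir="$2"
  local page_range="$3"
  (cd "$AGENT_INSTRUCTIONS_DIR" &&
    poetry run marker_single "$pdf" \
      --output_dir "$out_dir" \
      --output_format markdown \
      --page_range "$page_range" \
      --force_ocr)
}
```

A call such as `marker_ocr_pages /path/to/file.pdf /path/to/output "0,5-10"` limits the slow OCR pass to the pages that need it.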

First Run

On first use, marker downloads ML models (~2GB). This happens once.

Notes

  • Fully local: both tools process files entirely on your machine; nothing is uploaded to a cloud service

  • PyMuPDF: Best for clean, text-based PDFs

  • marker-pdf: Best for scanned docs, tables, or complex layouts
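One way to act on these notes is to try PyMuPDF first and fall back to marker-pdf only when almost no text comes out, which usually indicates a scanned document. This is a sketch under assumptions: `convert_with_fallback`, `AGENT_INSTRUCTIONS_DIR`, and the 200-character threshold are all hypothetical choices made for illustration, not behavior of either tool.

```shell
# Assumption: the project lives at the path used in this guide.
AGENT_INSTRUCTIONS_DIR="${AGENT_INSTRUCTIONS_DIR:-$HOME/brain/git/personal/agent-instructions}"

# Usage: convert_with_fallback /abs/file.pdf /abs/output_dir [MIN_CHARS]
# Prints the name of the tool whose output was kept.
convert_with_fallback() {
  local pdf="$1"
  local out_dir="$2"
  local min_chars="${3:-200}"   # hypothetical threshold; tune for your documents
  local name
  local chars
  name=$(basename "$pdf" .pdf)
  mkdir -p "$out_dir" || return 1

  # Fast path: plain text extraction. Tool output goes to stderr so that
  # stdout carries only the tool name printed at the end.
  (cd "$AGENT_INSTRUCTIONS_DIR" &&
    poetry run pymupdf gettext -mode layout -output "$out_dir/$name.md" "$pdf") >&2 || return 1

  # Count non-whitespace characters in the extracted text.
  chars=$(tr -d '[:space:]' < "$out_dir/$name.md" | wc -c | tr -d ' ')
  if [ "$chars" -ge "$min_chars" ]; then
    echo "pymupdf"
  else
    # Slow path: likely a scanned or image-heavy PDF.
    (cd "$AGENT_INSTRUCTIONS_DIR" &&
      poetry run marker_single "$pdf" --output_dir "$out_dir") >&2 || return 1
    echo "marker-pdf"
  fi
}
```

The threshold trades speed for accuracy: a lower value keeps more files on the fast path, while a higher value sends borderline documents to marker-pdf.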
