docs-pdf

Parse PDF documents into repository-friendly markdown and text artifacts. Use when users need to extract text, tables, or structure from PDF files.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "docs-pdf" with this command: npx skills add nikhilmaddirala/gtd-cc/nikhilmaddirala-gtd-cc-docs-pdf

PDF Document Parsing

Parse PDF documents into markdown, text, and structured JSON using multi-method extraction.

Usage

Run the parsing script directly:

./scripts/parse_pdf.py <path_to_file.pdf> <output_dir>

Example:

./scripts/parse_pdf.py ~/documents/manual.pdf ./parsed/

The script runs four extraction methods on each PDF:

  • pypdf - Basic text extraction with page markers
  • pdfminer - Detailed layout preservation
  • pdfplumber - Table extraction and structure
  • markitdown - Microsoft's markdown converter
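For orientation, the four libraries listed above can each be called in a few lines. The sketch below is not the contents of parse_pdf.py; the function names, the `EXTRACTORS` registry, and the page-marker format are assumptions made for illustration. Only the library calls themselves (`PdfReader`, `extract_text`, `pdfplumber.open`, `MarkItDown().convert`) come from the respective packages.

```python
from pathlib import Path


def extract_with_pypdf(pdf_path: Path) -> str:
    """Basic text extraction, one block per page with a page marker."""
    # Imported lazily so a missing package only affects this one method.
    from pypdf import PdfReader

    reader = PdfReader(str(pdf_path))
    parts = []
    for number, page in enumerate(reader.pages, start=1):
        # The "## Page N" marker format is an assumption for this sketch.
        parts.append(f"## Page {number}\n\n{page.extract_text() or ''}")
    return "\n\n".join(parts)


def extract_with_pdfminer(pdf_path: Path) -> str:
    """Layout-aware text extraction via pdfminer.six."""
    from pdfminer.high_level import extract_text

    return extract_text(str(pdf_path))


def extract_with_pdfplumber(pdf_path: Path) -> tuple[str, list]:
    """Return page text plus every table found, as lists of rows."""
    import pdfplumber

    texts, tables = [], []
    with pdfplumber.open(str(pdf_path)) as pdf:
        for page in pdf.pages:
            texts.append(page.extract_text() or "")
            # Each table is a list of rows; each row is a list of cell values.
            tables.extend(page.extract_tables())
    return "\n\n".join(texts), tables


def extract_with_markitdown(pdf_path: Path) -> str:
    """Continuous markdown from Microsoft's markitdown converter."""
    from markitdown import MarkItDown

    return MarkItDown().convert(str(pdf_path)).text_content


# Hypothetical registry mapping method names to the functions above.
EXTRACTORS = {
    "pypdf": extract_with_pypdf,
    "pdfminer": extract_with_pdfminer,
    "pdfplumber": extract_with_pdfplumber,
    "markitdown": extract_with_markitdown,
}
```

Keeping each import inside its function means the module still loads when one of the packages is not installed.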

Output Structure

output_dir/
├── file.pdf/
│   ├── parsing_summary.json
│   ├── pypdf/
│   │   └── content.md
│   ├── pdfminer/
│   │   └── content.txt
│   ├── pdfplumber/
│   │   ├── content.md
│   │   └── tables.json
│   └── markitdown/
│       └── content.md
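
As a rough aid for checking a run, the tree above can be written down as a list of expected paths. This is a sketch that assumes the per-document directory is named after the input file including its extension, as the `file.pdf/` entry suggests; `EXPECTED_FILES` and `expected_artifacts` are hypothetical names, not part of the script.

```python
from pathlib import Path

# Files each method is shown writing in the output tree.
EXPECTED_FILES = {
    "pypdf": ["content.md"],
    "pdfminer": ["content.txt"],
    "pdfplumber": ["content.md", "tables.json"],
    "markitdown": ["content.md"],
}


def expected_artifacts(pdf_path, output_dir) -> list[Path]:
    """List the artifact paths the tree implies for one input PDF."""
    # Assumption: the directory name is the PDF file name, extension included.
    doc_dir = Path(output_dir) / Path(pdf_path).name
    paths = [doc_dir / "parsing_summary.json"]
    for method, files in EXPECTED_FILES.items():
        paths.extend(doc_dir / method / name for name in files)
    return paths


def missing_artifacts(pdf_path, output_dir) -> list[Path]:
    """Return the expected paths that do not exist on disk."""
    return [p for p in expected_artifacts(pdf_path, output_dir) if not p.is_file()]
```

Calling `missing_artifacts("manual.pdf", "./parsed/")` after a run would then show which methods produced no output, under the naming assumption stated above.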

Script Features

  • Handles text-heavy and table-heavy PDFs
  • Preserves layout information where possible
  • Extracts tables as structured JSON
  • Provides multiple format options (md, txt, json)
  • Continues on errors (a failure in one method does not stop the others)
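
The continue-on-error behavior in the last bullet can be sketched as a loop that wraps each method in its own try/except and records the outcome. This illustrates the pattern only: `run_all`, the summary fields, and writing every result to content.md are assumptions, and the real parsing_summary.json schema may differ.

```python
import json
from pathlib import Path
from typing import Callable


def run_all(pdf_path: Path, doc_dir: Path,
            extractors: dict[str, Callable[[Path], str]]) -> dict:
    """Run every extractor; a failure in one is recorded and the loop continues."""
    summary = {}
    for name, extract in extractors.items():
        method_dir = doc_dir / name
        try:
            text = extract(pdf_path)
            method_dir.mkdir(parents=True, exist_ok=True)
            (method_dir / "content.md").write_text(text, encoding="utf-8")
            summary[name] = {"status": "ok", "characters": len(text)}
        except Exception as exc:  # isolate the failure to this one method
            summary[name] = {"status": "failed", "error": str(exc)}

    # Hypothetical summary layout: one entry per method.
    doc_dir.mkdir(parents=True, exist_ok=True)
    (doc_dir / "parsing_summary.json").write_text(
        json.dumps(summary, indent=2), encoding="utf-8"
    )
    return summary
```

Catching the broad `Exception` is deliberate here, since PDF libraries raise a wide range of error types on malformed files and the goal is to keep the remaining methods running.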

Method Selection

  • markitdown - Best for AI understanding (continuous markdown, no page breaks)
  • pdfplumber - Best for documents with complex tables
  • pypdf - Fast fallback for simple text extraction
  • pdfminer - Best when layout preservation is critical
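
The guidance above can be turned into a small lookup that picks which output file to read for a given goal. This is a sketch under stated assumptions: the goal names, `pick_content`, and the choice to fall back to pypdf when the preferred output is missing are illustrative, not behavior of the script.

```python
from pathlib import Path

# Content file each method writes, per the output tree.
CONTENT_FILES = {
    "markitdown": "content.md",
    "pdfplumber": "content.md",
    "pypdf": "content.md",
    "pdfminer": "content.txt",
}

# Hypothetical goal names mapped to the method recommended for each.
PRIMARY = {
    "llm": "markitdown",
    "tables": "pdfplumber",
    "layout": "pdfminer",
    "simple": "pypdf",
}


def pick_content(doc_dir, goal: str):
    """Return the preferred content file for a goal, or None if none exists."""
    # Try the recommended method first, then pypdf as the fast fallback.
    order = dict.fromkeys([PRIMARY[goal], "pypdf"])
    for method in order:
        candidate = Path(doc_dir) / method / CONTENT_FILES[method]
        if candidate.is_file():
            return candidate
    return None
```

For example, `pick_content("parsed/manual.pdf", "tables")` would return the pdfplumber output when it exists and the pypdf output otherwise.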
