pdf-processing

PDF Processing

This skill provides tools and guidance for extracting content from PDF documents.

Quick Start

Use pdfplumber to extract text from the first page of a PDF:

import pdfplumber

with pdfplumber.open("document.pdf") as pdf:
    text = pdf.pages[0].extract_text()

Installation

Install the required dependencies:

pip install pdfplumber

Basic Text Extraction

For simple text extraction from a PDF:

import pdfplumber

def extract_text(pdf_path):
    """Extract all text from a PDF file."""
    text = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            page_text = page.extract_text()
            # Skip pages that yield no text
            if page_text:
                text.append(page_text)
    # Separate pages with a blank line
    return "\n\n".join(text)
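
The `if page_text` check in `extract_text` silently skips any page that yields no text, for example a scanned or image-only page. The following is a minimal sketch of one way to report such pages instead of dropping them. The names `pages_without_text` and `find_pages_without_text` are hypothetical helpers, not part of pdfplumber, and the sketch assumes `page.extract_text()` returns either a string or `None`.

```python
from typing import List, Optional


def pages_without_text(page_texts: List[Optional[str]]) -> List[int]:
    """Return 1-based page numbers whose extracted text is missing or blank."""
    return [
        number
        for number, page_text in enumerate(page_texts, start=1)
        if not (page_text or "").strip()
    ]


def find_pages_without_text(pdf_path: str) -> List[int]:
    """Open a PDF with pdfplumber and report pages that yield no text."""
    # Imported here so pages_without_text stays usable without pdfplumber.
    import pdfplumber

    with pdfplumber.open(pdf_path) as pdf:
        return pages_without_text([page.extract_text() for page in pdf.pages])
```

Calling `find_pages_without_text("document.pdf")` returns a list of page numbers that may need a different approach, such as OCR, which is outside the scope of this skill.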

Table Extraction

For extracting tables from PDFs:

import pdfplumber

def extract_tables(pdf_path):
    """Extract all tables from a PDF file."""
    tables = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            # Each page may contain zero or more tables
            page_tables = page.extract_tables()
            tables.extend(page_tables)
    return tables
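
Extracted tables are often easier to inspect or pass on once they are serialized. The following is a minimal sketch that converts one table to CSV text. It assumes each table is a list of rows and each row is a list of cell strings, with `None` for empty cells. `table_to_csv` is a hypothetical helper name, not part of pdfplumber.

```python
import csv
import io
from typing import List, Optional

# Assumed shape of one extracted table: rows of cells, None for empty cells.
Table = List[List[Optional[str]]]


def table_to_csv(table: Table) -> str:
    """Serialize one extracted table to CSV text, writing None cells as empty."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in table:
        writer.writerow(["" if cell is None else cell for cell in row])
    return buffer.getvalue()
```

To use it with the function above, loop over the result of `extract_tables("document.pdf")` and call `table_to_csv(table)` on each entry. The `csv` module handles quoting, so cells that contain commas or quotes survive the round trip.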

Form Filling

For filling PDF forms, see references/FORMS.md.

Advanced Table Extraction

For complex tables with merged cells, see references/TABLES.md and run scripts/extract.py.
