Agent PaddleOCR Vision

OCR with Agent Actions — powered by PaddleOCR only. Automatically classifies documents and provides actionable prompts.

What It Does

OCR extraction via PaddleOCR cloud API (requires credentials)
11 document types: invoice, business card, receipt, table, contract, ID card, passport, bank statement, driver's license, tax form, general
Action suggestion with structured parameters
Batch processing
Searchable PDF generation (with bbox alignment)

Quick Start

# Install dependencies
pip3 install -r scripts/requirements.txt

# Configure PaddleOCR API
export PADDLEOCR_DOC_PARSING_API_URL=https://your-api.paddleocr.com/layout-parsing
export PADDLEOCR_ACCESS_TOKEN=your_token

# Process a file
python3 scripts/doc_vision.py --file-path ./invoice.jpg --pretty --make-searchable-pdf

Batch

python3 scripts/doc_vision.py --batch-dir ./inbox --output-dir ./out

Output

See docs/README.zh.md for full JSON schema and integration guide.

Supported Types

Type	Actions
Invoice	create_expense, archive, tax_report
Business Card	add_contact, save_vcard
Receipt	create_expense, split_bill
Table	export_csv, analyze_data
Contract	summarize, extract_dates, flag_obligations
ID Card	extract_id_info, verify_age
Passport	store_passport_info, check_validity
Bank Statement	categorize_transactions, generate_report
Driver License	store_license_info, check_expiry
Tax Form	summarize_tax, suggest_deductions
General	summarize, translate, search_keywords

Configuration

Required environment variables:

PADDLEOCR_DOC_PARSING_API_URL — API endpoint ending in /layout-parsing
PADDLEOCR_ACCESS_TOKEN — Access token

Optional:

PADDLEOCR_DOC_PARSING_TIMEOUT — Default 600 seconds

Searchable PDF

With --make-searchable-pdf, embeds OCR text layer aligned to original layout using bounding boxes. Requires pdf2image + poppler (system) and reportlab, pypdf, pillow (Python).

Full Documentation

Detailed usage, troubleshooting, and development guide available in multiple languages under docs/:

中文: docs/README.zh.md
English: docs/README.en.md
Español: docs/README.es.md
العربية: docs/README.ar.md

License

MIT-0

Made for OpenClaw. Let your agent see and act.