agent-paddleocr-vision

Multi-language document understanding with PaddleOCR

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "agent-paddleocr-vision" with this command: npx skills add NHZallen/agent-paddleocr-vision

Agent PaddleOCR Vision

OCR with Agent Actions — powered by PaddleOCR only. Automatically classifies documents and provides actionable prompts.

What It Does

  • OCR extraction via PaddleOCR cloud API (requires credentials)
  • 11 document types: invoice, business card, receipt, table, contract, ID card, passport, bank statement, driver's license, tax form, general
  • Action suggestion with structured parameters
  • Batch processing
  • Searchable PDF generation (with bbox alignment)

Quick Start

# Install dependencies
pip3 install -r scripts/requirements.txt

# Configure PaddleOCR API
export PADDLEOCR_DOC_PARSING_API_URL=https://your-api.paddleocr.com/layout-parsing
export PADDLEOCR_ACCESS_TOKEN=your_token

# Process a file
python3 scripts/doc_vision.py --file-path ./invoice.jpg --pretty --make-searchable-pdf

Batch

python3 scripts/doc_vision.py --batch-dir ./inbox --output-dir ./out

Output

See docs/README.zh.md for full JSON schema and integration guide.

Supported Types

TypeActions
Invoicecreate_expense, archive, tax_report
Business Cardadd_contact, save_vcard
Receiptcreate_expense, split_bill
Tableexport_csv, analyze_data
Contractsummarize, extract_dates, flag_obligations
ID Cardextract_id_info, verify_age
Passportstore_passport_info, check_validity
Bank Statementcategorize_transactions, generate_report
Driver Licensestore_license_info, check_expiry
Tax Formsummarize_tax, suggest_deductions
Generalsummarize, translate, search_keywords

Configuration

Required environment variables:

  • PADDLEOCR_DOC_PARSING_API_URL — API endpoint ending in /layout-parsing
  • PADDLEOCR_ACCESS_TOKEN — Access token

Optional:

  • PADDLEOCR_DOC_PARSING_TIMEOUT — Default 600 seconds

Searchable PDF

With --make-searchable-pdf, embeds OCR text layer aligned to original layout using bounding boxes. Requires pdf2image + poppler (system) and reportlab, pypdf, pillow (Python).

Full Documentation

Detailed usage, troubleshooting, and development guide available in multiple languages under docs/:

  • 中文: docs/README.zh.md
  • English: docs/README.en.md
  • Español: docs/README.es.md
  • العربية: docs/README.ar.md

License

MIT-0


Made for OpenClaw. Let your agent see and act.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Automation

ComfyUI Controller Pro

支持批量生成10-100个修仙视频和图片,集成LTX2多版本模型与自动化浏览器及工作流管理功能。

Registry SourceRecently Updated
120Profile unavailable
Automation

微信QQ自动发消息

Windows 平台微信和 QQ 自动发消息工具。支持搜索联系人、发送消息、截图OCR分析、智能回复建议(需用户确认后发送)。

Registry SourceRecently Updated
3033Profile unavailable
Automation

PR's PDF Agent

Self-hosted PDF operations and conversions with metered usage output.

Registry SourceRecently Updated
1472Profile unavailable
Automation

Finance Automation

Automates payments, invoices, expenses, and financial reports with Stripe webhooks and real-time Telegram notifications for streamlined finance management.

Registry SourceRecently Updated
3720Profile unavailable