agent-paddleocr-vision

Multi-language document understanding with PaddleOCR

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "agent-paddleocr-vision" with this command: npx skills add nhzallen/agent-paddleocr-vision

Agent PaddleOCR Vision

OCR with Agent Actions — powered by PaddleOCR only. Automatically classifies documents and provides actionable prompts.

What It Does

  • OCR extraction via PaddleOCR cloud API (requires credentials)
  • 11 document types: invoice, business card, receipt, table, contract, ID card, passport, bank statement, driver's license, tax form, general
  • Action suggestion with structured parameters
  • Batch processing
  • Searchable PDF generation (with bbox alignment)

Quick Start

# Install dependencies
pip3 install -r scripts/requirements.txt

# Configure PaddleOCR API
export PADDLEOCR_DOC_PARSING_API_URL=https://your-api.paddleocr.com/layout-parsing
export PADDLEOCR_ACCESS_TOKEN=your_token

# Process a file
python3 scripts/doc_vision.py --file-path ./invoice.jpg --pretty --make-searchable-pdf

Batch

python3 scripts/doc_vision.py --batch-dir ./inbox --output-dir ./out

Output

See docs/README.zh.md for full JSON schema and integration guide.

Supported Types

TypeActions
Invoicecreate_expense, archive, tax_report
Business Cardadd_contact, save_vcard
Receiptcreate_expense, split_bill
Tableexport_csv, analyze_data
Contractsummarize, extract_dates, flag_obligations
ID Cardextract_id_info, verify_age
Passportstore_passport_info, check_validity
Bank Statementcategorize_transactions, generate_report
Driver Licensestore_license_info, check_expiry
Tax Formsummarize_tax, suggest_deductions
Generalsummarize, translate, search_keywords

Configuration

Required environment variables:

  • PADDLEOCR_DOC_PARSING_API_URL — API endpoint ending in /layout-parsing
  • PADDLEOCR_ACCESS_TOKEN — Access token

Optional:

  • PADDLEOCR_DOC_PARSING_TIMEOUT — Default 600 seconds

Searchable PDF

With --make-searchable-pdf, embeds OCR text layer aligned to original layout using bounding boxes. Requires pdf2image + poppler (system) and reportlab, pypdf, pillow (Python).

Full Documentation

Detailed usage, troubleshooting, and development guide available in multiple languages under docs/:

  • 中文: docs/README.zh.md
  • English: docs/README.en.md
  • Español: docs/README.es.md
  • العربية: docs/README.ar.md

License

MIT-0


Made for OpenClaw. Let your agent see and act.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

china-doc-ocr

智能文档OCR识别与结构化提取。Use when the user has a complex document, PDF, scanned image, photo, invoice, receipt, ID card, table, or chart that needs to be recognized a...

Registry SourceRecently Updated
1.3K0Profile unavailable
Automation

Task Automation Workflows

Automate repetitive tasks with scripts, workflows, and schedules. Create efficient automation for file operations, data processing, API calls, and scheduled...

Registry SourceRecently Updated
5050Profile unavailable
Automation

SuperBased

Eyes AND hands for OpenClaw — capture, AI vision, OCR, recording, voice dictation, and full GUI automation via 72 MCP tools. Use when the agent needs to see...

Registry SourceRecently Updated
590Profile unavailable
Automation

智能文档处理Skill

基于DeepSeek v4技术,支持PDF、Word、Excel等格式文档的智能解析、信息提取、内容分析和格式转换,准确率达99%。

Registry SourceRecently Updated
1040Profile unavailable