universal-pdf-vision-parser

Extract multilingual document content and language learning notes (French, German, Japanese, Spanish, etc.) from PDFs using multimodal vision (Qwen-VL-Max). This skill converts PDF pages to high-res images and 'sees' the content to produce perfectly structured, high-readability Markdown.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "universal-pdf-vision-parser" with this command: npx skills add mingensiie/universal-pdf-vision-parse

Universal PDF Vision Parser Skill

Version: 0.1

This skill is a high-end multilingual document digitizer. It uses multimodal vision to 'look' at each PDF page, making it perfect for language learning notes, bilingual documents, and complex layouts that standard OCR fails to capture.

Prerequisites

  1. DashScope API Key: A valid key from Alibaba Cloud Bailian with qwen-vl-max access.
  2. Environment:
pip install pymupdf dashscope

Usage

Basic Command

python scripts/vision_parse.py --pdf <path_to_pdf> --out <path_to_output.md> --api-key <YOUR_API_KEY> --max-pages 2
  • --max-pages: (Optional) Max pages to process. Defaults to 2. Set to -1 for all pages.

Agentic Workflow

  1. Visual Scanning: Converts PDF pages to 300 DPI PNGs.
  2. Expert Transcription: Qwen-VL-Max identifies the language and transcribes terms, translations, and explanations.
  3. Markdown Structuring: Automatically formats content with bold keywords, italicized meanings, and clean tables.

Examples

User: "Convert this German-Chinese note to markdown: notes.pdf"

Agent Action:

python scripts/vision_parse.py --pdf notes.pdf --out notes.md

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

Fitbit Tracker

Personal Fitbit integration for daily health tracking with adaptive sleep and activity reporting

Registry SourceRecently Updated
General

Ollama Load Balancer

Ollama load balancer for Llama, Qwen, DeepSeek, and Mistral inference across multiple machines. Load balancing with auto-discovery via mDNS, health checks, q...

Registry SourceRecently Updated
General

Google Merchant Center

Google Merchant Center integration. Manage Accounts. Use when the user wants to interact with Google Merchant Center data.

Registry SourceRecently Updated
General

Twitter/X All-in-One — Search, Monitor & Publish Text & Media Posts

Searches and reads X (Twitter): profiles, timelines, mentions, followers, tweet search, trends, lists, communities, and Spaces. Publishes posts, likes/unlike...

Registry SourceRecently Updated