PDF OCR using Gemini LLM

Extract text from PDFs with OCR powered by Google Gemini. Use it to extract text from PDFs, run OCR on scanned documents, or process image-based PDFs.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.


Install the skill "PDF OCR using Gemini LLM" with:

npx skills add ashtonizmev/geminipdfocr

Purpose

Use geminipdfocr to extract text from PDF documents via OCR (Google Gemini).

Data and privacy

Each PDF is split into single-page files, and every page is uploaded in full to the Google Gemini API for OCR, so page images and their contents leave your machine. The tool sends data to no other endpoint and performs no other data collection. Do not use it with highly sensitive documents unless you accept that their content is sent to Google.
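The following minimal Python sketch makes the per-page data flow concrete. It is not the tool's source: `ocr_document`, the `pages` iterable, and the `ocr_page` callable are hypothetical stand-ins. In geminipdfocr, the pages come from splitting the PDF and the OCR call goes to the Gemini API. The sketch shows that one request is made per page, so every processed page is sent off the machine.

```python
from typing import Callable, Iterable, List, Optional


def ocr_document(
    pages: Iterable[bytes],
    ocr_page: Callable[[bytes], str],
    max_pages: Optional[int] = None,
) -> List[str]:
    """OCR a document page by page, making one OCR request per page.

    Hypothetical sketch: `pages` stands in for the single-page files produced
    by splitting the PDF, and `ocr_page` stands in for the upload-and-OCR call
    to Gemini. Neither name comes from geminipdfocr itself.
    """
    texts: List[str] = []
    for index, page in enumerate(pages):
        if max_pages is not None and index >= max_pages:
            break  # stop early, like the documented --max-pages N option
        texts.append(ocr_page(page))  # this page's content leaves the machine here
    return texts


# Usage with a fake OCR function that records what would have been uploaded.
uploaded: List[bytes] = []


def fake_ocr(page: bytes) -> str:
    uploaded.append(page)
    return page.decode("ascii").upper()


result = ocr_document([b"page one", b"page two", b"page three"], fake_ocr, max_pages=2)
print(result)    # ['PAGE ONE', 'PAGE TWO']
print(uploaded)  # [b'page one', b'page two']
```

Capping the page count is the only way, within this sketch, to reduce how much of a document is uploaded.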

Setup (venv installation)

Before first use, create and activate the virtual environment and install the dependencies:

cd geminipdfocr && python -m venv venv && source venv/bin/activate && pip install -r requirements.txt

Set GOOGLE_API_KEY in your environment before running (e.g. export GOOGLE_API_KEY=your-key).

How to use

When requested to extract text or perform OCR on a PDF:

  1. Run: cd geminipdfocr && source venv/bin/activate && python -m geminipdfocr <path-to-pdf> [--json] [--output <file>]
  2. Add --json to get structured JSON output instead of plain text.
  3. Add --max-pages N to process at most N pages per PDF, e.g. for testing or with very long documents.
  4. Use --quiet to suppress progress logs.
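When an assistant or script drives the tool programmatically, the steps above reduce to assembling an argument list. The following Python sketch is an illustrative helper, not part of geminipdfocr. The names `build_geminipdfocr_command` and `run_geminipdfocr` are hypothetical. The `venv/bin/python` interpreter path assumes the POSIX layout created by the setup command. The sketch uses only the flags documented for the tool.

```python
import subprocess
from typing import List, Optional, Sequence


def build_geminipdfocr_command(
    pdf_paths: Sequence[str],
    as_json: bool = False,
    output: Optional[str] = None,
    max_pages: Optional[int] = None,
    quiet: bool = False,
    python: str = "venv/bin/python",  # assumption: POSIX venv inside geminipdfocr/
) -> List[str]:
    """Return the argv for `python -m geminipdfocr`, using only documented flags."""
    if not pdf_paths:
        raise ValueError("at least one PDF path is required")
    cmd = [python, "-m", "geminipdfocr", *pdf_paths]
    if as_json:
        cmd.append("--json")
    if output is not None:
        cmd += ["--output", output]
    if max_pages is not None:
        cmd += ["--max-pages", str(max_pages)]
    if quiet:
        cmd.append("--quiet")
    return cmd


def run_geminipdfocr(cmd: List[str], workdir: str = "geminipdfocr") -> str:
    """Run the command from the skill directory and return its stdout."""
    # Invoking the venv's interpreter directly uses the venv's installed
    # packages, so no `source venv/bin/activate` is needed here.
    completed = subprocess.run(cmd, cwd=workdir, capture_output=True, text=True, check=True)
    return completed.stdout


cmd = build_geminipdfocr_command(["scan.pdf"], as_json=True, max_pages=3, quiet=True)
print(cmd)
# ['venv/bin/python', '-m', 'geminipdfocr', 'scan.pdf', '--json', '--max-pages', '3', '--quiet']
```

The child process runs inside the geminipdfocr directory, so pass absolute PDF and output paths. When --output is given, the result is written to that file rather than to stdout.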

Requirements

  • A valid PDF file path.
  • GOOGLE_API_KEY set in the process environment (e.g. export GOOGLE_API_KEY=your-key).
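Both requirements can be checked before any page is uploaded. The following minimal Python sketch rests on a few assumptions. The name `preflight` is a hypothetical helper, not part of geminipdfocr. The PDF test only checks the header (`%PDF-` in the first five bytes) and does not fully validate the file.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def preflight(pdf_path: str, env: Optional[Mapping[str, str]] = None) -> None:
    """Check the two documented requirements before calling geminipdfocr.

    Hypothetical helper: raises ValueError describing the first problem found.
    """
    env = os.environ if env is None else env
    if not env.get("GOOGLE_API_KEY", "").strip():
        raise ValueError("GOOGLE_API_KEY is not set (e.g. export GOOGLE_API_KEY=your-key)")
    path = Path(pdf_path)
    if not path.is_file():
        raise ValueError(f"not a file: {pdf_path}")
    with path.open("rb") as fh:
        # Well-formed PDF files begin with the "%PDF-" header; this is a cheap
        # sanity check, not validation of the file's internal structure.
        if fh.read(5) != b"%PDF-":
            raise ValueError(f"does not look like a PDF: {pdf_path}")
```

Calling `preflight(path)` with no `env` argument checks the real process environment, which is where the tool itself expects the key.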

CLI options

Option           Description
pdf_path         One or more PDF file paths (positional)
--max-pages N    Limit pages per PDF
--json           Output structured JSON instead of plain text
--output FILE    Write result to file (default: stdout)
--quiet          Suppress INFO/DEBUG logs
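The options table above maps naturally onto an argparse definition. The sketch below is a hypothetical reconstruction built from the documented options only. The real geminipdfocr parser may differ, for example in defaults or validation, and `build_parser` is an illustrative name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options (not the tool's source)."""
    parser = argparse.ArgumentParser(prog="geminipdfocr")
    parser.add_argument("pdf_path", nargs="+", help="one or more PDF file paths")
    parser.add_argument("--max-pages", type=int, metavar="N", help="limit pages per PDF")
    parser.add_argument("--json", action="store_true", help="output structured JSON instead of plain text")
    parser.add_argument("--output", metavar="FILE", help="write result to FILE (default: stdout)")
    parser.add_argument("--quiet", action="store_true", help="suppress INFO/DEBUG logs")
    return parser


args = build_parser().parse_args(["a.pdf", "b.pdf", "--max-pages", "2", "--json"])
print(args.pdf_path, args.max_pages, args.json, args.output, args.quiet)
# ['a.pdf', 'b.pdf'] 2 True None False
```

With `nargs="+"`, several PDFs can be passed in one invocation, matching the "one or more" description. This listing does not document how the real tool separates results from multiple files.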

