docling

Extract and parse content from web pages, PDFs, documents (docx, pptx), and images using the docling CLI with GPU acceleration. Use INSTEAD of web_fetch for extracting content from specific URLs when you need clean, structured text. Use Brave (web_search) for searching/discovering pages. Use docling when you HAVE a URL and need its content parsed.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "docling" with this command: npx skills add er3mit4/docling

Docling - Document & Web Content Extraction

CLI tool for parsing documents and web pages into clean, structured text. Can use GPU acceleration (CUDA) for its OCR and ML models when an NVIDIA GPU is available.

Prerequisites

  • docling CLI must be installed (e.g., via pipx install docling)
  • For GPU support: NVIDIA GPU with CUDA drivers

When to Use

  • Extract content from a URL → Use docling (not web_fetch)
  • Search for information → Use web_search (Brave)
  • Parse PDFs, DOCX, PPTX → Use docling
  • OCR on images → Use docling

Quick Commands

Web Page → Markdown (default)

docling "<URL>" --from html --to md

Output: writes a .md file to the current directory unless --output points to another directory.

Web Page → Plain Text

docling "<URL>" --from html --to text --output /tmp/docling_out

PDF with OCR

docling "/path/to/file.pdf" --ocr --device cuda --output /tmp/docling_out

Key Options

Option     Values                                        Description
--from     html, pdf, docx, pptx, image, md, csv, xlsx   Input format
--to       md, text, json, yaml, html                    Output format
--device   auto, cuda, cpu                               Accelerator (default: auto)
--output   path                                          Output directory (recommended: use a controlled temp dir)
--ocr      flag                                          Enable OCR for images/scanned PDFs
--tables   flag                                          Extract tables (default: on)
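
When the input type is not known in advance, the --from value can be guessed from the source itself. The sketch below is one possible way to do that. The helper name guess_from_format and the suffix mapping are assumptions made for illustration, not part of docling; only the format names come from the table above.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical suffix -> --from mapping; the values are the ones listed in the table.
SUFFIX_TO_FORMAT = {
    ".html": "html", ".htm": "html",
    ".pdf": "pdf", ".docx": "docx", ".pptx": "pptx",
    ".md": "md", ".csv": "csv", ".xlsx": "xlsx",
    ".png": "image", ".jpg": "image", ".jpeg": "image", ".tiff": "image",
}


def guess_from_format(source: str) -> str:
    """Guess a --from value; URLs without a known suffix are treated as html."""
    parsed = urlparse(source)
    is_url = parsed.scheme in ("http", "https")
    # For URLs, look only at the path so query strings do not hide the suffix.
    path = parsed.path if is_url else source
    suffix = PurePosixPath(path).suffix.lower()
    if suffix in SUFFIX_TO_FORMAT:
        return SUFFIX_TO_FORMAT[suffix]
    if is_url:
        return "html"
    raise ValueError(f"cannot infer input format for {source!r}")
```

Treating suffix-less URLs as html is a design choice that matches the "Web Page" commands above; a local file with an unknown suffix raises instead of guessing.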

Security Notes

⚠️ Avoid these flags unless you trust the source:

  • --enable-remote-services - can send data to remote endpoints
  • --allow-external-plugins - loads third-party code
  • Custom --headers with untrusted values - can redirect requests
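
If docling commands are assembled programmatically, the flags above can be screened before anything runs. This is a minimal sketch, not a docling feature: find_risky_flags and RISKY_FLAGS are hypothetical names, and the check conservatively flags every use of --headers because it cannot tell whether the header values are trusted.

```python
# Flags named in the security notes above.
RISKY_FLAGS = ("--enable-remote-services", "--allow-external-plugins", "--headers")


def find_risky_flags(args):
    """Return the risky flags present in an argument list, in order of first use."""
    found = []
    for arg in args:
        # Handle both "--flag value" and "--flag=value" spellings.
        name = arg.split("=", 1)[0]
        if name in RISKY_FLAGS and name not in found:
            found.append(name)
    return found
```

A caller might refuse to run the command, or ask for confirmation, whenever the returned list is not empty.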

Workflow

  1. For web content extraction: Use docling "<URL>" --from html --to text --output /tmp/docling_out
  2. Read the output file from the specified output directory
  3. Clean up the output directory after reading
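
The three steps above can be sketched as a small Python wrapper. It assumes the docling CLI is on PATH; the function names are hypothetical, and a fresh temporary directory stands in for /tmp/docling_out so that concurrent runs do not share output. Because docling chooses the output file name, the sketch reads whatever files appear in the directory rather than assuming a name.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def build_command(url, out_dir):
    """Step 1: the web-extraction command from the workflow, as an argument list."""
    return ["docling", url, "--from", "html", "--to", "text", "--output", str(out_dir)]


def read_outputs(out_dir):
    """Step 2: concatenate every file docling wrote into the output directory."""
    files = sorted(p for p in Path(out_dir).iterdir() if p.is_file())
    return "\n".join(p.read_text(encoding="utf-8") for p in files)


def extract_text(url):
    """Run docling in a private temp dir, return the text, then clean up."""
    out_dir = Path(tempfile.mkdtemp(prefix="docling_out_"))
    try:
        subprocess.run(build_command(url, out_dir), check=True)
        return read_outputs(out_dir)
    finally:
        # Step 3: remove the output directory even if docling failed.
        shutil.rmtree(out_dir, ignore_errors=True)
```

Passing the command as a list (no shell) means the URL is not interpreted by a shell, which matters when the URL comes from an untrusted source.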

GPU Support

Docling supports GPU acceleration via CUDA (NVIDIA). Verify CUDA is available:

python -c "import torch; print(torch.cuda.is_available())"
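
The same check can drive an explicit --device value when the default auto is not wanted. A minimal sketch, assuming PyTorch may or may not be importable in the calling environment; pick_device is a hypothetical helper, not part of docling.

```python
def pick_device():
    """Return "cuda" when PyTorch reports a usable CUDA device, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch in this environment, fall back to CPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


# Example: arguments that could be appended to a docling command line.
device_args = ["--device", pick_device()]
```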

Full CLI Reference

See references/cli-reference.md for complete option list.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
