pdf-to-html

Convert PDF documents to HTML using MinerU. Transforms PDF files into web-ready HTML with structure and formatting preserved. Features: PDF to HTML conversion with layout preservation. Handles text, tables, images, and formatting. Supports local files and URLs. Token-based extraction for full features. Use when you need to: convert a PDF to HTML, turn a PDF into a web page, generate HTML from a PDF document, publish PDF content on the web. Use when asked: 'how do I convert PDF to HTML', 'turn this PDF into HTML', 'I want to view this PDF as a web page', 'can my agent convert PDF to HTML', 'is there a skill for PDF to HTML conversion'. Built on MinerU by OpenDataLab (Shanghai AI Lab), an open-source document intelligence engine. Perfect for web publishers, content teams, and developers who need to convert PDF documents into HTML for web display, CMS integration, or online archives.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "pdf-to-html" with this command: npx skills add mzlzyca/pdf-to-html

PDF to HTML

Convert PDF files to HTML using MinerU.

Install

npm install -g mineru-open-api
# or via Go (macOS/Linux):
go install github.com/opendatalab/MinerU-Ecosystem/cli/mineru-open-api@latest

Quick Start

# Convert PDF to HTML (requires token)
mineru-open-api extract report.pdf -f html -o ./out/

# From URL
mineru-open-api extract https://example.com/report.pdf -f html -o ./out/

# With language hint
mineru-open-api extract report.pdf -f html --language en -o ./out/

Authentication

Token required:

mineru-open-api auth             # Interactive token setup
export MINERU_TOKEN="your-token" # Or via environment variable

Create token at: https://mineru.net/apiManage/token

Capabilities

  • Supported input: .pdf (local file or URL)
  • Output format: HTML (-f html)
  • HTML output requires extract with token — not available in flash-extract
  • Language hint with --language (default: ch, use en for English)
  • Page range with --pages (e.g. 1-10)

Notes

  • HTML output (-f html) is only available via extract with token
  • Output goes to stdout by default; use -o <dir> to save to a file
  • All progress/status messages go to stderr; document content goes to stdout
  • MinerU is open-source by OpenDataLab (Shanghai AI Lab): https://github.com/opendatalab/MinerU

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

Gigo Lobster Resume

🦞 GIGO · gigo-lobster-resume: 续跑入口:v2 stable 当前会清理旧 checkpoint 并从头重跑;保留此 slug 作为旧 checkpoint 兼容入口。 Triggers: 继续试吃 / 恢复评测 / resume tasting / continue lobster...

Registry SourceRecently Updated
General

YiHui CONTEXT MODE

context-mode is an MCP server that saves 98% of your context window by sandboxing tool outputs. It routes large file reads, shell outputs, and web fetches th...

Registry SourceRecently Updated
General

xinyi-drink

Use when users ask about 新一好喝/新一咖啡 drinks, stores, menu, activities, Skill用户大礼包, today drink recommendations, afternoon tea, feeling sleepy, or personalized...

Registry SourceRecently Updated
General

vedic-destiny

吠陀命盘分析中文入口。用于完整命盘研判、命主盘 Rashi chart 与九分盘 Navamsha chart 联读、既往事件回看、出生时间稳定度判断、事业主题、婚姻主题、时空盘专题,以及基于 Jagannatha Hora PDF、星盘截图或文本命盘数据的系统拆盘。当用户提到完整星盘、事业方向、婚姻问题、关系窗...

Registry SourceRecently Updated