mineru-fast

MinerU fast extract: zero-setup, instant document extraction. Convert PDFs, images, Word (DOCX), and PowerPoint (PPTX) files to Markdown with no login, no token, no API key, and no configuration. Just install and run. Powered by the MinerU flash-extract engine, with built-in OCR, table recognition, and formula extraction (LaTeX). Handles scanned documents, photos of text, academic papers, contracts, invoices, resumes, and slides out of the box.

Use this skill when you need to quickly extract text from a PDF, convert a document to Markdown without signing up, read a scanned PDF, turn a Word file into Markdown, parse a PowerPoint presentation, OCR an image, or get a fast document conversion with no setup. Supports 80+ languages, including Chinese, English, Japanese, Korean, Arabic, Hindi, French, German, Spanish, Russian, and many more. Works with local files and remote URLs. Ideal for developers, researchers, students, and anyone who wants instant document parsing without accounts or API tokens. Use it as a Claude Code skill, an agent tool, or a standalone CLI.

PDF extraction, document to Markdown, login-free PDF conversion, fast document extraction, OCR for scanned files, image to text, Word to Markdown, PPT to Markdown, PDF parsing, zero-configuration document conversion. No sign-up and no token required: install it and extract the text from PDFs, Word files, PPTs, and images in one step.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.


Install skill "mineru-fast" with this command: npx skills add mineru-extract/mineru-fast-extract

Fast Document Extraction with mineru-open-api

Zero-setup, instant document parsing — no login, no token, no configuration needed. Supports tables and formulas (LaTeX).

Installation

npm install -g mineru-open-api

Or via Go (macOS/Linux):

go install github.com/opendatalab/MinerU-Ecosystem/cli/mineru-open-api@latest

Verify installation

mineru-open-api version

Quick start

mineru-open-api flash-extract report.pdf                     # PDF → Markdown (instant!)
mineru-open-api flash-extract report.pdf -o ./out/           # Save to the ./out/ directory
mineru-open-api flash-extract resume.docx                    # Word → Markdown
mineru-open-api flash-extract slides.pptx                    # PowerPoint → Markdown
mineru-open-api flash-extract photo.png                      # Image → Markdown (OCR)
mineru-open-api flash-extract https://example.com/doc.pdf    # URL → Markdown

Supported input formats

| Format | Supported |
|---|---|
| PDF (.pdf) | Yes |
| Images (.png, .jpg, .jpeg, .jp2, .webp, .gif, .bmp) | Yes |
| Word (.docx) | Yes |
| PowerPoint (.pptx) | Yes |
| URLs (remote files) | Yes |

Command: flash-extract

mineru-open-api flash-extract <file-or-url> [flags]

Flags

| Flag | Short | Default | Description |
|---|---|---|---|
| --output | -o | (stdout) | Output path (file or directory) |
| --language | | ch | Document language |
| --pages | | (all) | Page range, e.g. 1-10 |
| --timeout | | 900 | Timeout in seconds |
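When a long PDF fails with exit code 4 (page limit exceeded) or times out, the exit-code table's advice to try fewer pages suggests extracting it in page ranges. The sketch below is a hypothetical helper, page_chunks, not part of mineru-open-api. It assumes you already know the total page count (none of the flags above report it) and that --pages accepts the N-M form shown in the table.

```shell
#!/bin/sh
# page_chunks TOTAL SIZE (hypothetical helper): print page ranges of at most
# SIZE pages, in the N-M form used by --pages, covering pages 1..TOTAL.
# The body runs in a subshell ( ... ), so its variables do not leak to the caller.
page_chunks() (
  total=$1
  size=$2
  if [ "$size" -lt 1 ]; then
    echo "page_chunks: SIZE must be at least 1" >&2
    exit 1   # exits only the subshell, so the function returns 1
  fi
  start=1
  while [ "$start" -le "$total" ]; do
    end=$((start + size - 1))
    if [ "$end" -gt "$total" ]; then
      end=$total   # the last chunk may be shorter than SIZE
    fi
    printf '%s-%s\n' "$start" "$end"
    start=$((end + 1))
  done
)

# Example (requires the CLI; the ./out/part_<range>/ naming is a hypothetical choice):
#   for range in $(page_chunks 120 40); do
#     mineru-open-api flash-extract big.pdf --pages "$range" -o "./out/part_$range/"
#   done
```

Each range is extracted in a separate run, so a per-range output directory keeps the results from overwriting each other.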

Supported --language values

Values are organized by script/language family — each value covers all languages in its group.

Standalone language packs

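| Value | Included languages | Notes |
|---|---|---|
| ch | Chinese, English, Traditional Chinese | Chinese and English (default) |
| ch_server | Chinese, English, Traditional Chinese, Japanese | Traditional Chinese, handwriting |
| en | English | English only |
| japan | Chinese, English, Traditional Chinese, Japanese | Mainly Japanese |
| korean | Korean, English | Korean |
| chinese_cht | Chinese, English, Traditional Chinese, Japanese | Mainly Traditional Chinese |
| ta | Tamil, English | Tamil |
| te | Telugu, English | Telugu |
| ka | Kannada | Kannada |
| el | Greek, English | Greek |
| th | Thai, English | Thai |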

Language family packs

| Value | Script/Family | Included languages |
|---|---|---|
| latin | Latin script | French, German, Afrikaans, Italian, Spanish, Bosnian, Portuguese, Czech, Welsh, Danish, Estonian, Irish, Croatian, Uzbek, Hungarian, Serbian (Latin), Indonesian, Occitan, Icelandic, Lithuanian, Maori, Malay, Dutch, Norwegian, Polish, Slovak, Slovenian, Albanian, Swedish, Swahili, Tagalog, Turkish, Latin, Azerbaijani, Kurdish, Latvian, Maltese, Pali, Romanian, Vietnamese, Finnish, Basque, Galician, Luxembourgish, Romansh, Catalan, Quechua |
| arabic | Arabic script | Arabic, Persian, Uyghur, Urdu, Pashto, Kurdish, Sindhi, Balochi, English |
| cyrillic | Cyrillic script | Russian, Belarusian, Ukrainian, Serbian (Cyrillic), Bulgarian, Mongolian, Abkhazian, Adyghe, Kabardian, Avar, Dargin, Ingush, Chechen, Lak, Lezgin, Tabasaran, Kazakh, Kyrgyz, Tajik, Macedonian, Tatar, Chuvash, Bashkir, Malian, Moldovan, Udmurt, Komi, Ossetian, Buryat, Kalmyk, Tuvan, Sakha, Karakalpak, English |
| east_slavic | East Slavic | Russian, Belarusian, Ukrainian, English |
| devanagari | Devanagari script | Hindi, Marathi, Nepali, Bihari, Maithili, Angika, Bhojpuri, Magahi, Santali, Newari, Konkani, Sanskrit, Haryanvi, English |
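To catch a typo in --language before invoking the CLI, an agent could check the value against the ones documented in the two tables above. This is a sketch that assumes those tables are the complete list (the CLI may accept values not documented here); is_mineru_lang and MINERU_LANGS are hypothetical names, not part of mineru-open-api.

```shell
#!/bin/sh
# Every --language value listed in the two tables above (hand-copied;
# assumed complete, which the CLI does not guarantee).
MINERU_LANGS='ch ch_server en japan korean chinese_cht ta te ka el th latin arabic cyrillic east_slavic devanagari'

# is_mineru_lang VALUE (hypothetical helper): exit status 0 if VALUE is a
# documented --language value, 1 otherwise.
is_mineru_lang() {
  for candidate in $MINERU_LANGS; do   # unquoted on purpose: split on spaces
    if [ "$candidate" = "$1" ]; then
      return 0
    fi
  done
  return 1
}

# Example (requires the CLI):
#   lang=latin
#   if is_mineru_lang "$lang"; then
#     mineru-open-api flash-extract report.pdf --language "$lang"
#   else
#     echo "unsupported --language value: $lang" >&2
#   fi
```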

Examples

mineru-open-api flash-extract report.pdf
mineru-open-api flash-extract report.pdf -o ./out/
mineru-open-api flash-extract report.pdf --language en
mineru-open-api flash-extract report.pdf --language latin
mineru-open-api flash-extract report.pdf --pages "1-5"
mineru-open-api flash-extract contract.docx -o ./out/
mineru-open-api flash-extract presentation.pptx -o ./out/
mineru-open-api flash-extract scan.jpg --language ch

Output behavior

  • No -o flag: result goes to stdout; status/progress messages go to stderr
  • With -o flag: result saved to file/directory; progress messages on stderr
  • Markdown output includes extracted images saved alongside the .md file
  • Tables are converted to Markdown tables
  • Formulas are converted to LaTeX format (inline $...$ and block $$...$$)

Agent guidelines

When using this skill on behalf of the user:

  • Always use flash-extract for any input — whether it's a local file or a URL (e.g. https://cdn-mineru.openxlab.org.cn/demo/example.pdf). Do NOT assume a URL means "web page". flash-extract handles URLs to document files directly.
  • Quote file paths that contain spaces or special characters with double quotes. Example: mineru-open-api flash-extract "report 01.pdf".
  • Don't run commands blindly on errors — explain the exit code and troubleshooting steps instead of re-running the command.
  • Answer installation questions (e.g. "mineru 怎么安装", i.e. "how do I install mineru?") with the install instructions above.

Default output directory

When the user does NOT specify -o, generate a default output directory:

~/MinerU-Skill/<name>_<hash>/
  • <name>: derived from the source, then sanitized (replace spaces and shell-unsafe characters with _, collapse consecutive _).
    • For URLs: last path segment (e.g. https://arxiv.org/pdf/2509.22186 → 2509.22186)
    • For local files: filename without extension (e.g. report.pdf → report)
  • <hash>: first 6 characters of the MD5 hash of the full original source (the URL or path as given):
printf '%s' "<source>" | md5sum | cut -c1-6   # Linux
printf '%s' "<source>" | md5 | cut -c1-6      # macOS

When the user specifies -o: use the user's path as-is.
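The naming rules above can be sketched as a POSIX shell function. resolve_output_dir is a hypothetical helper, not part of mineru-open-api. Two details are assumptions: which characters count as "shell-unsafe" (here, anything outside A-Z, a-z, 0-9, ".", "_" and "-"), and that a URL's query string or fragment is not part of its last path segment. It uses md5sum when available and falls back to the macOS md5 command, matching the hash commands above.

```shell
#!/bin/sh
# resolve_output_dir SOURCE [USER_DIR] (hypothetical helper): print the output
# directory following the rules above. Subshell body keeps variables local.
resolve_output_dir() (
  src=$1
  user_dir=${2-}

  # The user passed -o: use their path unchanged.
  if [ -n "$user_dir" ]; then
    printf '%s\n' "$user_dir"
    exit 0
  fi

  case $src in
    http://*|https://*)
      path=${src%%[?#]*}   # assumption: drop any query string or fragment
      path=${path%/}       # ignore a trailing slash
      name=${path##*/}     # last path segment, kept whole
      ;;
    *)
      base=${src##*/}      # file name without directories
      name=${base%.*}      # strip the final extension
      ;;
  esac

  # Sanitize: characters outside [A-Za-z0-9._-] become _, and runs of _ collapse.
  name=$(printf '%s' "$name" | tr -cs 'A-Za-z0-9._-' '_')

  # First 6 hex digits of the MD5 of the full, unmodified source string.
  if command -v md5sum >/dev/null 2>&1; then
    hash=$(printf '%s' "$src" | md5sum | cut -c1-6)
  else
    hash=$(printf '%s' "$src" | md5 | cut -c1-6)   # macOS
  fi

  printf '%s\n' "$HOME/MinerU-Skill/${name}_${hash}/"
)

# Example (requires the CLI):
#   src='https://arxiv.org/pdf/2509.22186'
#   mineru-open-api flash-extract "$src" -o "$(resolve_output_dir "$src")"
```

Hashing the full source rather than the sanitized name keeps two different sources that sanitize to the same name (for example report.pdf in two folders) in separate directories.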

Skill upgrade = CLI upgrade

When the user asks to upgrade this skill, re-install the CLI first:

npm install -g mineru-open-api@latest

Exit codes

| Code | Meaning | Recovery |
|---|---|---|
| 0 | Success | |
| 1 | General API or unknown error | Check network; retry; use --verbose |
| 2 | Invalid parameters / usage error | Check command syntax and flag values |
| 4 | File too large or page limit exceeded | Try a smaller file or fewer pages |
| 5 | Extraction failed | Document may be corrupted or unsupported |
| 6 | Timeout | Increase with --timeout |
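In line with the agent guideline to explain failures instead of blindly re-running, a small lookup can turn the CLI's exit status into the meaning and recovery hint from the table. explain_mineru_exit is a hypothetical helper; its messages are copied from the table, and any code the table does not list (for example 3) is reported as undocumented rather than guessed.

```shell
#!/bin/sh
# explain_mineru_exit CODE (hypothetical helper): print the meaning and
# recovery hint for a mineru-open-api exit code, as listed in the table above.
explain_mineru_exit() {
  case $1 in
    0) printf '%s\n' "Success" ;;
    1) printf '%s\n' "General API or unknown error: check network; retry; use --verbose" ;;
    2) printf '%s\n' "Invalid parameters / usage error: check command syntax and flag values" ;;
    4) printf '%s\n' "File too large or page limit exceeded: try a smaller file or fewer pages" ;;
    5) printf '%s\n' "Extraction failed: document may be corrupted or unsupported" ;;
    6) printf '%s\n' "Timeout: increase with --timeout" ;;
    *) printf '%s\n' "Undocumented exit code: $1" ;;
  esac
}

# Example (requires the CLI): explain a failure instead of re-running.
#   mineru-open-api flash-extract report.pdf -o ./out/ || explain_mineru_exit "$?" >&2
```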

Troubleshooting

  • Timeout on large files: Increase with --timeout 1600
  • Extraction quality is poor: Try specifying --language to match the document language
  • HTTP 429: Rate limit hit. Wait a few minutes and retry.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
