Fast Document Extraction with mineru-open-api
Zero-setup, instant document parsing: no login, no token, no configuration needed. Supports tables (converted to Markdown) and formulas (converted to LaTeX).
Installation
npm install -g mineru-open-api
Or via Go (macOS/Linux):
go install github.com/opendatalab/MinerU-Ecosystem/cli/mineru-open-api@latest
Verify installation
mineru-open-api version
Quick start
mineru-open-api flash-extract report.pdf # PDF → Markdown (instant!)
mineru-open-api flash-extract report.pdf -o ./out/ # Save to file
mineru-open-api flash-extract resume.docx # Word → Markdown
mineru-open-api flash-extract slides.pptx # PowerPoint → Markdown
mineru-open-api flash-extract photo.png # Image → Markdown (OCR)
mineru-open-api flash-extract https://example.com/doc.pdf # URL → Markdown
Supported input formats
| Format | Supported |
|---|---|
| PDF (.pdf) | Yes |
| Images (.png, .jpg, .jpeg, .jp2, .webp, .gif, .bmp) | Yes |
| Word (.docx) | Yes |
| PowerPoint (.pptx) | Yes |
| URLs (remote files) | Yes |
Command: flash-extract
mineru-open-api flash-extract <file-or-url> [flags]
Flags
| Flag | Short | Default | Description |
|---|---|---|---|
| --output | -o | (stdout) | Output path (file or directory) |
| --language | | ch | Document language |
| --pages | | (all) | Page range, e.g. 1-10 |
| --timeout | | 900 | Timeout in seconds |
Supported --language values
Values are grouped by script or language family. Each value covers every language listed in its row.
Standalone language packs
| Value | Included languages | Notes |
|---|---|---|
| ch | Chinese, English, Traditional Chinese | Chinese and English (default) |
| ch_server | Chinese, English, Traditional Chinese, Japanese | Traditional Chinese, handwriting |
| en | English | English only |
| japan | Chinese, English, Traditional Chinese, Japanese | Mainly Japanese |
| korean | Korean, English | Korean |
| chinese_cht | Chinese, English, Traditional Chinese, Japanese | Mainly Traditional Chinese |
| ta | Tamil, English | Tamil |
| te | Telugu, English | Telugu |
| ka | Kannada | Kannada |
| el | Greek, English | Greek |
| th | Thai, English | Thai |
Language family packs
| Value | Script/Family | Included languages |
|---|---|---|
| latin | Latin script | French, German, Afrikaans, Italian, Spanish, Bosnian, Portuguese, Czech, Welsh, Danish, Estonian, Irish, Croatian, Uzbek, Hungarian, Serbian (Latin), Indonesian, Occitan, Icelandic, Lithuanian, Maori, Malay, Dutch, Norwegian, Polish, Slovak, Slovenian, Albanian, Swedish, Swahili, Tagalog, Turkish, Latin, Azerbaijani, Kurdish, Latvian, Maltese, Pali, Romanian, Vietnamese, Finnish, Basque, Galician, Luxembourgish, Romansh, Catalan, Quechua |
| arabic | Arabic script | Arabic, Persian, Uyghur, Urdu, Pashto, Kurdish, Sindhi, Balochi, English |
| cyrillic | Cyrillic script | Russian, Belarusian, Ukrainian, Serbian (Cyrillic), Bulgarian, Mongolian, Abkhazian, Adyghe, Kabardian, Avar, Dargin, Ingush, Chechen, Lak, Lezgin, Tabasaran, Kazakh, Kyrgyz, Tajik, Macedonian, Tatar, Chuvash, Bashkir, Malian, Moldovan, Udmurt, Komi, Ossetian, Buryat, Kalmyk, Tuvan, Sakha, Karakalpak, English |
| east_slavic | East Slavic | Russian, Belarusian, Ukrainian, English |
| devanagari | Devanagari script | Hindi, Marathi, Nepali, Bihari, Maithili, Angika, Bhojpuri, Magahi, Santali, Newari, Konkani, Sanskrit, Haryanvi, English |
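An agent may want to check a `--language` value before invoking the CLI. The sketch below does that against the values listed in the two tables above. `is_supported_language` is a hypothetical helper name, not part of mineru-open-api. The CLI itself remains the authority on which values it accepts.

```shell
# Hypothetical helper: succeed (status 0) only if $1 is one of the
# --language values listed in the standalone and family tables.
is_supported_language() {
  case "$1" in
    ch|ch_server|en|japan|korean|chinese_cht|ta|te|ka|el|th)
      return 0 ;;                                   # standalone language packs
    latin|arabic|cyrillic|east_slavic|devanagari)
      return 0 ;;                                   # language family packs
    *)
      return 1 ;;                                   # not in either table
  esac
}
```

For example, `is_supported_language latin && mineru-open-api flash-extract report.pdf --language latin` only runs the extraction when the value appears in one of the tables.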
Examples
mineru-open-api flash-extract report.pdf
mineru-open-api flash-extract report.pdf -o ./out/
mineru-open-api flash-extract report.pdf --language en
mineru-open-api flash-extract report.pdf --language latin
mineru-open-api flash-extract report.pdf --pages "1-5"
mineru-open-api flash-extract contract.docx -o ./out/
mineru-open-api flash-extract presentation.pptx -o ./out/
mineru-open-api flash-extract scan.jpg --language ch
Output behavior
- No `-o` flag: the result goes to stdout; status and progress messages go to stderr.
- With `-o` flag: the result is saved to the given file or directory; progress messages go to stderr.
- Markdown output includes extracted images, saved alongside the `.md` file.
- Tables are converted to Markdown tables.
- Formulas are converted to LaTeX (inline `$...$` and block `$$...$$`).
Agent guidelines
When using this skill on behalf of the user:
- Always use `flash-extract` for any input, whether it is a local file or a URL (e.g. https://cdn-mineru.openxlab.org.cn/demo/example.pdf). Do NOT assume a URL means "web page": `flash-extract` handles URLs to document files directly.
- Wrap file paths that contain spaces or special characters in double quotes, for example: `mineru-open-api flash-extract "report 01.pdf"`.
- Don't blindly re-run a command after an error. Explain the exit code and the troubleshooting steps instead.
- Answer installation questions (e.g. "mineru 怎么安装", i.e. "how do I install mineru") with the install instructions above.
Default output directory
When the user does NOT specify -o, generate a default output directory:
`~/MinerU-Skill/<name>_<hash>/`
- `<name>`: derived from the source, then sanitized (replace spaces and shell-unsafe characters with `_`, collapse consecutive `_`).
  - For URLs: the last path segment (e.g. `https://arxiv.org/pdf/2509.22186` → `2509.22186`)
  - For local files: the filename without its extension (e.g. `report.pdf` → `report`)
- `<hash>`: the first 6 characters of the MD5 hash of the full original source.
echo -n "source" | md5sum | cut -c1-6 # Linux
echo -n "source" | md5 | cut -c1-6 # macOS
When the user specifies -o: use the user's path as-is.
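The naming rule above can be sketched as two POSIX shell functions. This is a minimal sketch under stated assumptions:

- "Shell-unsafe" is taken to mean any character outside `[A-Za-z0-9._-]`.
- URLs are recognized only by an `http://` or `https://` prefix.
- `md5_prefix` and `default_outdir` are hypothetical helper names, not part of the CLI.
- `printf '%s'` hashes the source without a trailing newline, like `echo -n` in the commands above.

```shell
# Hypothetical helper: first 6 hex characters of the MD5 of $1
# (hashed without a trailing newline).
md5_prefix() {
  if command -v md5sum >/dev/null 2>&1; then
    printf '%s' "$1" | md5sum | cut -c1-6   # Linux (coreutils)
  else
    printf '%s' "$1" | md5 | cut -c1-6      # macOS
  fi
}

# Hypothetical helper: print ~/MinerU-Skill/<name>_<hash>/ for a source.
default_outdir() {
  src=$1
  case "$src" in
    http://*|https://*)
      name=${src##*/}        # URL: last path segment, extension kept
      ;;
    *)
      name=${src##*/}        # local file: drop any directory part...
      name=${name%.*}        # ...then the final extension
      ;;
  esac
  # Assumption: anything outside [A-Za-z0-9._-] counts as shell-unsafe.
  # Replace it with "_", then collapse runs of "_" into one.
  name=$(printf '%s' "$name" | tr -c 'A-Za-z0-9._-' '_' | tr -s '_')
  printf '%s/MinerU-Skill/%s_%s/\n' "$HOME" "$name" "$(md5_prefix "$src")"
}
```

For example, `default_outdir "report 01.pdf"` yields a path ending in `report_01_` followed by six hex characters. The result can be passed straight to the CLI with `mineru-open-api flash-extract "report 01.pdf" -o "$(default_outdir "report 01.pdf")"`.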
Skill upgrade = CLI upgrade
When the user asks to upgrade this skill, re-install the CLI first:
npm install -g mineru-open-api@latest
Exit codes
| Code | Meaning | Recovery |
|---|---|---|
| 0 | Success | — |
| 1 | General API or unknown error | Check network; retry; use --verbose |
| 2 | Invalid parameters / usage error | Check command syntax and flag values |
| 4 | File too large or page limit exceeded | Try a smaller file or fewer pages |
| 5 | Extraction failed | Document may be corrupted or unsupported |
| 6 | Timeout | Increase with --timeout |
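The agent guidelines ask for an explanation of a failure rather than a blind re-run. One way to do that is to map the CLI's exit status to the meaning and recovery hint from the table above, as in the sketch below. `explain_exit` is a hypothetical helper name. Its messages simply restate the table, and any code not listed there is reported as unrecognized.

```shell
# Hypothetical helper: print the meaning and recovery hint for a
# mineru-open-api exit code, as listed in the exit-code table.
explain_exit() {
  case "$1" in
    0) echo "Success" ;;
    1) echo "General API or unknown error: check network; retry; use --verbose" ;;
    2) echo "Invalid parameters / usage error: check command syntax and flag values" ;;
    4) echo "File too large or page limit exceeded: try a smaller file or fewer pages" ;;
    5) echo "Extraction failed: document may be corrupted or unsupported" ;;
    6) echo "Timeout: increase with --timeout" ;;
    *) echo "Unrecognized exit code $1: not listed in the exit-code table" ;;
  esac
}
```

For example, run `mineru-open-api flash-extract report.pdf -o ./out/` and then `explain_exit $?` to print the hint for whatever status the command returned.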
Troubleshooting
- Timeout on large files: increase the limit with `--timeout`, e.g. `--timeout 1600`.
- Poor extraction quality: try setting `--language` to match the document's language.
- HTTP 429: rate limit hit. Wait a few minutes and retry.