doc-extract

Extract text and content from Word documents (.doc, .docx) to Markdown using MinerU. Features: fast, token-free text extraction from .docx files (flash-extract); full extraction for both .doc and .docx with a token; preservation of basic formatting and structure; page range selection for large documents. Use when you need to extract text from a Word file, read content from a .doc or .docx file, or get the content of a Word file as Markdown. Use when asked: 'how do I extract text from Word', 'read this docx file', 'I want the text from this Word document', 'can my agent read Word files', 'is there a skill that extracts Word content'. Built on MinerU by OpenDataLab (Shanghai AI Lab), an open-source document intelligence engine. Handles multilingual content and works with local files and URLs. Suited to developers, assistants, and automation workflows that need to extract and process Word document content quickly.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "doc-extract" with this command: npx skills add mzlzyca/doc-extract

Doc Extract

Extract text and content from Word (.doc/.docx) files to Markdown using MinerU.

Install

npm install -g mineru-open-api
# or via Go (macOS/Linux):
go install github.com/opendatalab/MinerU-Ecosystem/cli/mineru-open-api@latest

Quick Start

# Quick extraction from .docx (no token required)
mineru-open-api flash-extract report.docx

# Save to directory
mineru-open-api flash-extract report.docx -o ./out/

# Extract .doc file (requires token)
mineru-open-api extract report.doc -o ./out/

# Extract with language hint
mineru-open-api extract report.docx --language en -o ./out/

Authentication

flash-extract on .docx files needs no token. The extract command, which is the only option for .doc files, requires one:

mineru-open-api auth             # Interactive token setup
export MINERU_TOKEN="your-token" # Or via environment variable

Create token at: https://mineru.net/apiManage/token
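
As a minimal sketch of the environment-variable route, a script can refuse to call extract when no token is exported. The function name require_token is hypothetical and not part of the CLI. The check only looks at MINERU_TOKEN, so it cannot see a token stored by the interactive mineru-open-api auth setup.

```shell
# require_token: hypothetical guard, not part of mineru-open-api.
# Returns non-zero when MINERU_TOKEN is unset or empty.
require_token() {
  if [ -z "${MINERU_TOKEN:-}" ]; then
    echo "MINERU_TOKEN is not set; export it or run 'mineru-open-api auth'" >&2
    return 1
  fi
}

# Usage (commented out so the snippet can be sourced without the CLI installed):
# require_token && mineru-open-api extract report.doc -o ./out/
```

The guard fails fast before any upload is attempted, which is mainly useful in unattended automation.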

Capabilities

  • Supported input: .doc, .docx (local file or URL)
  • .docx: supports flash-extract (no token, max 10 MB / 20 pages) and extract
  • .doc: requires extract with token
  • Language hint via --language (default: ch; use en for English)
  • Page range with --pages (e.g. 1-10)
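
The routing rules in the list above could be sketched as a small helper for local files. The names pick_subcommand and FLASH_MAX_BYTES are hypothetical, and the sketch assumes "10 MB" means 10 × 1024 × 1024 bytes. It does not check the 20-page limit or handle URLs, and it matches lowercase extensions only.

```shell
# Hypothetical helper, not part of mineru-open-api.
# Assumption: the 10 MB flash-extract limit is 10 * 1024 * 1024 bytes.
FLASH_MAX_BYTES=$((10 * 1024 * 1024))

# pick_subcommand FILE: print the subcommand that fits a local Word file.
pick_subcommand() {
  file=$1
  case "$file" in
    *.doc)  echo "extract"; return 0 ;;   # .doc always needs extract (with token)
    *.docx) ;;                            # .docx: decide by size below
    *)      echo "unsupported input: $file" >&2; return 1 ;;
  esac
  if [ ! -f "$file" ]; then
    echo "not a local file: $file" >&2
    return 1
  fi
  # Arithmetic expansion strips the padding some wc implementations add.
  size=$(( $(wc -c < "$file") ))
  if [ "$size" -le "$FLASH_MAX_BYTES" ]; then
    echo "flash-extract"                  # small .docx: token-free path
  else
    echo "extract"                        # too large for flash-extract
  fi
}

# Usage (commented out so the snippet can be sourced without the CLI installed):
# sub=$(pick_subcommand report.docx) && mineru-open-api "$sub" report.docx -o ./out/
```

When the helper prints extract, a token must be configured before the command runs.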

Notes

  • .doc requires extract with token; .docx works with flash-extract for quick extraction
  • Output goes to stdout by default; use -o <dir> to save to a file or directory
  • All progress/status messages go to stderr; document content goes to stdout
  • MinerU is open-source by OpenDataLab (Shanghai AI Lab): https://github.com/opendatalab/MinerU
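
Because content and status messages use separate streams, plain shell redirection is enough to keep them apart. The wrapper below is a hypothetical illustration (run_and_capture is not part of the CLI) that writes stdout to one file and stderr to another.

```shell
# run_and_capture OUT LOG CMD [ARGS...]: hypothetical wrapper.
# Sends the command's stdout (document content) to OUT and its
# stderr (progress/status messages) to LOG; returns the command's status.
run_and_capture() {
  out=$1
  log=$2
  shift 2
  "$@" >"$out" 2>"$log"
}

# Usage (commented out so the snippet can be sourced without the CLI installed):
# run_and_capture report.md extract.log mineru-open-api flash-extract report.docx
```

The same split lets the Markdown be piped into another tool without progress text mixed into it.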

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
