# document-parser

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "document-parser" with this command: npx skills add ankylala/document-parser

A high-precision document parsing skill that extracts structured data from PDFs, images, and Word documents.

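Use cases

  • Parse PDFs, images (JPG/PNG), and Word documents
  • Layout analysis and structure extraction
  • Table recognition (HTML/Markdown output)
  • OCR text recognition
  • Seal (stamp) detection
  • Table of contents extraction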

Commands

Parse a document

document-parser parse <file path> [options]

Examples:

document-parser parse C:\docs\report.pdf
document-parser parse C:\docs\scan.jpg --layout --table
document-parser parse C:\docs\contract.docx --output markdown

Query task status

document-parser status <task ID>
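The parse command can also be driven from a script. The following is a minimal sketch, assuming the `document-parser` executable is on `PATH` and writes its result to standard output (neither is stated in this listing). The helper names `build_parse_command` and `run_parse` are hypothetical; the flags are the ones documented in this skill.

```python
import subprocess
from typing import List, Optional


def build_parse_command(
    path: str,
    layout: bool = False,
    table: bool = False,
    seal: bool = False,
    output: Optional[str] = None,
    pages: Optional[str] = None,
) -> List[str]:
    """Build the argument list for `document-parser parse` (hypothetical helper)."""
    cmd = ["document-parser", "parse", path]
    if layout:
        cmd.append("--layout")
    if table:
        cmd.append("--table")
    if seal:
        cmd.append("--seal")
    if output is not None:
        # The listing documents three output formats.
        if output not in {"json", "markdown", "both"}:
            raise ValueError(f"unsupported output format: {output!r}")
        cmd += ["--output", output]
    if pages:
        cmd += ["--pages", pages]
    return cmd


def run_parse(path: str, **options) -> str:
    """Run the CLI and return its stdout.

    Assumption: the tool is installed and prints its result to stdout.
    """
    result = subprocess.run(
        build_parse_command(path, **options),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with Windows paths such as `C:\docs\report.pdf`.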

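Parameters

| Parameter | Description | Example |
| --- | --- | --- |
| file path | Path to a PDF, image, or Word file | C:\docs\report.pdf |
| --layout | Enable layout analysis | --layout |
| --table | Enable table recognition | --table |
| --seal | Enable seal detection | --seal |
| --output | Output format (json/markdown/both) | --output markdown |
| --pages | Page range | --pages 1-5,8,10-12 |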
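To illustrate the `--pages` syntax, here is a small sketch that expands a specification such as `1-5,8,10-12` into individual page numbers. It assumes 1-based pages and inclusive ranges; how the tool itself interprets the value is not documented here, and `parse_page_ranges` is a hypothetical helper, not part of the skill.

```python
from typing import List


def parse_page_ranges(spec: str) -> List[int]:
    """Expand a spec like '1-5,8,10-12' into a sorted list of page numbers.

    Assumptions: pages are 1-based and ranges are inclusive.
    """
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start < 1 or end < start:
                raise ValueError(f"invalid page range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            page = int(part)
            if page < 1:
                raise ValueError(f"invalid page number: {part!r}")
            pages.add(page)
    return sorted(pages)


print(parse_page_ranges("1-5,8,10-12"))  # [1, 2, 3, 4, 5, 8, 10, 11, 12]
```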

Configuration

Option 1: environment variables

DOCUMENT_PARSER_API_KEY=your_api_key
DOCUMENT_PARSER_BASE_URL=http://47.111.146.164:8088/taidp/v1/idp/general_parse

Option 2: configuration file

Create config.json in the skill directory

{
  "api_key": "your_api_key",
  "base_url": "http://47.111.146.164:8088/taidp/v1/idp/general_parse"
}
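The two configuration options could be combined along these lines. This is a sketch only: the listing does not say which source wins when both are present, so letting environment variables override `config.json` is an assumption, and `load_config` is a hypothetical helper.

```python
import json
import os
from pathlib import Path
from typing import Dict, Mapping, Optional


def load_config(
    skill_dir: str = ".", env: Optional[Mapping[str, str]] = None
) -> Dict[str, str]:
    """Read api_key and base_url from config.json, then apply env overrides.

    Assumption: environment variables take precedence over the file.
    """
    env = os.environ if env is None else env
    config: Dict[str, str] = {}

    config_path = Path(skill_dir) / "config.json"
    if config_path.is_file():
        with config_path.open(encoding="utf-8") as handle:
            config.update(json.load(handle))

    # Variable names are the ones documented in this listing.
    if env.get("DOCUMENT_PARSER_API_KEY"):
        config["api_key"] = env["DOCUMENT_PARSER_API_KEY"]
    if env.get("DOCUMENT_PARSER_BASE_URL"):
        config["base_url"] = env["DOCUMENT_PARSER_BASE_URL"]

    missing = [key for key in ("api_key", "base_url") if not config.get(key)]
    if missing:
        raise KeyError(f"missing configuration values: {', '.join(missing)}")
    return config
```

Failing early on a missing value keeps a misconfigured setup from surfacing later as an opaque request error.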

Output format

The structured JSON response contains:

  • pages: array of parsed pages
  • elements: layout elements (text, tables, images, etc.)
  • markdown: text in Markdown format
  • data: summary statistics
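As a rough sketch of how a caller might inspect such a result: the code below counts pages and layout elements from an already-decoded JSON object. The four top-level keys come from the list above, but the shape of each element is not documented here, so the per-element `type` key is purely an assumption, as is the helper name `summarize_result`.

```python
from collections import Counter
from typing import Any, Dict


def summarize_result(result: Dict[str, Any]) -> Dict[str, Any]:
    """Summarize a parsed result (hypothetical helper).

    Assumption: each entry in `elements` is a dict with a `type` key.
    """
    pages = result.get("pages") or []
    elements = result.get("elements") or []
    type_counts = Counter(
        element.get("type", "unknown")
        for element in elements
        if isinstance(element, dict)
    )
    return {
        "page_count": len(pages),
        "element_types": dict(type_counts),
        "markdown_chars": len(result.get("markdown") or ""),
    }


# Made-up sample data, only to show the call shape.
sample = {
    "pages": [{}, {}],
    "elements": [{"type": "text"}, {"type": "table"}, {"type": "text"}],
    "markdown": "# Title",
    "data": {},
}
print(summarize_result(sample))
```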

Dependencies

  • requests
  • python-docx (Word support)
  • Pillow (image processing)

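Error codes

| Error code | Message | Description |
| --- | --- | --- |
| 10000 | Success | Recognition succeeded |
| 10001 | Missing parameter | A required parameter is missing |
| 10002 | Invalid parameter | A parameter is invalid |
| 10003 | Invalid file | The file format is invalid |
| 10004 | Failed to recognize | Recognition failed |
| 10005 | Internal error | Internal error |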
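A caller might translate these codes into exceptions roughly as follows. This is a sketch, not the skill's own error handling: where the code appears in a response is not documented here, so `check_code` simply takes the integer, and both it and `DocumentParserError` are hypothetical names.

```python
from typing import Dict

# Codes and messages copied from the table above.
ERROR_MESSAGES: Dict[int, str] = {
    10000: "Success",
    10001: "Missing parameter",
    10002: "Invalid parameter",
    10003: "Invalid file",
    10004: "Failed to recognize",
    10005: "Internal error",
}

SUCCESS_CODE = 10000


class DocumentParserError(Exception):
    """Raised for any non-success code (hypothetical exception type)."""

    def __init__(self, code: int, message: str) -> None:
        super().__init__(f"[{code}] {message}")
        self.code = code
        self.message = message


def check_code(code: int) -> None:
    """Return silently on success, raise DocumentParserError otherwise."""
    if code == SUCCESS_CODE:
        return
    # Codes outside the documented table are reported as unknown.
    raise DocumentParserError(code, ERROR_MESSAGES.get(code, "Unknown error"))
```

For example, `check_code(10003)` raises `DocumentParserError` with the message `[10003] Invalid file`, while `check_code(10000)` returns without raising.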
