minimax-pdf-ocr

Recognize text in PDFs and images with the MiniMax Vision API

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.


Install skill "minimax-pdf-ocr" with this command: npx skills add chongjie-ran/minimax-pdf-ocr

MiniMax OCR Skill

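Uses the MiniMax Vision API to recognize the text in PDFs and images. Both Chinese and English are supported.

Features

PDF-to-image conversion (using poppler)
Text recognition with the MiniMax Vision API
Markdown output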
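The listing does not show the recognition request itself. Because the dependencies include `openai`, one plausible shape is an OpenAI-compatible chat-completions request that carries the page image as a base64 data URL. The sketch below only builds that request body. The function name, the model-name placeholder, and the prompt text are hypothetical, and whether MiniMax accepts exactly this format is an assumption to check against its API documentation.

```javascript
// Hypothetical sketch: build one OCR request for a single page image,
// using the OpenAI-compatible chat-completions message format.
// ASSUMPTION: MiniMax accepts this format; the model name is a placeholder.
function buildOcrRequest(pngBuffer, model = 'YOUR_MINIMAX_VISION_MODEL') {
  // Inline the page image as a base64 data URL
  const dataUrl = `data:image/png;base64,${pngBuffer.toString('base64')}`;
  return {
    model,
    messages: [
      {
        role: 'user',
        content: [
          {
            type: 'text',
            // Hypothetical prompt; tune it for your documents
            text: 'Transcribe all text in this image as Markdown. Keep paragraphs and headings.',
          },
          { type: 'image_url', image_url: { url: dataUrl } },
        ],
      },
    ],
  };
}

module.exports = { buildOcrRequest };
```

With the `openai` package, the returned object would be passed to `client.chat.completions.create(...)` on a client whose `baseURL` points at MiniMax's OpenAI-compatible endpoint. The endpoint URL is not given in this listing, so take it from platform.minimaxi.com.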

Dependencies

# Install Node.js dependencies
cd minimax-pdf-ocr
npm install openai pdf2image

# Install the system dependency (poppler, via Homebrew on macOS)
brew install poppler
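poppler ships the `pdftoppm` command-line tool, which renders PDF pages to raster images. A minimal sketch of the PDF-to-image step, assuming `pdftoppm` is on PATH (the actual script may go through the `pdf2image` package instead). The function names and the 150 DPI default are illustrative choices, not taken from the skill.

```javascript
// Sketch: render each PDF page to PNG with poppler's `pdftoppm` CLI.
// Requires poppler to be installed (e.g. `brew install poppler`).
const { execFileSync } = require('node:child_process');
const fs = require('node:fs');
const path = require('node:path');

// Build the pdftoppm argument list (kept separate so it can be inspected).
function pdftoppmArgs(pdfPath, outPrefix, dpi = 150) {
  // -png: write PNG files; -r: rendering resolution in DPI
  return ['-png', '-r', String(dpi), pdfPath, outPrefix];
}

// Hypothetical helper: returns page image paths in page order.
function pdfToPngs(pdfPath, workDir, dpi = 150) {
  fs.mkdirSync(workDir, { recursive: true });
  const prefix = path.join(workDir, 'page');
  execFileSync('pdftoppm', pdftoppmArgs(pdfPath, prefix, dpi));
  // pdftoppm writes page-1.png, page-2.png, ... (numbers may be zero-padded)
  return fs
    .readdirSync(workDir)
    .filter((f) => /^page-\d+\.png$/.test(f))
    .sort((a, b) => Number(a.match(/\d+/)[0]) - Number(b.match(/\d+/)[0]))
    .map((f) => path.join(workDir, f));
}

module.exports = { pdftoppmArgs, pdfToPngs };
```

Sorting numerically rather than lexically keeps `page-10.png` after `page-9.png` when the numbers are not padded.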

Usage

Command line

# Set the API key
export MINIMAX_API_KEY="your-api-key"

# Run OCR
node pdf-ocr-minimax.js <pdf-file-path> [output-dir]

# Example
node pdf-ocr-minimax.js ./document.pdf ./output/

Using it as a skill

Call it from JavaScript code:

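const { recognizePdf } = require('./pdf-ocr-minimax.js');

// CommonJS modules have no top-level await, so wrap the call in an async function
(async () => {
  await recognizePdf('/path/to/document.pdf', './output/');
})().catch(console.error);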

Environment variables

Variable	Description	Required
MINIMAX_API_KEY	MiniMax API key (obtain it from platform.minimaxi.com)	Yes
OUTPUT_DIR	Output directory	No (defaults to the current directory)
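The listing documents both a positional output-directory argument and `OUTPUT_DIR`, but not which one wins. A hypothetical `resolveConfig` helper, assuming the command-line argument takes precedence over `OUTPUT_DIR`, which in turn falls back to the current directory:

```javascript
// Hypothetical sketch: resolve settings from CLI args and the environment.
// ASSUMPTION: a CLI output dir overrides OUTPUT_DIR; the real script may differ.
function resolveConfig(argv = process.argv.slice(2), env = process.env) {
  const [pdfPath, cliOutputDir] = argv;
  if (!pdfPath) {
    throw new Error('Usage: node pdf-ocr-minimax.js <pdf-file-path> [output-dir]');
  }
  const apiKey = env.MINIMAX_API_KEY;
  if (!apiKey) {
    // MINIMAX_API_KEY is the only required variable
    throw new Error('MINIMAX_API_KEY is not set');
  }
  // CLI argument first, then OUTPUT_DIR, then the current working directory
  const outputDir = cliOutputDir || env.OUTPUT_DIR || process.cwd();
  return { pdfPath, outputDir, apiKey };
}

module.exports = { resolveConfig };
```

Passing `argv` and `env` as parameters keeps the helper testable without touching the real process environment.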

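Output

The recognition result is saved as a .md file
It contains the text of every page
The original formatting and paragraph structure are preserved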
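The listing does not say how the .md file is named. A hypothetical helper that names it after the input PDF (`document.pdf` becomes `document.md`) and writes it into the output directory; both function names are illustrative:

```javascript
// Hypothetical sketch: name the Markdown file after the input PDF
const fs = require('node:fs');
const path = require('node:path');

function outputPathFor(pdfPath, outputDir) {
  // Strip the directory and extension, then append .md
  const base = path.basename(pdfPath, path.extname(pdfPath));
  return path.join(outputDir, `${base}.md`);
}

function saveMarkdown(markdown, pdfPath, outputDir) {
  fs.mkdirSync(outputDir, { recursive: true }); // create the output dir if missing
  const outPath = outputPathFor(pdfPath, outputDir);
  fs.writeFileSync(outPath, markdown, 'utf8');
  return outPath;
}

module.exports = { outputPathFor, saveMarkdown };
```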

Example output

# Document Name

## Page 1

Text content of page 1...

## Page 2

Text content of page 2...
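Assembling per-page results into this layout (a top-level title, then one level-2 heading per page) takes only a few lines. A minimal sketch; `buildMarkdown` is a hypothetical name, and it uses English page headings:

```javascript
// Sketch: join per-page OCR text into one Markdown document
function buildMarkdown(title, pageTexts) {
  const parts = [`# ${title}`];
  pageTexts.forEach((text, i) => {
    // Pages are numbered from 1 in the output
    parts.push(`## Page ${i + 1}`, text.trim());
  });
  // Blank line between blocks, trailing newline at the end of the file
  return parts.join('\n\n') + '\n';
}

module.exports = { buildMarkdown };
```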

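Notes

The MiniMax M2.5 model supports visual understanding
Recognizing one page consumes roughly 100-500 tokens
When batch processing, add a suitable delay between requests to avoid rate limiting
Get an API key: https://platform.minimaxi.com/user-center/basic-information/interface-key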
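The last two notes translate into a simple batching pattern: send one request at a time with a fixed pause between them, and budget tokens from the quoted 100-500 per page. A minimal sketch; the function names and the 1000 ms default delay are assumptions, not values from the skill, and a fixed delay is the simplest option (retrying with backoff on rate-limit errors is a common alternative).

```javascript
// Sketch: process items sequentially with a fixed pause between calls
// to reduce the chance of hitting API rate limits.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function processSequentially(items, worker, delayMs = 1000) {
  const results = [];
  for (let i = 0; i < items.length; i++) {
    if (i > 0) await sleep(delayMs); // no pause before the first request
    results.push(await worker(items[i], i));
  }
  return results;
}

// Rough token budget from the documented 100-500 tokens per page
function estimateTokenRange(pageCount) {
  return { min: pageCount * 100, max: pageCount * 500 };
}

module.exports = { processSequentially, estimateTokenRange };
```

For example, a 10-page PDF would be budgeted at 1,000 to 5,000 tokens.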

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

