mistral-pdf-to-markdown

Mistral PDF to Markdown Converter

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "mistral-pdf-to-markdown" with this command: npx skills add fuzhiyu/researchprojecttemplate/fuzhiyu-researchprojecttemplate-mistral-pdf-to-markdown

Convert PDF documents to Markdown format using Mistral's OCR API. Automatically extracts text, formatting, and images.

When to Use

  • Converting research papers or documents to Markdown

  • Extracting text from scanned PDFs (OCR capability)

  • Preserving document structure with headers and formatting

  • Extracting embedded images from PDFs

Quick Start

Use the conversion script from this skill's directory:

Convert entire PDF

python scripts/convert_pdf_to_markdown.py input.pdf output.md

Convert specific pages

python scripts/convert_pdf_to_markdown.py input.pdf output.md --pages "1-5"
python scripts/convert_pdf_to_markdown.py input.pdf output.md --pages "1,3,5"
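The two --pages forms above (a range and a comma-separated list) can be expanded into individual page numbers along the following lines. This is only a sketch: parse_page_spec is a hypothetical helper, not the script's actual parser, and it assumes page numbers are 1-based and that ranges and single pages may be mixed in one spec.

```python
def parse_page_spec(spec: str) -> list[int]:
    """Expand a spec such as "1-5" or "1,3,5" into sorted, unique page numbers."""
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"Invalid range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    # Assumption: pages are numbered from 1, as in the examples above.
    if any(page < 1 for page in pages):
        raise ValueError("Page numbers start at 1")
    return sorted(pages)


print(parse_page_spec("1-5"))    # [1, 2, 3, 4, 5]
print(parse_page_spec("1,3,5"))  # [1, 3, 5]
```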

Output Structure

Output/PDFConversions/
├── document.md          # Markdown with text and image references
└── images/
    ├── img-0.jpeg       # Extracted images
    ├── img-1.jpeg
    └── ...

Usage in Code

from pathlib import Path
import subprocess

# Run conversion script
result = subprocess.run([
    "python",
    ".claude/skills/mistral-pdf-to-markdown/scripts/convert_pdf_to_markdown.py",
    "input.pdf",
    "Output/PDFConversions/output.md",
    "--pages", "1-10"
], capture_output=True, text=True)

print(result.stdout)
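When calling the script from code, it can help to check the exit status instead of only printing stdout. The wrapper below is a minimal sketch: build_command and convert are hypothetical names, and it assumes the script exits with a non-zero status on failure, which this document does not confirm.

```python
import subprocess
import sys
from pathlib import Path
from typing import Optional

# Script path as used in this section; adjust if the skill lives elsewhere.
SCRIPT = Path(".claude/skills/mistral-pdf-to-markdown/scripts/convert_pdf_to_markdown.py")


def build_command(pdf: str, output_md: str, pages: Optional[str] = None) -> list[str]:
    """Assemble the argument list for the conversion script."""
    # sys.executable runs the script with the same interpreter as the caller.
    cmd = [sys.executable, str(SCRIPT), pdf, output_md]
    if pages:
        cmd += ["--pages", pages]
    return cmd


def convert(pdf: str, output_md: str, pages: Optional[str] = None) -> str:
    """Run the script and return its stdout; raise if it exits non-zero."""
    # Create the output folder up front (harmless if the script does so itself).
    Path(output_md).parent.mkdir(parents=True, exist_ok=True)
    result = subprocess.run(
        build_command(pdf, output_md, pages), capture_output=True, text=True
    )
    if result.returncode != 0:  # assumption: failures are signalled via exit status
        raise RuntimeError(f"Conversion failed:\n{result.stderr}")
    return result.stdout
```

A call such as convert("input.pdf", "Output/PDFConversions/output.md", pages="1-10") would then either return the script's output or raise with its stderr attached.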

Key Features

  • Markdown formatting: Preserves headers, lists, and structure

  • Image extraction: Saves images to images/ subfolder automatically

  • Page selection: Extract specific pages or ranges

  • Scanned PDF support: True OCR capability for image-based PDFs

  • Relative paths: Image references use ...

Requirements

The script requires:

  • Mistral API key in Notes/.env (line 2: mistral_api_key=...)

  • Python packages: mistralai, python-dotenv, pypdf
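To confirm the key is in place before running a conversion, a small check like the one below may be useful. It is a sketch using only the standard library: read_api_key is a hypothetical helper that looks the variable up by name rather than by line number, so it may not match how the script itself reads Notes/.env. The python-dotenv package listed above offers fuller .env parsing.

```python
from pathlib import Path
from typing import Optional


def read_api_key(env_path: str = "Notes/.env", name: str = "mistral_api_key") -> Optional[str]:
    """Return the value of `name` from a KEY=VALUE file, or None if it is absent."""
    path = Path(env_path)
    if not path.is_file():
        return None
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blanks, comments, and lines that are not KEY=VALUE pairs.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip() == name:
            # Drop optional surrounding quotes; treat an empty value as missing.
            return value.strip().strip("'\"") or None
    return None


if read_api_key() is None:
    print("Error: Mistral API key not found in Notes/.env")
```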

Common Use Cases

Convert Research Paper

python scripts/convert_pdf_to_markdown.py \
  "Data/papers/research.pdf" \
  "Notes/Paper Markdown/research.md"

Extract Specific Sections

Extract pages 10-20 (introduction and methods)

python scripts/convert_pdf_to_markdown.py \
  "paper.pdf" \
  "Notes/Paper Markdown/intro_methods.md" \
  --pages "10-20"

Extract Figures Only

Extract pages with figures

python scripts/convert_pdf_to_markdown.py \
  "paper.pdf" \
  "Notes/Paper Markdown/figures.md" \
  --pages "25,27,30,35"

Error Handling

API Key Not Found:

Error: Mistral API key not found in Notes/.env

→ Add mistral_api_key=YOUR_KEY to line 2 of Notes/.env

Page Out of Range:

Warning: Page 100 out of range, skipping

→ Check PDF page count and adjust page selection
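The out-of-range warning can be reproduced with a simple bounds check. The sketch below is an illustration rather than the script's own logic: filter_pages is a hypothetical helper, and page_count is passed in directly (with pypdf it could be obtained as len(PdfReader(path).pages)).

```python
def filter_pages(requested: list[int], page_count: int) -> list[int]:
    """Keep only pages that exist in the PDF, warning about the rest."""
    kept = []
    for page in requested:
        if 1 <= page <= page_count:  # assumption: pages are 1-based
            kept.append(page)
        else:
            print(f"Warning: Page {page} out of range, skipping")
    return kept


# For a 60-page PDF, page 100 is skipped with a warning.
print(filter_pages([1, 50, 100], page_count=60))  # [1, 50]
```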

API Rate Limit:

→ Wait a moment and retry, or reduce page count per request
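"Wait a moment and retry" can be automated with exponential backoff. The following is a generic sketch, not part of the skill: retry_with_backoff is a hypothetical helper, the delays are arbitrary defaults, and it catches every exception for brevity, whereas real code would catch only the rate-limit error raised by the API client.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call` until it succeeds, doubling the wait after each failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Waits base_delay, then 2x, then 4x, and so on.
            sleep(base_delay * 2 ** (attempt - 1))
```

A conversion could then be wrapped as retry_with_backoff(lambda: run_conversion()), where run_conversion stands for whatever function invokes the script. Splitting a large page range into smaller requests is the other remedy named above.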

Notes

  • Images are saved as JPEG files in images/ subfolder

  • Markdown image references are automatically updated to images/img-X.jpeg

  • Large PDFs may take longer to process due to API limits

  • For simple text extraction without OCR, consider using the pdf skill instead

  • Scanned PDFs benefit most from this skill's OCR capability
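The image handling described in these notes (JPEG files under images/ and references rewritten to images/img-X.jpeg) could look roughly like this. It is a sketch under assumptions: rewrite_image_refs and save_image are hypothetical helpers, the OCR output is assumed to reference images by bare file name such as img-0.jpeg, and image data is assumed to arrive as base64, optionally wrapped in a data URI.

```python
import base64
import re
from pathlib import Path

# Matches Markdown image links whose target is a bare name like img-0.jpeg.
IMAGE_REF = re.compile(r"!\[([^\]]*)\]\((img-\d+\.jpeg)\)")


def rewrite_image_refs(markdown: str, subdir: str = "images") -> str:
    """Point bare img-X.jpeg references at the images/ subfolder."""
    return IMAGE_REF.sub(
        lambda m: f"![{m.group(1)}]({subdir}/{m.group(2)})", markdown
    )


def save_image(payload: str, name: str, image_dir: Path) -> Path:
    """Decode a base64 payload (optionally a data URI) into image_dir/name."""
    if payload.startswith("data:") and "," in payload:
        payload = payload.split(",", 1)[1]  # drop the "data:...;base64," prefix
    image_dir.mkdir(parents=True, exist_ok=True)
    target = image_dir / name
    target.write_bytes(base64.b64decode(payload))
    return target


print(rewrite_image_refs("See ![img-0.jpeg](img-0.jpeg) for details."))
# See ![img-0.jpeg](images/img-0.jpeg) for details.
```

Because the paths are relative, the Markdown file and its images/ folder can be moved together without breaking the links.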

See Also

  • pdf skill - For local PDF manipulation without API calls

  • reference.md - Additional details about the Mistral OCR API

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
