Mistral PDF to Markdown Converter

Convert PDF documents to Markdown format using Mistral's OCR API. Automatically extracts text, formatting, and images.

When to Use

Converting research papers or documents to Markdown
Extracting text from scanned PDFs (OCR capability)
Preserving document structure with headers and formatting
Extracting embedded images from PDFs

Quick Start

Use the conversion script from this skill's directory:

Convert entire PDF

python scripts/convert_pdf_to_markdown.py input.pdf output.md

Convert specific pages

python scripts/convert_pdf_to_markdown.py input.pdf output.md --pages "1-5"
python scripts/convert_pdf_to_markdown.py input.pdf output.md --pages "1,3,5"
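The two `--pages` forms above (a range and a comma-separated list) can be expanded into individual page numbers along these lines. This is only a sketch: `parse_page_spec` is a hypothetical helper, not the script's actual parser, and it assumes pages are numbered from 1 and that ranges are inclusive.

```python
def parse_page_spec(spec: str) -> list[int]:
    """Expand a spec such as "1-5" or "1,3,5" into sorted 1-based page numbers."""
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "1,,3"
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"invalid range: {part!r}")
            pages.update(range(start, end + 1))  # ranges are inclusive (assumption)
        else:
            pages.add(int(part))
    if any(page < 1 for page in pages):
        raise ValueError("page numbers are 1-based")
    return sorted(pages)


print(parse_page_spec("1-5"))
print(parse_page_spec("1,3,5"))
```

Mixed specs such as "1-3,7" fall out of the same loop, although the source does not say whether the script accepts them.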

Output Structure

Output/PDFConversions/
├── document.md          # Markdown with text and image references
└── images/
    ├── img-0.jpeg       # Extracted images
    ├── img-1.jpeg
    └── ...

Usage in Code

from pathlib import Path
import subprocess

Run conversion script

result = subprocess.run(
    [
        "python",
        ".claude/skills/mistral-pdf-to-markdown/scripts/convert_pdf_to_markdown.py",
        "input.pdf",
        "Output/PDFConversions/output.md",
        "--pages", "1-10",
    ],
    capture_output=True,
    text=True,
)

print(result.stdout)
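For repeated use, the call above can be wrapped so that failures are not silently printed. The sketch below is one possible shape: `build_command` and `convert` are hypothetical names, and it assumes the script exits with a non-zero status on failure, which the source does not state.

```python
import subprocess
import sys
from pathlib import Path
from typing import Optional

# Script path as used in the example above; adjust to your checkout.
SCRIPT = Path(".claude/skills/mistral-pdf-to-markdown/scripts/convert_pdf_to_markdown.py")


def build_command(pdf: str, output_md: str, pages: Optional[str] = None) -> list[str]:
    """Assemble the argument list for the conversion script."""
    command = [sys.executable, str(SCRIPT), pdf, output_md]
    if pages:
        command += ["--pages", pages]
    return command


def convert(pdf: str, output_md: str, pages: Optional[str] = None) -> str:
    """Run the script and return its stdout, raising if it exits non-zero."""
    # Assumption: a non-zero exit code means the conversion failed.
    result = subprocess.run(
        build_command(pdf, output_md, pages), capture_output=True, text=True
    )
    if result.returncode != 0:
        raise RuntimeError(f"conversion failed: {result.stderr.strip()}")
    return result.stdout
```

Using `sys.executable` instead of a bare "python" keeps the script in the same interpreter (and virtual environment) as the caller.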

Key Features

Markdown formatting: Preserves headers, lists, and structure
Image extraction: Saves images to images/ subfolder automatically
Page selection: Extract specific pages or ranges
Scanned PDF support: True OCR capability for image-based PDFs
Relative paths: Image references use relative paths of the form images/img-X.jpeg

Requirements

The script requires:

Mistral API key in Notes/.env (line 2: mistral_api_key=...)
Python packages: mistralai, python-dotenv, pypdf
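Before running a conversion it can help to confirm that the key is actually present. The check below is a standard-library sketch that reads a KEY=VALUE file directly; `read_env_value` is a hypothetical helper, and the script itself presumably loads the key through python-dotenv instead.

```python
from pathlib import Path
from typing import Optional


def read_env_value(env_path: Path, name: str) -> Optional[str]:
    """Return the value of `name` from a KEY=VALUE file, or None if absent."""
    if not env_path.exists():
        return None
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, value = line.split("=", 1)
        if key.strip() == name:
            # Drop surrounding quotes; treat an empty value as missing.
            return value.strip().strip("'\"") or None
    return None


if __name__ == "__main__":
    key = read_env_value(Path("Notes/.env"), "mistral_api_key")
    print("key found" if key else "mistral_api_key missing from Notes/.env")
```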

Common Use Cases

Convert Research Paper

python scripts/convert_pdf_to_markdown.py \
  "Data/papers/research.pdf" \
  "Notes/Paper Markdown/research.md"

Extract Specific Sections

Extract pages 10-20 (introduction and methods)

python scripts/convert_pdf_to_markdown.py \
  "paper.pdf" \
  "Notes/Paper Markdown/intro_methods.md" \
  --pages "10-20"

Extract Figures Only

Extract pages with figures

python scripts/convert_pdf_to_markdown.py \
  "paper.pdf" \
  "Notes/Paper Markdown/figures.md" \
  --pages "25,27,30,35"

Error Handling

API Key Not Found:

Error: Mistral API key not found in Notes/.env

→ Add mistral_api_key=YOUR_KEY to line 2 of Notes/.env

Page Out of Range:

Warning: Page 100 out of range, skipping

→ Check PDF page count and adjust page selection
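The skip-with-warning behaviour can be pictured as a simple filter over the requested pages. This is a sketch of the idea, not the script's code: `keep_pages_in_range` is a hypothetical name and pages are assumed to be 1-based.

```python
def keep_pages_in_range(pages: list[int], page_count: int) -> list[int]:
    """Drop pages outside 1..page_count, printing a warning for each one."""
    kept = []
    for page in pages:
        if 1 <= page <= page_count:
            kept.append(page)
        else:
            print(f"Warning: Page {page} out of range, skipping")
    return kept


# With pypdf installed, the page count can be read up front, for example:
#   from pypdf import PdfReader
#   page_count = len(PdfReader("input.pdf").pages)
print(keep_pages_in_range([1, 5, 100], page_count=12))
```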

API Rate Limit:

→ Wait a moment and retry, or reduce page count per request
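"Wait a moment and retry" can be automated with exponential backoff. The helper below is a generic sketch: `retry_with_backoff` is a hypothetical name, the delays are arbitrary, and it retries on any exception because the source does not say how the script reports a rate limit. In practice you would narrow the `except` clause to the specific error.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    action: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `action`, sleeping base_delay * 2**n seconds after the n-th failure."""
    for attempt in range(attempts):
        try:
            return action()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("attempts must be at least 1")
```

The `sleep` parameter exists so the waiting can be replaced in tests; callers would normally leave it at the default.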

Notes

Images are saved as JPEG files in images/ subfolder
Markdown image references are automatically updated to images/img-X.jpeg
Large PDFs may take longer to process due to API limits
For simple text extraction without OCR, consider using the pdf skill instead
Scanned PDFs benefit most from this skill's OCR capability
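The image handling described in these notes can be sketched as two small steps: write each extracted image into images/, then point the Markdown references at that folder. The code below is an illustration under assumptions, not the script's implementation: it assumes images arrive as base64 strings (possibly as data URIs) and that the raw OCR Markdown references them by bare file name such as img-0.jpeg. Both function names are hypothetical.

```python
import base64
import re
from pathlib import Path


def save_image(image_id: str, image_base64: str, images_dir: Path) -> Path:
    """Decode one base64 image (optionally a data URI) into images_dir."""
    images_dir.mkdir(parents=True, exist_ok=True)
    if image_base64.startswith("data:"):
        # Strip a "data:image/jpeg;base64," style prefix.
        image_base64 = image_base64.split(",", 1)[1]
    target = images_dir / image_id
    target.write_bytes(base64.b64decode(image_base64))
    return target


def point_refs_at_images_dir(markdown: str, folder: str = "images") -> str:
    """Rewrite ![alt](img-N.jpeg) so the target becomes folder/img-N.jpeg."""
    pattern = re.compile(r"(!\[[^\]]*\]\()(img-\d+\.jpeg\))")
    return pattern.sub(lambda m: f"{m.group(1)}{folder}/{m.group(2)}", markdown)


print(point_refs_at_images_dir("See ![img-0.jpeg](img-0.jpeg) for details."))
```

Because the pattern only matches targets that start with img-, references that already point into images/ are left unchanged, so the rewrite is safe to run twice.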
