MarkItDown

Purpose

Convert a wide variety of file formats into Markdown text using Microsoft's markitdown CLI. Useful for extracting text from documents for LLM analysis, summarization, or ingestion into knowledge bases.

Supported Formats

Category Formats

Documents PDF, DOCX, PPTX, XLSX, XLS

Web & Data HTML, CSV, JSON, XML

Media Images (EXIF + OCR), Audio (metadata + transcription)

eBooks EPub

Archives ZIP (iterates over contents)

Other YouTube URLs, Outlook messages

Basic Usage

Convert a file (output to stdout)

markitdown path/to/file.pdf

Save output to a file

markitdown path/to/file.pdf -o output.md

Pipe from stdin

cat path/to/file.pdf | markitdown

Options

Flag Description

-o <file>

Write output to a file instead of stdout

-d

Use Azure Document Intelligence for conversion

-e "<endpoint>"

Azure Document Intelligence endpoint URL

--use-plugins

Enable third-party plugins

--list-plugins

Show installed plugins

Workflow

Single File Conversion

Convert and capture the result

result=$(markitdown document.pdf)

Convert and save

markitdown document.pdf -o document.md

Batch Conversion

Convert all PDFs in a directory

for f in *.pdf; do markitdown "$f" -o "${f%.pdf}.md" done

Pipe into Other Tools

Convert and count words

markitdown document.pdf | wc -w

Convert and search for a term

markitdown document.pdf | grep -i "search term"

Agent Usage Notes

Output goes to stdout by default. Capture it in a variable or redirect to a file.
For large files, prefer saving to a file with -o rather than capturing stdout.
Image conversion extracts EXIF metadata and OCR text. For richer image descriptions, use the Python API with an LLM client instead.
ZIP files are automatically extracted and each contained file is converted.
If conversion fails for a format, check that the corresponding optional dependency is installed (e.g., markitdown[pdf] for PDF support).

markitdown

Safety Notice

Copy this and send it to your AI assistant to learn

Convert a file (output to stdout)

Save output to a file

Pipe from stdin

Convert and capture the result

Convert and save

Convert all PDFs in a directory

Convert and count words

Convert and search for a term

Source Transparency

Related Skills

markitdown

markitdown

markitdown