MarkItDown
Purpose
Convert a wide variety of file formats into Markdown text using Microsoft's markitdown CLI. Useful for extracting text from documents for LLM analysis, summarization, or ingestion into knowledge bases.
Supported Formats
Category Formats
Documents PDF, DOCX, PPTX, XLSX, XLS
Web & Data HTML, CSV, JSON, XML
Media Images (EXIF + OCR), Audio (metadata + transcription)
eBooks EPub
Archives ZIP (iterates over contents)
Other YouTube URLs, Outlook messages
Basic Usage
Convert a file (output to stdout)
markitdown path/to/file.pdf
Save output to a file
markitdown path/to/file.pdf -o output.md
Pipe from stdin
cat path/to/file.pdf | markitdown
Options
Flag Description
-o <file>
Write output to a file instead of stdout
-d
Use Azure Document Intelligence for conversion
-e "<endpoint>"
Azure Document Intelligence endpoint URL
--use-plugins
Enable third-party plugins
--list-plugins
Show installed plugins
Workflow
Single File Conversion
Convert and capture the result
result=$(markitdown document.pdf)
Convert and save
markitdown document.pdf -o document.md
Batch Conversion
Convert all PDFs in a directory
for f in *.pdf; do markitdown "$f" -o "${f%.pdf}.md" done
Pipe into Other Tools
Convert and count words
markitdown document.pdf | wc -w
Convert and search for a term
markitdown document.pdf | grep -i "search term"
Agent Usage Notes
-
Output goes to stdout by default. Capture it in a variable or redirect to a file.
-
For large files, prefer saving to a file with -o rather than capturing stdout.
-
Image conversion extracts EXIF metadata and OCR text. For richer image descriptions, use the Python API with an LLM client instead.
-
ZIP files are automatically extracted and each contained file is converted.
-
If conversion fails for a format, check that the corresponding optional dependency is installed (e.g., markitdown[pdf] for PDF support).