Pandoc Document Converter

Convert documents between any of the formats pandoc supports, with full control over styling, templates, table of contents, metadata, and PDF engine selection.

Quick Start

For most conversions, use the helper script at scripts/convert.sh:

bash <skill-dir>/scripts/convert.sh <input-file> <output-file> [options...]

The script auto-detects formats from file extensions and applies sensible defaults (standalone output, appropriate PDF engine, default LaTeX margins for LaTeX-based PDF engines). It also checks that pandoc, the input file, the output directory, and any requested PDF engine are available. Any extra arguments are passed through to pandoc.
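
The availability checks described above could look roughly like the following. This is a minimal sketch, not the actual contents of scripts/convert.sh; the function name preflight and its optional third argument (the converter binary, defaulting to pandoc) are hypothetical.

```bash
# Hypothetical pre-flight check; the real scripts/convert.sh may differ.
# Usage: preflight <input-file> <output-file> [converter-binary]
preflight() {
  local input=$1 output=$2 converter=${3:-pandoc}
  local outdir
  outdir=$(dirname "$output")

  # The converter must be on PATH.
  if ! command -v "$converter" >/dev/null 2>&1; then
    echo "error: $converter not found on PATH" >&2
    return 1
  fi
  # The input must be a readable regular file.
  if [ ! -f "$input" ] || [ ! -r "$input" ]; then
    echo "error: cannot read input file: $input" >&2
    return 2
  fi
  # The output directory must already exist and be writable.
  if [ ! -d "$outdir" ] || [ ! -w "$outdir" ]; then
    echo "error: output directory not writable: $outdir" >&2
    return 3
  fi
  return 0
}
```

A caller might chain it in front of the conversion, for example `preflight report.md out/report.pdf && pandoc report.md -o out/report.pdf`, so that pandoc only runs once all three checks pass.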

How Conversions Work

Pandoc reads a source format into an internal AST, then writes it out in the target format. This means you can go from nearly any supported input to any supported output. The key decision points are:

Input format — usually auto-detected from the file extension
Output format — auto-detected from the output file extension
PDF engine — for PDF output, choose between xelatex (best Unicode/font support), lualatex (strong Unicode/fonts), tectonic (self-contained TeX), pdflatex (fastest, good for ASCII-heavy docs), or HTML/CSS engines like weasyprint, wkhtmltopdf, or prince
Styling — CSS for HTML-based outputs, LaTeX templates for PDF, reference docs for DOCX/ODT
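
One way to encode the PDF engine decision is a small lookup on the input extension. The following is a sketch under assumptions: the function name pick_pdf_engine is hypothetical, and the mapping (weasyprint for HTML input, xelatex otherwise) simply mirrors the example commands in this document rather than anything pandoc does on its own.

```bash
# Hypothetical helper: suggest a PDF engine from the input file's extension.
pick_pdf_engine() {
  local input=$1
  local ext
  # Lowercase the extension so report.HTML and report.html are treated alike.
  ext=$(printf '%s' "${input##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    html|htm) echo weasyprint ;;  # HTML/CSS engine, so CSS styling is kept
    *)        echo xelatex ;;     # LaTeX engine with Unicode and font support
  esac
}

# Print the resulting command rather than running it.
engine=$(pick_pdf_engine input.html)
echo "pandoc input.html -o output.pdf --pdf-engine=$engine -s"
```

An explicit --pdf-engine flag from the user should still take precedence over any such default.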

Common Conversion Patterns

HTML → PDF

pandoc input.html -o output.pdf --pdf-engine=weasyprint -s

If the HTML uses external CSS, include it:

pandoc input.html -o output.pdf --pdf-engine=weasyprint -s --css=style.css

Markdown → PDF

pandoc input.md -o output.pdf --pdf-engine=xelatex -s --toc --toc-depth=3

Markdown → DOCX

pandoc input.md -o output.docx -s

To use a reference (template) document for styling:

pandoc input.md -o output.docx --reference-doc=template.docx

Markdown → HTML

pandoc input.md -o output.html -s --css=style.css --toc

DOCX → Markdown

pandoc input.docx -o output.md --extract-media=./media

Markdown → EPUB

pandoc input.md -o output.epub -s --toc --epub-cover-image=cover.jpg

LaTeX → PDF

pandoc input.tex -o output.pdf --pdf-engine=xelatex

CSV → HTML table

pandoc input.csv -o output.html -s

Styling and Appearance

CSS for HTML-based outputs

Create or use a CSS file and pass it with --css=path/to/style.css. For PDF output via weasyprint, wkhtmltopdf, or prince, CSS is respected directly. For PDF via LaTeX engines, CSS is usually ignored — use LaTeX variables or templates instead.

A sensible default stylesheet is provided at assets/default.css. Use it when the user wants a clean, readable output without specifying their own styles:

pandoc input.md -o output.html -s --css=<skill-dir>/assets/default.css

LaTeX variables for PDF styling

Control margins, fonts, and paper size without a full template:

pandoc input.md -o output.pdf --pdf-engine=xelatex \
  -V geometry:margin=1in \
  -V fontsize=12pt \
  -V mainfont="DejaVu Serif" \
  -V documentclass=article

Reference documents for DOCX/ODT

To match a corporate style, provide a reference document:

pandoc input.md -o output.docx --reference-doc=brand-template.docx

Advanced Features

Table of contents

To generate a table of contents, add --toc and optionally --toc-depth=N (the default depth is 3):

pandoc input.md -o output.pdf --pdf-engine=xelatex -s --toc --toc-depth=2

Metadata

Set title, author, date via YAML frontmatter in the source file or via -M:

pandoc input.md -o output.pdf --pdf-engine=xelatex -s \
  -M title="My Report" -M author="Jane Doe" -M date="2026-03-15"
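
As an illustration of the frontmatter route, the sketch below writes a small Markdown file with a YAML metadata block and converts it. The file name and the body text are placeholders, and the conversion targets HTML only so that the example does not depend on a PDF engine being installed.

```bash
# Work in a scratch directory so nothing in the current directory is overwritten.
workdir=$(mktemp -d)

# The YAML block between the --- lines sets the same fields as the -M flags.
cat > "$workdir/report.md" <<'EOF'
---
title: "My Report"
author: "Jane Doe"
date: "2026-03-15"
---

# Introduction

Body text goes here.
EOF

# Convert only when pandoc is installed; -s emits a standalone document
# so the title block is rendered.
if command -v pandoc >/dev/null 2>&1; then
  pandoc "$workdir/report.md" -o "$workdir/report.html" -s
fi
```

When the same field is set in both places, the value given with -M on the command line overrides the one in the YAML block.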

Filters and Lua filters

Pandoc supports filters that transform the AST between reading and writing. Lua filters run in pandoc's built-in Lua interpreter, so they need no external dependencies:

pandoc input.md -o output.pdf --lua-filter=my-filter.lua
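
To make this concrete, here is a sketch that writes a tiny Lua filter and applies it. The filter turns strong emphasis into small caps, a commonly shown introductory example; the file name smallcaps.lua and the sample input are placeholders.

```bash
workdir=$(mktemp -d)

# A Lua filter is a file defining functions named after AST element types.
# This one replaces every Strong element with a SmallCaps element.
cat > "$workdir/smallcaps.lua" <<'EOF'
function Strong(elem)
  return pandoc.SmallCaps(elem.content)
end
EOF

# Apply the filter to a one-line Markdown input when pandoc is available.
if command -v pandoc >/dev/null 2>&1; then
  printf '%s\n' 'Some **important** text.' |
    pandoc -f markdown -t html --lua-filter="$workdir/smallcaps.lua"
fi
```

Because the filter operates on the AST rather than on the text, the same file works unchanged for any input and output format pair.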

Multiple input files

Pandoc concatenates multiple inputs:

pandoc chapter1.md chapter2.md chapter3.md -o book.pdf --pdf-engine=xelatex -s --toc

Extracting media from DOCX/EPUB

pandoc input.docx -o output.md --extract-media=./media

Troubleshooting

Problem	Likely cause	Fix
PDF has missing characters	Font doesn't support the glyphs	Use `--pdf-engine=xelatex` with `-V mainfont="DejaVu Serif"`
PDF conversion fails	No compatible PDF engine installed	Check `which xelatex lualatex tectonic pdflatex weasyprint wkhtmltopdf prince` and install one that matches your output needs
DOCX looks unstyled	No reference doc	Create a styled DOCX template and pass `--reference-doc`
HTML images missing	Relative paths broken	Use `--embed-resources --standalone` (`--self-contained` on pandoc older than 2.19) to embed images as data URIs
CSS has no effect on PDF	LaTeX PDF engine selected	Use `--pdf-engine=weasyprint`, `--pdf-engine=wkhtmltopdf`, or `--pdf-engine=prince`
Table of contents empty	No headings in source	Ensure source uses `#` headings (Markdown) or `<h1>`–`<h6>` (HTML)
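
The engine check from the table can be wrapped in a loop that reports each candidate separately. A minimal sketch; the function name list_pdf_engines is hypothetical, and the engine list is the one used in this document.

```bash
# Report which of the PDF engines named in this document are on PATH.
list_pdf_engines() {
  local engine
  for engine in xelatex lualatex tectonic pdflatex weasyprint wkhtmltopdf prince; do
    if command -v "$engine" >/dev/null 2>&1; then
      echo "$engine: found"
    else
      echo "$engine: missing"
    fi
  done
}

list_pdf_engines
```

This uses command -v rather than which because it is a shell builtin with consistent behavior across systems.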

Format Reference

For a full list of supported input and output formats, see references/formats.md.

Choosing the Right Approach

When a user asks to convert a document, think about:

What's the source format? Check the file extension or ask. If it's ambiguous (e.g., a .txt that's actually Markdown), specify -f markdown explicitly.
What's the target format? Map the user's intent to a file extension.
Does it need styling? If the user wants it to "look nice" or "be professional," add CSS (for HTML) or LaTeX variables (for PDF) or a reference doc (for DOCX).
Does it need structure? TOC, numbered sections, metadata — add these when the document is long or formal.
Are there images or media? Use --embed-resources --standalone for HTML (--self-contained on pandoc older than 2.19), and --extract-media when converting from DOCX/EPUB to text formats.
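
The styling question above can be reduced to a lookup on the target extension. This is a sketch, not part of the helper script: the function name styling_flags is hypothetical, and style.css, the template file name, and the margin and font size values are placeholders borrowed from the examples in this document.

```bash
# Hypothetical helper: print suggested styling flags for an output file.
styling_flags() {
  local output=$1
  local ext=${output##*.}
  case "$ext" in
    html)     echo "-s --css=style.css" ;;                       # CSS for HTML
    pdf)      echo "-V geometry:margin=1in -V fontsize=12pt" ;;  # LaTeX variables
    docx|odt) echo "--reference-doc=template.$ext" ;;            # reference document
    *)        echo "-s" ;;                                       # no styling hook assumed
  esac
}

# Print the command instead of running it, so it can be reviewed first.
echo "pandoc input.md -o output.docx $(styling_flags output.docx)"
```

The PDF branch assumes a LaTeX engine; with weasyprint, wkhtmltopdf, or prince the CSS flags from the HTML branch would apply instead.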

Always use the helper script scripts/convert.sh as the starting point — it handles the most common gotchas automatically, picks a reasonable PDF engine, and prints recovery hints when PDF conversion fails. Add extra pandoc flags as needed for the specific use case.

