pandoc

Convert documents between formats using pandoc. Supports HTML, Markdown, DOCX, PDF, EPUB, LaTeX, ODT, RST, Org, MediaWiki, JIRA, CSV, Jupyter notebooks, and many more — any direction pandoc supports. Use this skill whenever the user wants to convert, transform, or export a document from one format to another, even if they don't mention pandoc explicitly. Triggers include: "convert this to PDF", "make a Word doc from this markdown", "export as EPUB", "turn this HTML into a PDF", "transform", "generate PDF", "render to", or any request involving document format conversion. Also use when the user wants to apply custom styling, add a table of contents, use a template, or set metadata during conversion.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "pandoc" with this command: npx skills add Oliver Herklotz (oliver-hrkltz)/pandoc

Pandoc Document Converter

Convert documents between any formats pandoc supports, with full control over styling, templates, table of contents, metadata, and PDF engine selection.

Quick Start

For most conversions, use the helper script at scripts/convert.sh:

bash <skill-dir>/scripts/convert.sh <input-file> <output-file> [options...]

The script auto-detects formats from file extensions and applies sensible defaults (standalone output, appropriate PDF engine, default LaTeX margins for LaTeX-based PDF engines). It also checks that pandoc, the input file, the output directory, and any requested PDF engine are available. Any extra arguments are passed through to pandoc.
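The checks described above might look roughly like the following. This is a minimal sketch, not the actual contents of scripts/convert.sh: the function name preflight, its return codes, and the optional third argument (the tool to look for, defaulting to pandoc) are all assumptions made for illustration.

```shell
# Hypothetical preflight checks; NOT the real scripts/convert.sh.
# Usage: preflight <input-file> <output-file> [tool]
preflight() {
  local input="$1"
  local output="$2"
  local tool="${3:-pandoc}"   # assumption: tool name is overridable for testing
  local outdir
  outdir=$(dirname -- "$output")

  # 1. The converter itself must be on PATH.
  if ! command -v "$tool" >/dev/null 2>&1; then
    echo "error: $tool not found on PATH" >&2
    return 1
  fi
  # 2. The input must be an existing regular file.
  if [ ! -f "$input" ]; then
    echo "error: input file not found: $input" >&2
    return 2
  fi
  # 3. The directory that will hold the output must already exist.
  if [ ! -d "$outdir" ]; then
    echo "error: output directory not found: $outdir" >&2
    return 3
  fi
  return 0
}
```

A caller could run preflight input.md out/report.pdf && pandoc input.md -o out/report.pdf -s, so that pandoc is only invoked once all three checks pass.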

How Conversions Work

Pandoc reads a source format into an internal AST, then writes it out in the target format. This means you can go from nearly any supported input to any supported output. The key decision points are:

  1. Input format — usually auto-detected from the file extension
  2. Output format — auto-detected from the output file extension
  3. PDF engine — for PDF output, choose between xelatex (best Unicode/font support), lualatex (strong Unicode/fonts), tectonic (self-contained TeX), pdflatex (fastest, good for ASCII-heavy docs), or HTML/CSS engines like weasyprint, wkhtmltopdf, or prince
  4. Styling — CSS for HTML-based outputs, LaTeX templates for PDF, reference docs for DOCX/ODT
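For decision point 3, one way to settle on an engine is to take the first one that is actually installed. The sketch below assumes a preference order (LaTeX engines first, then HTML/CSS engines) that is purely illustrative; the function names first_available and pick_pdf_engine are hypothetical and not part of pandoc or the helper script.

```shell
# Print the first candidate command found on PATH; return 1 if none exists.
first_available() {
  local candidate
  for candidate in "$@"; do
    if command -v "$candidate" >/dev/null 2>&1; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Assumed preference order; reorder to suit the document at hand.
pick_pdf_engine() {
  first_available xelatex lualatex tectonic pdflatex weasyprint wkhtmltopdf prince
}
```

Typical use would be engine=$(pick_pdf_engine) && pandoc input.md -o output.pdf --pdf-engine="$engine". If no engine is installed, pick_pdf_engine returns a non-zero status and pandoc is never invoked.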

Common Conversion Patterns

HTML → PDF

pandoc input.html -o output.pdf --pdf-engine=weasyprint -s

If the HTML uses external CSS, include it:

pandoc input.html -o output.pdf --pdf-engine=weasyprint -s --css=style.css

Markdown → PDF

pandoc input.md -o output.pdf --pdf-engine=xelatex -s --toc --toc-depth=3

Markdown → DOCX

pandoc input.md -o output.docx -s

To use a reference (template) document for styling:

pandoc input.md -o output.docx --reference-doc=template.docx

Markdown → HTML

pandoc input.md -o output.html -s --css=style.css --toc

DOCX → Markdown

pandoc input.docx -o output.md --extract-media=./media

Markdown → EPUB

pandoc input.md -o output.epub -s --toc --epub-cover-image=cover.jpg

LaTeX → PDF

pandoc input.tex -o output.pdf --pdf-engine=xelatex

CSV → HTML table

pandoc input.csv -o output.html -s

Styling and Appearance

CSS for HTML-based outputs

Create or use a CSS file and pass it with --css=path/to/style.css. For PDF output via weasyprint, wkhtmltopdf, or prince, CSS is respected directly. For PDF via LaTeX engines, CSS is usually ignored — use LaTeX variables or templates instead.
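The split between CSS-aware and LaTeX-based engines can be captured in a small helper. This is a sketch under stated assumptions: the function name style_flags is hypothetical, and the 1in margin is just a placeholder value. It uses the long form --variable=KEY:VALUE, which pandoc treats the same as -V KEY:VALUE.

```shell
# Print one styling flag appropriate to the chosen PDF engine.
# Usage: style_flags <engine> <css-file>
style_flags() {
  local engine="$1"
  local css="$2"
  case "$engine" in
    weasyprint|wkhtmltopdf|prince)
      # HTML/CSS engines apply the stylesheet directly.
      printf '%s\n' "--css=$css"
      ;;
    xelatex|lualatex|pdflatex|tectonic)
      # LaTeX engines ignore CSS, so fall back to a LaTeX variable.
      printf '%s\n' "--variable=geometry:margin=1in"
      ;;
    *)
      echo "error: unrecognized engine: $engine" >&2
      return 1
      ;;
  esac
}
```

For example, pandoc input.md -o output.pdf --pdf-engine=weasyprint "$(style_flags weasyprint style.css)" passes the stylesheet through, while the same call with xelatex would pass the margin variable instead.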

A sensible default stylesheet is provided at assets/default.css. Use it when the user wants a clean, readable output without specifying their own styles:

pandoc input.md -o output.html -s --css=<skill-dir>/assets/default.css

LaTeX variables for PDF styling

Control margins, fonts, and paper size without a full template:

pandoc input.md -o output.pdf --pdf-engine=xelatex \
  -V geometry:margin=1in \
  -V fontsize=12pt \
  -V mainfont="DejaVu Serif" \
  -V documentclass=article

Reference documents for DOCX/ODT

To match a corporate style, provide a reference document:

pandoc input.md -o output.docx --reference-doc=brand-template.docx

Advanced Features

Table of Contents

Add --toc and optionally --toc-depth=N (default 3):

pandoc input.md -o output.pdf --pdf-engine=xelatex -s --toc --toc-depth=2

Metadata

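Set title, author, and date via a YAML metadata block (front matter) at the top of the source file, or via -M on the command line. A value given with -M overrides the same field in the source file: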

pandoc input.md -o output.pdf --pdf-engine=xelatex -s \
  -M title="My Report" -M author="Jane Doe" -M date="2026-03-15"
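The front matter route carries the same three fields inside the source file. The snippet below writes a sample file to show the layout; the file name report.md and the body text are placeholders chosen for illustration.

```shell
# Create a sample Markdown source with a YAML metadata block at the top.
# The quoted 'EOF' keeps the shell from expanding anything inside the heredoc.
cat > report.md <<'EOF'
---
title: "My Report"
author: "Jane Doe"
date: "2026-03-15"
---

# Introduction

Body text goes here.
EOF
```

With the metadata in the file, pandoc report.md -o report.pdf --pdf-engine=xelatex -s needs no -M flags.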

Filters and Lua filters

Pandoc supports filters that transform the AST between reading and writing. Lua filters run in pandoc's built-in Lua interpreter, so they need no external dependencies:

pandoc input.md -o output.pdf --lua-filter=my-filter.lua

Multiple input files

Pandoc concatenates multiple inputs in the order given on the command line, with a blank line between them:

pandoc chapter1.md chapter2.md chapter3.md -o book.pdf --pdf-engine=xelatex -s --toc
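Listing chapters by hand stops scaling once there are many of them, and a plain shell glob sorts lexically, which puts chapter10.md before chapter2.md. A rough sketch of one workaround follows. It assumes a sort that supports -V (version sort, present in GNU coreutils and recent BSD sort) and file names without whitespace; the function name ordered_chapters is hypothetical.

```shell
# Print chapter files from a directory in natural numeric order,
# so that chapter2.md comes before chapter10.md.
# Usage: ordered_chapters <directory>
ordered_chapters() {
  local dir="$1"
  # The glob expands lexically; sort -V reorders by embedded numbers.
  printf '%s\n' "$dir"/chapter*.md | sort -V
}
```

The result can be passed straight to pandoc, for example pandoc $(ordered_chapters .) -o book.pdf --pdf-engine=xelatex -s --toc. The command substitution is left unquoted on purpose so each file becomes its own argument, which is why file names containing spaces are not supported here.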

Extracting media from DOCX/EPUB

pandoc input.docx -o output.md --extract-media=./media

Troubleshooting

| Problem | Likely cause | Fix |
| --- | --- | --- |
| PDF has missing characters | Font doesn't support the glyphs | Use --pdf-engine=xelatex with -V mainfont="DejaVu Serif" |
| PDF conversion fails | No compatible PDF engine installed | Run "which xelatex lualatex tectonic pdflatex weasyprint wkhtmltopdf prince" to see which engines are installed, then install one that matches your output needs |
| DOCX looks unstyled | No reference doc | Create a styled DOCX template and pass --reference-doc |
| HTML images missing | Relative paths broken | Use --embed-resources --standalone to embed images as data URIs (on pandoc older than 2.19, use --self-contained, which newer versions deprecate) |
| CSS has no effect on PDF | LaTeX PDF engine selected | Use --pdf-engine=weasyprint, --pdf-engine=wkhtmltopdf, or --pdf-engine=prince |
| Table of contents empty | No headings in source | Ensure source uses # headings (Markdown) or <h1> through <h6> (HTML) |

Format Reference

For a full list of supported input and output formats, see references/formats.md.

Choosing the Right Approach

When a user asks to convert a document, think about:

  1. What's the source format? Check the file extension or ask. If it's ambiguous (e.g., a .txt that's actually Markdown), specify -f markdown explicitly.
  2. What's the target format? Map the user's intent to a file extension.
  3. Does it need styling? If the user wants it to "look nice" or "be professional," add CSS (for HTML) or LaTeX variables (for PDF) or a reference doc (for DOCX).
  4. Does it need structure? TOC, numbered sections, metadata — add these when the document is long or formal.
  5. Are there images or media? Use --embed-resources --standalone for HTML (--self-contained on pandoc older than 2.19), and --extract-media when converting from DOCX/EPUB to text formats.
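Steps 1 and 2 of the checklist above can be sketched as two small lookup helpers. Both function names and the phrase list are hypothetical, and treating every .txt file as Markdown is an assumption that will not hold for all inputs.

```shell
# Step 1: print an explicit reader flag when the extension is ambiguous.
# Prints nothing when pandoc can infer the format on its own.
reader_flag() {
  case "$1" in
    *.txt) printf '%s\n' "--from=markdown" ;;  # assumption: .txt holds Markdown
    *)     : ;;
  esac
}

# Step 2: map a loosely worded target to an output file extension.
# The phrase list is illustrative, not exhaustive.
target_extension() {
  case "$1" in
    word|docx)  printf 'docx\n' ;;
    pdf)        printf 'pdf\n' ;;
    ebook|epub) printf 'epub\n' ;;
    web|html)   printf 'html\n' ;;
    *)          return 1 ;;
  esac
}
```

Put together, pandoc $(reader_flag notes.txt) notes.txt -o "notes.$(target_extension word)" would read notes.txt as Markdown and write notes.docx. The first substitution is unquoted so that an empty result adds no argument.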

Always use the helper script scripts/convert.sh as the starting point — it handles the most common gotchas automatically, picks a reasonable PDF engine, and prints recovery hints when PDF conversion fails. Add extra pandoc flags as needed for the specific use case.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
