PDF Factory

Dependencies

Run before first use:

python3 scripts/install_deps.py

Required packages: xhtml2pdf, reportlab, pypdf, pyhanko, markdown, lxml, pillow, html5lib, cssselect2, svglib, python-bidi, arabic-reshaper.

The installer uses --no-deps for svglib, rlpycairo, and xhtml2pdf to avoid building the pycairo C extension (which requires the system cairo library). If installation fails with cairo/meson errors, run manually:

uv pip install --no-deps rlpycairo svglib xhtml2pdf

Icons

Phosphor icons are fetched on demand (not bundled). To pre-download icons:

Fetch specific icons

python3 scripts/fetch_icons.py arrow-right check-circle warning

Fetch all 1,500+ icons for offline use

python3 scripts/fetch_icons.py --all

Pipeline

Generate PDFs by following these steps in order:

Resolve brand kit — Locate brand-{slug} skill, read manifest.json and zones.json
Parse markdown — Convert source to HTML via markdown library
Render content pages — Run render.py to produce styled content pages
Compose document — Run compose.py to merge content with template pages
Validate output — Run validate_output.py, fix errors, repeat until pass
Sign (optional) — Apply digital signature via pyhanko

Rendering Internals

See references/internals.md for details on font registration, zone overlays, section page breaks, orphan prevention, SVG rendering, image corner radius, chart integration, image generation tokens, CSS specifics, and token resolution. Read when debugging rendering issues or understanding pipeline behavior.

Step 1: Resolve Brand Kit

Locate the brand kit skill directory and read:

assets/manifest.json — font paths, logo paths, template paths, tokens (colors + type_scale)
assets/templates/pdf/zones.json — content zones for each template page

The manifest.json["tokens"] section is the machine-readable source of truth for all color and typography values used by the pipeline scripts.

If no brand kit is specified, use fallback assets from assets/fallback/ .

Step 2: Parse Markdown

Convert source markdown to HTML:

import markdown html = markdown.markdown(source, extensions=[ 'tables', 'fenced_code', 'codehilite', 'toc', 'meta', 'attr_list' ])

Extract document metadata from markdown frontmatter:

title , subtitle , author , date → cover page zones
sections → section divider insertion

Step 3: Render Content Pages

The --brand flag is optional. Without it, render.py uses fallback fonts and colors.

With brand kit:

python3 scripts/render.py
--brand /path/to/brand-{slug}
--input content.html
--output content-pages.pdf
--sections '["Executive Summary", "Technical Achievements"]'

Fallback (no brand kit):

python3 scripts/render.py
--input content.html
--output content-pages.pdf

The --sections flag accepts a JSON array of section titles. H1 headings matching these titles are hidden from content pages (they appear on section divider pages instead). Omit --sections when not using section dividers.

For composition rules (grid, spacing, page breaks, widows/orphans), load references/composition.md.

For element rendering details (headings, code blocks, tables, lists), load references/elements.md.

Step 4: Compose Document

The --brand flag is optional. Without it, compose.py produces content-only output (no cover pages, section dividers, or zone overlays).

With brand kit:

python3 scripts/compose.py
--brand /path/to/brand-{slug}
--content content-pages.pdf
--metadata metadata.json
--output final.pdf

Fallback (no brand kit, metadata only):

python3 scripts/compose.py
--content content-pages.pdf
--metadata metadata.json
--output final.pdf

compose.py embeds title, author, and subtitle from metadata.json into the PDF info dictionary.

Provide metadata.json with this structure:

{ "title": "Document Title", "subtitle": "Optional Subtitle", "author": "Author Name", "date": "February 9, 2026", "sections": [ { "title": "Introduction", "page": 1 }, { "title": "Analysis", "page": 5 } ] }

Step 5: Validate Output

Run automated validation:

python3 scripts/validate_output.py final.pdf --brand /path/to/brand-{slug}

If validation fails, check these common issues:

"Font not embedded" — Verify font path in manifest matches actual file in assets/fonts/
"Page count 0" — Verify render.py produced output; check input HTML is not empty
"Missing metadata" — Ensure metadata.json contains title and author fields
"Brand font missing" — Confirm all fonts declared in manifest exist on disk

Fix errors and re-run validation. Only proceed when all checks pass.

Then perform manual QA:

No H1 duplication — Section titles appear only on divider pages, not repeated on content pages (use --sections in Step 3 to prevent this)
Images contextually relevant — Each image matches its section topic; no nonsensical compositions or artifacts
Images brand-aligned — Style matches tokens.imagery guidelines (B&W for Wave Artisans, editorial for Bluewaves, reportage for Decathlon)
Charts readable — Labels don't overlap, legends are clear, data is accurate, no clipping
File size — Total PDF under 50MB; if over, regenerate images with JPEG format and 2K resolution
Page flow — No orphaned headings at page bottoms, no widowed single lines at page tops
Cover/divider zones — Title, subtitle, date, and author render correctly in their zones

Step 6: Sign (Optional)

Apply digital signature via pyhanko when required:

from pyhanko.sign import signers from pyhanko.keys import load_cert_from_pemder

Requires a PKCS#12 certificate file (.p12/.pfx)

Example

Input markdown:

title: Q4 Performance Review subtitle: Engineering Division author: Jane Smith date: January 2026

Executive Summary

Revenue grew 23% year-over-year...

Technical Achievements

Infrastructure Scaling

We migrated 40 services to the new platform...

Expected output (using brand-bluewaves):

Cover page: cover-front.pdf template with "Q4 Performance Review" in title zone (display style, text-inverse), "Engineering Division" in subtitle zone, "JANUARY 2026" in date zone, Bluewaves logo in logo zone
Content pages: Body text on page-content.pdf template, heading font for h1/h2, body font for paragraphs, document title in header zone, page numbers in footer zone
Section dividers: section-divider.pdf template inserted before "Executive Summary" and "Technical Achievements" with section titles in zones
Back cover: cover-back.pdf template as final page

Fallback Mode

When no brand kit is specified, the pipeline uses assets/fallback/ assets:

Colors: text #1A1A1A , headings #1A1A1A , muted #7A7A7A
Fonts: Noto Sans (body/heading), Fira Code (code) — production fonts, not placeholders
Layout: A4, 25mm margins, 11pt body
No cover pages or section dividers in fallback mode

pdf-factory

Safety Notice

Copy this and send it to your AI assistant to learn

Fetch specific icons

Fetch all 1,500+ icons for offline use

With brand kit:

Fallback (no brand kit):

With brand kit:

Fallback (no brand kit, metadata only):

Requires a PKCS#12 certificate file (.p12/.pfx)

title: Q4 Performance Review subtitle: Engineering Division author: Jane Smith date: January 2026

Executive Summary

Technical Achievements

Infrastructure Scaling

Source Transparency

Related Skills

photographer-testino

photographer-lindbergh

photographer-lachapelle

photographer-vonunwerth