kai-docx-generator
Generate professional .docx documents from Markdown, or fill data into .docx templates.
Core Principles
- Zero Fabrication — Never invent content. If data is missing, use placeholder text (
[待填写]) or ask the user. - Template Filling is XML-level —
{{variable_name}}patterns are matched at thew:tXML node level inside the .docx. Values are sanitized to escape& < > " 'and remove control characters. - Security: Path Validation — Image paths must be relative or within the current working directory. Absolute paths (e.g.
/tmp/...) and path traversal (..) are rejected. - Auto-Truncation — Exceeding input limits (200KB, 20 images, 50 table rows, 20 table columns) triggers automatic truncation without error.
- Template Strictness —
--strictmode onfill_template.pyreturns exit code 1 when any placeholder remains unreplaced, enabling CI/CD gate usage.
Commands
| Command | What it does |
|---|---|
python3 <skill-path>/scripts/generate.py <input.md> <output.docx> | Convert Markdown to .docx |
python3 <skill-path>/scripts/generate.py --style <style> <input.md> <output.docx> | Convert with a specific style |
| `python3 <skill-path>/scripts/fill_template.py --data <data.json | frontmatter.md> --output <out.docx> <template.docx>` |
python3 <skill-path>/scripts/fill_template.py --strict --data <data.json> --output <out.docx> <template.docx> | Fill template, fail if placeholders remain |
python3 <skill-path>/scripts/validate.py <file.docx> | Validate .docx file structure |
Markdown → .docx
python3 <skill-path>/scripts/generate.py input.md output.docx
python3 <skill-path>/scripts/generate.py --style report input.md output.docx
Content-Type → Style Routing
When no --style is specified, recommend a style based on topic keywords:
| Topic keywords | Recommended style | Use case |
|---|---|---|
| 合同、协议、条款、contract, agreement | contract | Contracts & legal documents |
| 报告、分析、数据、KPI、report, analysis, dashboard | report | Business reports (blue headers) |
| 会议纪要、meeting minutes, 讨论 | meeting | Meeting minutes (compact layout) |
| 红头文件、通知、通报、dispatch, official notice | dispatch | Official dispatchs (red header) |
| 技术、API、架构、代码、spec, architecture | tech_spec | Technical specs (monospace body) |
| 公函、商务信、letter | business_letter | Business letters |
| 公告、公告通知、notice, announcement | notice | Official notices (large bold text) |
| 其他 / general | standard | General documents |
When routing, announce: "推荐使用 [style] 样式 ([description]),可用 --style 覆盖。"
Default Output Filename
If the user does not specify an output filename, derive it from the input file:
<input>.md→<input>.docx(same basename, replace extension)- For generated content (no input file), use:
doc-<YYYY-MM-DD>-<slug>.docx - Slug rule: lowercase the title/topic, replace spaces and non-ASCII with hyphens, keep only alphanumeric ASCII and hyphens, max 30 chars.
Supported Markdown Features
| Feature | Syntax |
|---|---|
| Headings (H1-H3) | #, ##, ### |
| Paragraphs | Text blocks |
| Bold / Italic | **bold**, *italic* |
| Bullet / Numbered lists | -, 1. |
| Tables | Standard GFM tables |
| Code blocks | Triple backticks, with optional language |
| Blockquotes (callout) | > text → info box |
| Images |  |
| Page break | <!-- pagebreak --> |
| Horizontal rule | --- |
| YAML front matter | ---\ntitle: ...\n--- |
Style Presets
| Style | Use case |
|---|---|
standard (default) | General documents |
contract | Contracts & agreements (formal, 12pt) |
report | Business reports (blue headers, 10.5pt) |
meeting | Meeting minutes (compact, 10.5pt) |
business_letter | Business letters (traditional format) |
tech_spec | Technical specifications (monospace, dark theme) |
dispatch | Official dispatchs / 红头文件 (red header) |
notice | Official notices (bold, 14pt) |
YAML Front Matter
---
title: Document Title # 写入 docx core_properties.title
author: Author Name # 写入 docx core_properties.author
date: 2026-04-10 # 写入 docx core_properties.created
lang: zh # 可选: zh | en, 影响默认文本
header:
text: CONFIDENTIAL # 页眉文字
alignment: center # left | center | right
footer:
page_number: true # 显示页码
toc:
max_level: 3 # TOC 包含的最大标题级别 (1-3)
---
All fields are optional. title, author, and date are written to the .docx core properties (visible in Word → File → Info).
Template Filling
# With JSON data
python3 <skill-path>/scripts/fill_template.py --data data.json --output output.docx template.docx
# With Markdown front matter as data
python3 <skill-path>/scripts/fill_template.py --data frontmatter.md --output output.docx template.docx
# Strict mode: fail if any placeholder remains unreplaced
python3 <skill-path>/scripts/fill_template.py --strict --data data.json --output output.docx template.docx
Data format (JSON):
{
"company_name": "Acme Corp",
"amount": "¥1,000,000",
"date": "2026-04-10"
}
Template placeholders are {{variable_name}} patterns in the .docx text (matched at w:t XML node level). Values are automatically sanitized to escape XML special characters and remove control characters. Long values are truncated to prevent layout breakage.
--strict Mode
When --strict is passed, the script exits with code 1 if any {{placeholder}} remains unreplaced after filling. This enables CI/CD gating: all placeholders must have corresponding data keys.
Input Limits
- Markdown size: 200KB max
- Images: 20 max
- Table rows: 50 max
- Table columns: 20 max
Exceeding limits triggers automatic truncation with no error.
QA Process
After generating a .docx, run these checks:
Automated checks (run validate.py on the output):
- File is valid ZIP with required structure (
word/document.xml,[Content_Types].xml,word/_rels/document.xml.rels) - XML is parseable (no malformed tags)
- File size is reasonable (> 50KB warning, > 50MB warning)
- No unreplaced
{{placeholders}}detected
Visual checks (open in Word/WPS):
5. Style conformance — headings use the correct style preset colors and fonts (see references/style-presets.md for exact values)
6. Spacing — callouts have proper spacing from surrounding elements
7. Tables — column widths are reasonable, no emoji overlap
8. Code blocks — monospace font + gray background rendering
9. TOC — if requested, TOC field code is present (requires Word to update fields)
10. Images — embedded images render at correct size, no broken links
Template filling checks:
11. All {{placeholders}} with matching data keys are replaced
12. No XML corruption — the output opens without error
13. Long values are truncated appropriately (no layout breakage)
Use --validate flag with generate.py to run automated checks automatically after generation.
Works with
- kai-report-creator — generate .docx reports from markdown output
- Any Markdown content (reports, specs, meeting notes, contracts)
- Existing .docx templates with
{{placeholder}}variables