DOCX Skill

Read, modify, and create Microsoft Word documents with support for converting markdown content into professionally formatted documents using templates.

Capabilities

Read Documents: Extract text, tables, and metadata from DOCX files
Create Documents: Generate new DOCX files from scratch or templates
Modify Documents: Edit existing documents, add content, modify styles
Markdown Conversion: Convert markdown files to DOCX using template styling
Template-Based Generation: Use DOCX templates to maintain consistent formatting

Quick Start

    from docx import Document

    # Read a document
    doc = Document('report.docx')
    for para in doc.paragraphs:
        print(para.text)

    # Create a document
    doc = Document()
    doc.add_heading('Report Title', 0)
    doc.add_paragraph('Content here.')
    doc.save('output.docx')

Usage

Reading Documents

Extract content from existing DOCX files.

Input: Path to a DOCX file

Process:

Open document with python-docx
Iterate through paragraphs, tables, or other elements
Extract text, styles, or metadata

Example:

    from docx import Document

    doc = Document('input.docx')

    # Extract all text
    full_text = []
    for para in doc.paragraphs:
        full_text.append(para.text)
    text = '\n'.join(full_text)

    # Extract tables
    for table in doc.tables:
        for row in table.rows:
            row_data = [cell.text for cell in row.cells]
            print(row_data)

    # Get document properties
    props = doc.core_properties
    print(f"Title: {props.title}")
    print(f"Author: {props.author}")
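The `row_data` lists from the table loop above are often easier to work with as dictionaries keyed by column name. The following is a minimal sketch, assuming the first row of the table is a header row; `rows_to_records` is a hypothetical helper name and is not part of python-docx.

```python
def rows_to_records(rows):
    """Turn a list of rows (lists of cell strings) into dicts keyed by the header row."""
    if not rows:
        return []
    header = [h.strip() for h in rows[0]]
    return [dict(zip(header, (v.strip() for v in row))) for row in rows[1:]]

# Sample rows shaped like the row_data lists printed in the example above.
rows = [
    ["Finding", "Severity", "Status"],
    ["SQL Injection", "Critical", "Open"],
]
records = rows_to_records(rows)
print(records[0]["Severity"])  # prints: Critical
```

With python-docx, the input could be built as `[[cell.text for cell in row.cells] for row in table.rows]`.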

Creating New Documents

Generate DOCX files from scratch.

Input: Content to include (text, tables, images)

Process:

Create new Document object
Add headings, paragraphs, tables
Apply styles as needed
Save to file

Example:

    from docx import Document
    from docx.shared import Inches, Pt
    from docx.enum.text import WD_ALIGN_PARAGRAPH

    doc = Document()

    # Add title
    title = doc.add_heading('Security Assessment Report', 0)
    title.alignment = WD_ALIGN_PARAGRAPH.CENTER

    # Add metadata paragraph
    doc.add_paragraph('Prepared by: Security Team')
    doc.add_paragraph('Date: January 2024')

    # Add section heading
    doc.add_heading('Executive Summary', level=1)

    # Add content paragraph
    para = doc.add_paragraph()
    para.add_run('This assessment identified ').bold = False
    para.add_run('3 critical findings').bold = True
    para.add_run(' requiring immediate attention.')

    # Add a table
    table = doc.add_table(rows=1, cols=3)
    table.style = 'Table Grid'
    header = table.rows[0].cells
    header[0].text = 'Finding'
    header[1].text = 'Severity'
    header[2].text = 'Status'

    # Add data rows
    data = [
        ('SQL Injection', 'Critical', 'Open'),
        ('XSS Vulnerability', 'High', 'In Progress'),
    ]
    for finding, severity, status in data:
        row = table.add_row().cells
        row[0].text = finding
        row[1].text = severity
        row[2].text = status

    doc.save('report.docx')

Modifying Existing Documents

Edit and update existing DOCX files.

Input: Path to existing DOCX file

Process:

Open existing document
Locate content to modify
Make changes
Save (same or new file)

Example:

    from docx import Document

    doc = Document('template.docx')

    # Replace placeholder text
    # Note: assigning para.text replaces the paragraph's runs with a single run,
    # so inline formatting (bold, italic) inside that paragraph is lost.
    for para in doc.paragraphs:
        if '{{CLIENT_NAME}}' in para.text:
            para.text = para.text.replace('{{CLIENT_NAME}}', 'Acme Corp')
        if '{{DATE}}' in para.text:
            para.text = para.text.replace('{{DATE}}', '2024-01-15')

    # Add new section at end
    doc.add_heading('Additional Findings', level=1)
    doc.add_paragraph('New content added during modification.')

    doc.save('modified_report.docx')
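The loop above handles one placeholder at a time and only visits `doc.paragraphs`, which does not include text inside table cells. The sketch below generalizes both points. It assumes placeholders always have the form `{{KEY}}` with a word-character key; `fill_placeholders` and `iter_paragraphs` are hypothetical helper names. Nested tables, headers, and footers are not covered.

```python
import re

PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")

def fill_placeholders(text, values):
    """Replace {{KEY}} tokens with values[KEY]; unknown keys are left untouched."""
    def substitute(match):
        key = match.group(1)
        return str(values[key]) if key in values else match.group(0)
    return PLACEHOLDER.sub(substitute, text)

def iter_paragraphs(doc):
    """Yield body paragraphs, then paragraphs inside top-level table cells."""
    yield from doc.paragraphs
    for table in doc.tables:
        for row in table.rows:
            for cell in row.cells:
                yield from cell.paragraphs

values = {"CLIENT_NAME": "Acme Corp", "DATE": "2024-01-15"}
print(fill_placeholders("Prepared for {{CLIENT_NAME}} on {{DATE}}", values))
# prints: Prepared for Acme Corp on 2024-01-15

# With an open python-docx document, the two helpers could be combined as:
#     for para in iter_paragraphs(doc):
#         if PLACEHOLDER.search(para.text):
#             para.text = fill_placeholders(para.text, values)
```

Leaving unknown keys in place makes missing data visible in the output instead of silently producing blank fields.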

Markdown to DOCX Conversion

Convert markdown files to Word documents using template styling.

Input:

Markdown file path
DOCX template file path (optional)
Output file path

Process:

Parse markdown content
Load template document for styles (if provided)
Map markdown elements to Word styles
Generate formatted DOCX

Using the conversion script:

python scripts/md_to_docx.py report.md --template company_template.docx --output final_report.docx

Programmatic usage:

    from scripts.md_to_docx import MarkdownToDocx

    converter = MarkdownToDocx(template_path='template.docx')
    converter.convert('input.md', 'output.docx')

Style Mapping:

Markdown          Word Style
# Heading 1       Heading 1
## Heading 2      Heading 2
### Heading 3     Heading 3
**bold**          Bold run
*italic*          Italic run
`code`            Code character style
- item            List Bullet
1. item           List Number
Code blocks       Code block style
> quote           Quote style
Tables            Table Grid
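The block-level part of the mapping above can be pictured as an ordered list of prefix rules. This is a minimal sketch, not the converter's actual implementation: the regular expressions are an assumption about which markdown forms are accepted, and `classify_line` is a hypothetical name. Inline styles, code blocks, and tables are left out.

```python
import re

# Rules are checked in order; style names follow the mapping listed above.
# The markdown patterns are assumptions made for this illustration.
BLOCK_RULES = [
    (re.compile(r"^### +(.*)$"), "Heading 3"),
    (re.compile(r"^## +(.*)$"), "Heading 2"),
    (re.compile(r"^# +(.*)$"), "Heading 1"),
    (re.compile(r"^[-*] +(.*)$"), "List Bullet"),
    (re.compile(r"^\d+\. +(.*)$"), "List Number"),
    (re.compile(r"^> ?(.*)$"), "Quote"),
]

def classify_line(line):
    """Return (word_style, text) for one markdown line; falls back to 'Normal'."""
    for pattern, style in BLOCK_RULES:
        match = pattern.match(line)
        if match:
            return style, match.group(1)
    return "Normal", line

for sample in ["# Title", "- item", "1. item", "> quote", "plain text"]:
    print(classify_line(sample))
```

Each resulting pair could then be passed to python-docx as `doc.add_paragraph(text, style=style)`, which is why the template needs to define those style names.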

Template Requirements:

For best results, your template should define these styles:

Heading 1, Heading 2, Heading 3
Normal (body text)
List Bullet, List Number
Quote (for blockquotes)
Code (character style for inline code)
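A template can be checked against this list before conversion. The sketch below is an illustration under stated assumptions: `REQUIRED_STYLES` simply restates the list above, and `missing_styles` is a hypothetical helper, not part of the skill's scripts.

```python
REQUIRED_STYLES = [
    "Heading 1", "Heading 2", "Heading 3",
    "Normal", "List Bullet", "List Number", "Quote", "Code",
]

def missing_styles(available_names, required=REQUIRED_STYLES):
    """Return the required style names that are absent, in listed order."""
    available = set(available_names)
    return [name for name in required if name not in available]

print(missing_styles(["Normal", "Heading 1", "Heading 2", "Heading 3", "Quote"]))
# prints: ['List Bullet', 'List Number', 'Code']
```

With python-docx, the available names could be collected as `[s.name for s in Document('template.docx').styles]`.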

Working with Styles

Apply consistent formatting using document styles.

Example:

    from docx import Document
    from docx.shared import Pt, RGBColor
    from docx.enum.style import WD_STYLE_TYPE

    doc = Document()

    # Create a custom style
    styles = doc.styles
    style = styles.add_style('Finding Critical', WD_STYLE_TYPE.PARAGRAPH)
    style.font.bold = True
    style.font.size = Pt(12)
    style.font.color.rgb = RGBColor(255, 0, 0)

    # Use the style
    doc.add_paragraph('CRITICAL: SQL Injection Found', style='Finding Critical')

    doc.save('styled_report.docx')

Adding Images

Insert images into documents.

Example:

    from docx import Document
    from docx.shared import Inches

    doc = Document()
    doc.add_heading('Network Diagram', level=1)
    doc.add_picture('network_diagram.png', width=Inches(6))
    doc.add_paragraph('Figure 1: Current network architecture')

    doc.save('report_with_images.docx')

Configuration

Environment Variables

Variable            Description                  Required   Default
DOCX_TEMPLATE_DIR   Default template directory   No         ./assets/templates
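How the scripts consume this variable is not spelled out here, so the following is only a sketch of one plausible lookup: read `DOCX_TEMPLATE_DIR` and fall back to the documented default. `resolve_template` is a hypothetical helper name.

```python
import os
from pathlib import Path

DEFAULT_TEMPLATE_DIR = "./assets/templates"

def resolve_template(name, environ=os.environ):
    """Build a template path from DOCX_TEMPLATE_DIR, or the documented default."""
    base = environ.get("DOCX_TEMPLATE_DIR") or DEFAULT_TEMPLATE_DIR
    return Path(base) / name

# Passing a dict instead of os.environ makes the behaviour easy to try out.
print(resolve_template("company_template.docx", environ={}))
print(resolve_template("company_template.docx", environ={"DOCX_TEMPLATE_DIR": "/opt/templates"}))
```

Treating an empty value the same as an unset variable avoids resolving templates relative to the current directory by accident.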

Script Options

Option       Type   Description
--template   path   DOCX template for styling
--output     path   Output file path
--toc        flag   Generate table of contents
--verbose    flag   Enable verbose logging
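For readers who want to see how these options fit together, here is a hypothetical `argparse` definition that mirrors the table. It is an illustration only; the real `scripts/md_to_docx.py` may define its interface differently, and the positional `input` argument name is an assumption.

```python
import argparse

def build_parser():
    """Hypothetical parser mirroring the documented options of md_to_docx.py."""
    parser = argparse.ArgumentParser(prog="md_to_docx.py")
    parser.add_argument("input", help="Markdown file to convert")
    parser.add_argument("--template", help="DOCX template for styling")
    parser.add_argument("--output", help="Output file path")
    parser.add_argument("--toc", action="store_true", help="Generate table of contents")
    parser.add_argument("--verbose", action="store_true", help="Enable verbose logging")
    return parser

args = build_parser().parse_args(
    ["report.md", "--template", "company_template.docx", "--output", "final_report.docx", "--toc"]
)
print(args.input, args.template, args.output, args.toc, args.verbose)
# prints: report.md company_template.docx final_report.docx True False
```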

Examples

Example 1: Security Report from Markdown

Scenario: Convert a penetration test report written in markdown to a professional Word document.

Input (report.md):

    # Penetration Test Report

    ## Executive Summary

    The assessment identified 3 critical and 5 high severity findings.

    ## Findings

    ### Finding 1: SQL Injection

    **Severity**: Critical

    The application is vulnerable to SQL injection in the login form.

    | Parameter | Payload | Result |
    |-----------|---------|--------|
    | username | `' OR 1=1--` | Auth bypass |

    #### Remediation

    - Use parameterized queries
    - Implement input validation
    - Apply least privilege

Command:

python scripts/md_to_docx.py report.md --template corporate_template.docx --output pentest_report.docx

Output: Professional DOCX with corporate styling applied.

Example 2: Batch Document Generation

Scenario: Generate multiple reports from a template with different data.

    from docx import Document
    import json

    # Load client data
    with open('clients.json') as f:
        clients = json.load(f)

    for client in clients:
        doc = Document('assessment_template.docx')

        # Replace placeholders
        for para in doc.paragraphs:
            for key, value in client.items():
                placeholder = f'{{{{{key}}}}}'
                if placeholder in para.text:
                    para.text = para.text.replace(placeholder, str(value))

        doc.save(f"reports/{client['name']}_assessment.docx")
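The output path above is built directly from the client name, which may contain spaces, slashes, or other characters that are awkward in file names. A minimal sketch of one way to normalize it, assuming an ASCII-only allow-list is acceptable; `safe_filename` is a hypothetical helper name.

```python
import re

def safe_filename(name, fallback="client"):
    """Reduce a free-text name to characters that are safe in a file name."""
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", "_", name.strip()).strip("._")
    return cleaned or fallback

print(safe_filename("Acme Corp / EMEA"))  # prints: Acme_Corp_EMEA
print(safe_filename("../../etc"))         # prints: etc
```

The save call could then use `f"reports/{safe_filename(client['name'])}_assessment.docx"`. The `reports` directory also has to exist before saving, for example via `Path('reports').mkdir(exist_ok=True)`.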

Limitations

Complex formatting: Some advanced Word features (SmartArt, embedded objects) may not be fully supported
Track changes: Limited support for revision tracking
Macros: VBA macros are not supported for security reasons
Large tables: Tables with merged cells may have rendering inconsistencies
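Since macros are out of scope, it can help to detect macro-enabled files before processing them. Word files are ZIP packages, and a VBA project is stored as a part named `vbaProject.bin`. How the skill itself enforces the macro restriction is not stated, so the sketch below is only an illustrative pre-check; `has_vba_macros` is a hypothetical helper name.

```python
import io
import zipfile

def has_vba_macros(source):
    """Return True if the OOXML package contains a VBA project part."""
    with zipfile.ZipFile(source) as package:
        return any(name.lower().endswith("vbaproject.bin") for name in package.namelist())

# Demo on an in-memory package (a stand-in for illustration, not a real Word file).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as package:
    package.writestr("word/document.xml", "<w:document/>")
    package.writestr("word/vbaProject.bin", b"")
buffer.seek(0)
print(has_vba_macros(buffer))  # prints: True
```

The function accepts either a file path or a file-like object, because `zipfile.ZipFile` accepts both.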

Troubleshooting

Style Not Applied

Problem: Custom styles from template not appearing in output

Solution: Ensure the style exists in the template and use exact style name:

    # Check available styles
    for style in doc.styles:
        print(style.name)
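Style names must match exactly, so a small typo or a difference in capitalization is a common cause. The sketch below suggests the closest available names using the standard library's `difflib`; `suggest_style` is a hypothetical helper, and the 0.6 similarity cutoff is an arbitrary choice.

```python
import difflib

def suggest_style(requested, available_names, limit=3):
    """Return up to `limit` available style names that resemble `requested`."""
    lowered = {name.lower(): name for name in available_names}
    if requested.lower() in lowered:
        return [lowered[requested.lower()]]
    matches = difflib.get_close_matches(requested.lower(), list(lowered), n=limit, cutoff=0.6)
    return [lowered[m] for m in matches]

available = ["Normal", "Heading 1", "List Bullet", "Finding Critical"]
print(suggest_style("finding critical", available))  # prints: ['Finding Critical']
print(suggest_style("List Bulet", available))
```

The `available` list here is sample data; in practice it could come from `[s.name for s in doc.styles]`.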

Encoding Issues

Problem: Special characters not displaying correctly

Solution: Ensure UTF-8 encoding when reading source files:

    with open('input.md', 'r', encoding='utf-8') as f:
        content = f.read()

Missing Images

Problem: Images not appearing in output

Solution: Use absolute paths or verify relative path from script location:

    from pathlib import Path

    # Resolve the image relative to this script rather than the working directory
    image_path = Path(__file__).parent / 'assets' / 'logo.png'
    doc.add_picture(str(image_path))

Related Skills

pdf: Convert DOCX to PDF or extract content from PDFs
xlsx: Work with Excel files for data that feeds into reports
pptx: Create presentations from document content

References

Detailed API Reference
python-docx Documentation
Markdown Syntax Guide
