DOCX Skill
Read, modify, and create Microsoft Word documents with support for converting markdown content into professionally formatted documents using templates.
Capabilities
-
Read Documents: Extract text, tables, and metadata from DOCX files
-
Create Documents: Generate new DOCX files from scratch or templates
-
Modify Documents: Edit existing documents, add content, modify styles
-
Markdown Conversion: Convert markdown files to DOCX using template styling
-
Template-Based Generation: Use DOCX templates to maintain consistent formatting
Quick Start
from docx import Document
Read a document
doc = Document('report.docx') for para in doc.paragraphs: print(para.text)
Create a document
doc = Document() doc.add_heading('Report Title', 0) doc.add_paragraph('Content here.') doc.save('output.docx')
Usage
Reading Documents
Extract content from existing DOCX files.
Input: Path to a DOCX file
Process:
-
Open document with python-docx
-
Iterate through paragraphs, tables, or other elements
-
Extract text, styles, or metadata
Example:
from docx import Document
doc = Document('input.docx')
Extract all text
full_text = [] for para in doc.paragraphs: full_text.append(para.text) text = '\n'.join(full_text)
Extract tables
for table in doc.tables: for row in table.rows: row_data = [cell.text for cell in row.cells] print(row_data)
Get document properties
props = doc.core_properties print(f"Title: {props.title}") print(f"Author: {props.author}")
Creating New Documents
Generate DOCX files from scratch.
Input: Content to include (text, tables, images)
Process:
-
Create new Document object
-
Add headings, paragraphs, tables
-
Apply styles as needed
-
Save to file
Example:
from docx import Document from docx.shared import Inches, Pt from docx.enum.text import WD_ALIGN_PARAGRAPH
doc = Document()
Add title
title = doc.add_heading('Security Assessment Report', 0) title.alignment = WD_ALIGN_PARAGRAPH.CENTER
Add metadata paragraph
doc.add_paragraph('Prepared by: Security Team') doc.add_paragraph('Date: January 2024')
Add section heading
doc.add_heading('Executive Summary', level=1)
Add content paragraph
para = doc.add_paragraph() para.add_run('This assessment identified ').bold = False para.add_run('3 critical findings').bold = True para.add_run(' requiring immediate attention.')
Add a table
table = doc.add_table(rows=1, cols=3) table.style = 'Table Grid' header = table.rows[0].cells header[0].text = 'Finding' header[1].text = 'Severity' header[2].text = 'Status'
Add data rows
data = [ ('SQL Injection', 'Critical', 'Open'), ('XSS Vulnerability', 'High', 'In Progress'), ] for finding, severity, status in data: row = table.add_row().cells row[0].text = finding row[1].text = severity row[2].text = status
doc.save('report.docx')
Modifying Existing Documents
Edit and update existing DOCX files.
Input: Path to existing DOCX file
Process:
-
Open existing document
-
Locate content to modify
-
Make changes
-
Save (same or new file)
Example:
from docx import Document
doc = Document('template.docx')
Replace placeholder text
for para in doc.paragraphs: if '{{CLIENT_NAME}}' in para.text: para.text = para.text.replace('{{CLIENT_NAME}}', 'Acme Corp') if '{{DATE}}' in para.text: para.text = para.text.replace('{{DATE}}', '2024-01-15')
Add new section at end
doc.add_heading('Additional Findings', level=1) doc.add_paragraph('New content added during modification.')
doc.save('modified_report.docx')
Markdown to DOCX Conversion
Convert markdown files to Word documents using template styling.
Input:
-
Markdown file path
-
DOCX template file path (optional)
-
Output file path
Process:
-
Parse markdown content
-
Load template document for styles (if provided)
-
Map markdown elements to Word styles
-
Generate formatted DOCX
Using the conversion script:
python scripts/md_to_docx.py report.md --template company_template.docx --output final_report.docx
Programmatic usage:
from scripts.md_to_docx import MarkdownToDocx
converter = MarkdownToDocx(template_path='template.docx') converter.convert('input.md', 'output.docx')
Style Mapping:
Markdown Word Style
Heading 1
Heading 1
Heading 2
Heading 2
Heading 3
Heading 3
bold
Bold run
italic
Italic run
code
Code character style
- item
List Bullet
- item
List Number
Code blocks Code block style
quote
Quote style
Tables Table Grid
Template Requirements:
For best results, your template should define these styles:
-
Heading 1, Heading 2, Heading 3
-
Normal (body text)
-
List Bullet, List Number
-
Quote (for blockquotes)
-
Code (character style for inline code)
Working with Styles
Apply consistent formatting using document styles.
Example:
from docx import Document from docx.shared import Pt, RGBColor from docx.enum.style import WD_STYLE_TYPE
doc = Document()
Create a custom style
styles = doc.styles style = styles.add_style('Finding Critical', WD_STYLE_TYPE.PARAGRAPH) style.font.bold = True style.font.size = Pt(12) style.font.color.rgb = RGBColor(255, 0, 0)
Use the style
doc.add_paragraph('CRITICAL: SQL Injection Found', style='Finding Critical')
doc.save('styled_report.docx')
Adding Images
Insert images into documents.
Example:
from docx import Document from docx.shared import Inches
doc = Document() doc.add_heading('Network Diagram', level=1) doc.add_picture('network_diagram.png', width=Inches(6)) doc.add_paragraph('Figure 1: Current network architecture')
doc.save('report_with_images.docx')
Configuration
Environment Variables
Variable Description Required Default
DOCX_TEMPLATE_DIR
Default template directory No ./assets/templates
Script Options
Option Type Description
--template
path DOCX template for styling
--output
path Output file path
--toc
flag Generate table of contents
--verbose
flag Enable verbose logging
Examples
Example 1: Security Report from Markdown
Scenario: Convert a penetration test report written in markdown to a professional Word document.
Input (report.md ):
Penetration Test Report
Executive Summary
The assessment identified 3 critical and 5 high severity findings.
Findings
Finding 1: SQL Injection
Severity: Critical
The application is vulnerable to SQL injection in the login form.
| Parameter | Payload | Result |
|---|---|---|
| username | ' OR 1=1-- | Auth bypass |
Remediation
- Use parameterized queries
- Implement input validation
- Apply least privilege
Command:
python scripts/md_to_docx.py report.md --template corporate_template.docx --output pentest_report.docx
Output: Professional DOCX with corporate styling applied.
Example 2: Batch Document Generation
Scenario: Generate multiple reports from a template with different data.
from docx import Document import json
Load client data
with open('clients.json') as f: clients = json.load(f)
for client in clients: doc = Document('assessment_template.docx')
# Replace placeholders
for para in doc.paragraphs:
for key, value in client.items():
placeholder = f'{{{{{key}}}}}'
if placeholder in para.text:
para.text = para.text.replace(placeholder, str(value))
doc.save(f"reports/{client['name']}_assessment.docx")
Limitations
-
Complex formatting: Some advanced Word features (SmartArt, embedded objects) may not be fully supported
-
Track changes: Limited support for revision tracking
-
Macros: VBA macros are not supported for security reasons
-
Large tables: Tables with merged cells may have rendering inconsistencies
Troubleshooting
Style Not Applied
Problem: Custom styles from template not appearing in output
Solution: Ensure the style exists in the template and use exact style name:
Check available styles
for style in doc.styles: print(style.name)
Encoding Issues
Problem: Special characters not displaying correctly
Solution: Ensure UTF-8 encoding when reading source files:
with open('input.md', 'r', encoding='utf-8') as f: content = f.read()
Missing Images
Problem: Images not appearing in output
Solution: Use absolute paths or verify relative path from script location:
from pathlib import Path image_path = Path(file).parent / 'assets' / 'logo.png' doc.add_picture(str(image_path))
Related Skills
-
pdf: Convert DOCX to PDF or extract content from PDFs
-
xlsx: Work with Excel files for data that feeds into reports
-
pptx: Create presentations from document content
References
-
Detailed API Reference
-
python-docx Documentation
-
Markdown Syntax Guide