Office Docs Skills Library

Office document automation, template generation, and document processing patterns Version: 1.0.0 | Last Updated: 2026-01-17

Overview

This library contains 5 production-ready skills for automating Microsoft Office and PDF document workflows. Each skill covers a specific document type with patterns for generation, manipulation, and template-based automation. Skills follow the Anthropic Skills format with practical examples from real-world document processing pipelines.

Quick Start

Browse available skills

ls skills/office-docs/

Read a skill

cat skills/office-docs/python-docx/SKILL.md

Skills are documentation - implement patterns in your document workflows

Available Skills

Skill Description Key Features

python-docx Word document creation and manipulation Paragraphs, tables, images, styles

openpyxl Excel workbook automation Cells, formulas, charts, formatting

python-pptx PowerPoint presentation generation Slides, shapes, layouts, animations

pypdf PDF manipulation and extraction Merge, split, extract, watermark

docx-templates Template-based document generation Jinja2 syntax, loops, conditionals

Skill Categories

Document Creation

python-docx - Create and modify Word documents programmatically
python-pptx - Generate PowerPoint presentations from data

Spreadsheet Automation

openpyxl - Full Excel workbook manipulation with formulas

PDF Processing

pypdf - Read, merge, split, and modify PDF files

Template Engines

docx-templates - Fill Word templates with dynamic data

Skill Selection Guide

Choose python-docx when:

Creating reports, contracts, or documentation from scratch
Modifying existing Word documents programmatically
Need full control over document structure and styling
Generating documents with complex formatting requirements

Choose openpyxl when:

Automating Excel report generation
Reading and processing spreadsheet data
Creating workbooks with formulas and charts
Batch processing multiple Excel files

Choose python-pptx when:

Generating presentations from data automatically
Creating slide decks with consistent branding
Building presentation templates with dynamic content
Automating reporting dashboards as slides

Choose pypdf when:

Merging multiple PDFs into one document
Extracting text or pages from PDF files
Adding watermarks or page numbers
Splitting large PDFs into smaller files

Choose docx-templates when:

Filling template documents with database records
Generating bulk documents (letters, invoices, contracts)
Non-developers need to maintain document templates
Jinja2-style syntax is preferred for templates

Quick Examples

Python-docx Report Generation

from docx import Document from docx.shared import Inches, Pt from docx.enum.text import WD_ALIGN_PARAGRAPH

Create document

doc = Document()

Add title

title = doc.add_heading('Monthly Report', level=0) title.alignment = WD_ALIGN_PARAGRAPH.CENTER

Add paragraph with formatting

para = doc.add_paragraph() run = para.add_run('Executive Summary: ') run.bold = True para.add_run('This report covers key metrics for January 2026.')

Add table

table = doc.add_table(rows=4, cols=3) table.style = 'Table Grid'

Header row

headers = ['Metric', 'Value', 'Change'] for i, header in enumerate(headers): cell = table.rows[0].cells[i] cell.text = header cell.paragraphs[0].runs[0].bold = True

Data rows

data = [ ['Revenue', '$1.2M', '+15%'], ['Users', '45,000', '+8%'], ['Retention', '92%', '+3%'] ] for row_idx, row_data in enumerate(data, start=1): for col_idx, value in enumerate(row_data): table.rows[row_idx].cells[col_idx].text = value

Add image

doc.add_picture('chart.png', width=Inches(5))

Save

doc.save('monthly_report.docx')

Openpyxl Excel Automation

from openpyxl import Workbook from openpyxl.styles import Font, PatternFill, Alignment from openpyxl.chart import BarChart, Reference

Create workbook

wb = Workbook() ws = wb.active ws.title = "Sales Data"

Headers with styling

headers = ['Product', 'Q1', 'Q2', 'Q3', 'Q4', 'Total'] header_fill = PatternFill(start_color="4472C4", end_color="4472C4", fill_type="solid") header_font = Font(color="FFFFFF", bold=True)

for col, header in enumerate(headers, start=1): cell = ws.cell(row=1, column=col, value=header) cell.fill = header_fill cell.font = header_font cell.alignment = Alignment(horizontal='center')

Data with formulas

data = [ ['Widget A', 100, 150, 200, 180], ['Widget B', 80, 120, 140, 160], ['Widget C', 200, 220, 250, 280] ]

for row_idx, row_data in enumerate(data, start=2): for col_idx, value in enumerate(row_data, start=1): ws.cell(row=row_idx, column=col_idx, value=value) # Total formula ws.cell(row=row_idx, column=6, value=f'=SUM(B{row_idx}:E{row_idx})')

Create chart

chart = BarChart() chart.title = "Quarterly Sales" chart.x_axis.title = "Product" chart.y_axis.title = "Units"

data_ref = Reference(ws, min_col=2, max_col=5, min_row=1, max_row=4) cats_ref = Reference(ws, min_col=1, min_row=2, max_row=4) chart.add_data(data_ref, titles_from_data=True) chart.set_categories(cats_ref)

ws.add_chart(chart, "H2")

wb.save('sales_report.xlsx')

Python-pptx Presentation

from pptx import Presentation from pptx.util import Inches, Pt from pptx.enum.text import PP_ALIGN from pptx.dml.color import RgbColor

Create presentation

prs = Presentation()

Title slide

title_layout = prs.slide_layouts[0] slide = prs.slides.add_slide(title_layout) slide.shapes.title.text = "Q1 2026 Results" slide.placeholders[1].text = "Company Performance Review"

Content slide with bullet points

bullet_layout = prs.slide_layouts[1] slide = prs.slides.add_slide(bullet_layout) slide.shapes.title.text = "Key Highlights"

body = slide.placeholders[1] tf = body.text_frame tf.text = "Revenue up 15% YoY"

for point in ["Customer base grew 8%", "New product launch successful", "Expanded to 3 new markets"]: p = tf.add_paragraph() p.text = point p.level = 1

Slide with chart placeholder

chart_layout = prs.slide_layouts[5] slide = prs.slides.add_slide(chart_layout) slide.shapes.title.text = "Sales Breakdown"

Add image

slide.shapes.add_picture( 'sales_chart.png', Inches(1), Inches(2), width=Inches(8) )

prs.save('quarterly_results.pptx')

PyPDF Manipulation

from pypdf import PdfReader, PdfWriter, PdfMerger

Merge PDFs

merger = PdfMerger() merger.append('report_part1.pdf') merger.append('report_part2.pdf') merger.append('appendix.pdf') merger.write('complete_report.pdf') merger.close()

Split PDF

reader = PdfReader('large_document.pdf') for i, page in enumerate(reader.pages): writer = PdfWriter() writer.add_page(page) writer.write(f'page_{i+1}.pdf')

Extract text

reader = PdfReader('document.pdf') text = "" for page in reader.pages: text += page.extract_text()

Add watermark

reader = PdfReader('document.pdf') watermark = PdfReader('watermark.pdf') writer = PdfWriter()

for page in reader.pages: page.merge_page(watermark.pages[0]) writer.add_page(page)

writer.write('watermarked_document.pdf')

Docx-Templates Bulk Generation

from docxtpl import DocxTemplate

Load template

doc = DocxTemplate("contract_template.docx")

Context for template

context = { 'client_name': 'Acme Corporation', 'contract_date': '2026-01-17', 'contract_value': '$50,000', 'terms': '12 months', 'services': [ {'name': 'Consulting', 'hours': 100, 'rate': '$200'}, {'name': 'Development', 'hours': 200, 'rate': '$150'}, {'name': 'Support', 'hours': 50, 'rate': '$100'} ], 'include_sla': True, 'sla_response_time': '4 hours' }

Render and save

doc.render(context) doc.save('acme_contract.docx')

Bulk generation

clients = load_clients_from_database() for client in clients: doc = DocxTemplate("invoice_template.docx") doc.render(client.to_dict()) doc.save(f'invoices/{client.id}_invoice.docx')

Integration Patterns

Document Generation Pipeline

Data Source --> Template Selection --> Rendering --> Post-processing --> Output | | | | | +-- Database +-- By type +-- Variables +-- Convert +-- Save +-- API +-- By language +-- Loops +-- Combine +-- Email +-- Form +-- By recipient +-- Images +-- Sign +-- Archive

Batch Processing Pattern

Input Files --> Validation --> Processing --> Quality Check --> Output | | | | | +-- Queue +-- Format +-- Transform +-- Verify +-- Organize +-- Watch +-- Content +-- Extract +-- Log +-- Notify

Template Management

Template Repo --> Version Control --> Environment --> Runtime | | | | +-- Design +-- Review +-- Dev/Prod +-- Hot reload +-- Test +-- Approve +-- Variables +-- Fallbacks

Common Patterns Across Skills

Error Handling

from pathlib import Path

def safe_document_generation(template_path, context, output_path): """Generate document with comprehensive error handling.""" try: if not Path(template_path).exists(): raise FileNotFoundError(f"Template not found: {template_path}")

    doc = DocxTemplate(template_path)
    doc.render(context)

    # Ensure output directory exists
    Path(output_path).parent.mkdir(parents=True, exist_ok=True)
    doc.save(output_path)

    return {"success": True, "path": output_path}

except Exception as e:
    logger.error(f"Document generation failed: {e}")
    return {"success": False, "error": str(e)}

Temporary File Handling

import tempfile from contextlib import contextmanager

@contextmanager def temp_document(): """Create temporary document that auto-cleans.""" with tempfile.NamedTemporaryFile(suffix='.docx', delete=False) as f: temp_path = f.name try: yield temp_path finally: Path(temp_path).unlink(missing_ok=True)

Streaming Large Files

def process_large_excel(file_path, chunk_size=1000): """Process large Excel files in chunks.""" from openpyxl import load_workbook

wb = load_workbook(file_path, read_only=True)
ws = wb.active

chunk = []
for row in ws.iter_rows(values_only=True):
    chunk.append(row)
    if len(chunk) >= chunk_size:
        yield chunk
        chunk = []

if chunk:
    yield chunk

Integration with Workspace-Hub

These skills power document automation across the workspace-hub ecosystem:

workspace-hub/ ├── documents/ │ ├── templates/ # Uses: docx-templates │ │ ├── contracts/ │ │ ├── invoices/ │ │ └── reports/ │ ├── generators/ # Uses: python-docx, openpyxl, python-pptx │ │ ├── report_builder.py │ │ ├── spreadsheet_gen.py │ │ └── presentation_gen.py │ └── processors/ # Uses: pypdf │ ├── pdf_merger.py │ └── text_extractor.py ├── output/ │ ├── generated/ │ └── archive/ └── config/ └── document_config.yaml

Best Practices

Template Versioning

templates/ ├── v1/ │ └── contract_template.docx ├── v2/ │ └── contract_template.docx # Updated branding └── current -> v2/ # Symlink to current version

Validation Before Generation

def validate_context(context, required_fields): """Validate context before document generation.""" missing = [f for f in required_fields if f not in context or not context[f]] if missing: raise ValueError(f"Missing required fields: {missing}") return True

Output Organization

from datetime import datetime

def get_output_path(doc_type, client_id, extension='docx'): """Generate organized output path.""" date_str = datetime.now().strftime('%Y/%m/%d') return f"output/{doc_type}/{date_str}/{client_id}.{extension}"

Logging and Audit Trail

def generate_with_audit(template, context, output_path): """Generate document with audit logging.""" start_time = time.time()

result = generate_document(template, context, output_path)

audit_log.info({
    'action': 'document_generated',
    'template': template,
    'output': output_path,
    'duration': time.time() - start_time,
    'context_keys': list(context.keys())
})

return result

Testing Document Generation

import pytest from docx import Document

def test_report_generation(): """Test report document structure.""" generate_report(sample_data, 'test_output.docx')

doc = Document('test_output.docx')

# Verify structure
assert len(doc.paragraphs) > 0
assert doc.paragraphs[0].text == 'Monthly Report'

# Verify tables
assert len(doc.tables) == 1
assert len(doc.tables[0].rows) == 4

def test_template_rendering(): """Test template variable substitution.""" context = {'name': 'Test Corp', 'amount': '$1000'}

doc = DocxTemplate('template.docx')
doc.render(context)
doc.save('output.docx')

result = Document('output.docx')
full_text = '\n'.join([p.text for p in result.paragraphs])

assert 'Test Corp' in full_text
assert '$1000' in full_text

def test_pdf_merge(): """Test PDF merging preserves pages.""" merge_pdfs(['doc1.pdf', 'doc2.pdf'], 'merged.pdf')

reader = PdfReader('merged.pdf')
assert len(reader.pages) == 4  # 2 + 2 pages

Related Resources

python-docx Documentation
openpyxl Documentation
python-pptx Documentation
pypdf Documentation
docxtpl Documentation

Version History

1.0.0 (2026-01-17): Initial release with 5 office document skills

These skills represent patterns refined across document automation systems generating thousands of documents daily in production environments.

office-docs

Safety Notice

Copy this and send it to your AI assistant to learn

Browse available skills

Read a skill

Skills are documentation - implement patterns in your document workflows

Create document

Add title

Add paragraph with formatting

Add table

Header row

Data rows

Add image

Save

Create workbook

Headers with styling

Data with formulas

Create chart

Create presentation

Title slide

Content slide with bullet points

Slide with chart placeholder

Add image

Merge PDFs

Split PDF

Extract text

Add watermark

Load template

Context for template

Render and save

Bulk generation

Source Transparency

Related Skills

echarts

pandoc

mkdocs

gis