docx-md

Low-level docx format tool for AI document review. Three operations: (1) read a docx and output compact Markdown or JSON; (2) apply an edits JSON back to the docx as tracked revisions and comments; (3) finalize (accept all revisions, remove all comments). Markdown output uses fewer tokens than the full JSON. Use when raw .docx read/write is needed. For the full contract-review workflow, use contract-review-workflow, which invokes this tool.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.


Install skill "docx-md" with this command: npx skills add yanweiliang323868-del/docx-md

Word DOCX (OOXML) – docx-md

Overview

Three entry points: Read – output compact Markdown (default, token-efficient) or full JSON; Modify – apply AI-returned edits to the docx; Finalize – accept all revisions and remove all comments. The docx is manipulated directly as OOXML (ZIP + XML); no commercial Word libraries are required.

Workflow

Goal | Action
--- | ---
Get document for AI | Read: run the read script → Markdown (default) or JSON. Markdown includes <!-- b:N --> blockIndex markers for edit targeting.
Apply AI edits to docx | Modify: run the apply script with the docx + edits JSON → new docx with tracked changes and comments.
Deliver final version | Finalize: run the finalize script → new docx with no revisions or comments.
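The three steps can be chained from Python. The sketch below is an illustration, not part of the skill: it assumes the scripts live under skills/docx-md/scripts (as in the command examples in this document), uses only the documented arguments (-o, and - for stdin), and treats ask_model as a hypothetical callback that wraps your LLM call and returns the edits JSON as text.

```python
import subprocess
from pathlib import Path

# Assumed location of the skill's scripts; adjust to your checkout.
SCRIPTS = Path("skills/docx-md/scripts")


def build_commands(src: str, edited: str, final: str) -> dict[str, list[str]]:
    """Return argv lists for Read, Modify and Finalize using only documented flags."""
    return {
        # Read writes Markdown to stdout by default.
        "read": ["python3", str(SCRIPTS / "read_docx.py"), src],
        # "-" makes apply_edits_docx.py read the edits JSON from stdin.
        "apply": ["python3", str(SCRIPTS / "apply_edits_docx.py"), src, "-", "-o", edited],
        "finalize": ["python3", str(SCRIPTS / "finalize_docx.py"), edited, "-o", final],
    }


def run_pipeline(src, edited, final, ask_model, finalize=False):
    """Read -> model -> Modify, and optionally Finalize once the user confirms."""
    cmds = build_commands(src, edited, final)
    markdown = subprocess.run(
        cmds["read"], capture_output=True, text=True, check=True
    ).stdout
    edits_json = ask_model(markdown)  # hypothetical: your LLM call goes here
    subprocess.run(cmds["apply"], input=edits_json, text=True, check=True)
    if finalize:  # only after the user has reviewed the tracked changes
        subprocess.run(cmds["finalize"], check=True)
```

Keeping finalize behind an explicit flag mirrors step 4 of the pipeline: the tracked-changes docx is the reviewable artifact, and flattening it is a separate decision.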

LLM-oriented pipeline

  1. Read – Parse the docx and output Markdown (default) or JSON. In Markdown, each block is prefixed with <!-- b:N -->; existing revisions appear as {+inserted+} and {-deleted-}, and comments as [comment: text].
  2. Send the output plus the task prompt to the model, and require it to return only the edit JSON: blockIndex, originalContent, content, basis.
  3. Modify – The script infers the operation from blockIndex, originalContent, content, and basis, converts it to OOXML (w:ins / w:del / comment anchors), and writes the result to a new docx.
  4. Finalize – When the user confirms, run finalize to accept all revisions and remove all comments.

See references/llm-pipeline.md for the Markdown format, JSON schema, and edit format.
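When checking the model's edits against the read result, it can help to split the Markdown back into blocks. A minimal sketch, assuming each block starts with a <!-- b:N --> marker and its text runs until the next marker, and that {+...+}, {-...-} and [comment: ...] never nest; references/llm-pipeline.md is the authority on the real format. split_blocks and accepted_text are hypothetical helpers, not part of the skill.

```python
import re

# Assumed shapes of the notation described above.
BLOCK_MARKER = re.compile(r"<!--\s*b:(\d+)\s*-->")
INSERTED = re.compile(r"\{\+(.*?)\+\}", re.S)
DELETED = re.compile(r"\{-(.*?)-\}", re.S)
COMMENT = re.compile(r"\[comment:[^\]]*\]")


def split_blocks(markdown: str) -> dict[int, str]:
    """Map blockIndex -> block text as it appears in the read output."""
    parts = BLOCK_MARKER.split(markdown)
    # parts == [preamble, idx0, text0, idx1, text1, ...]
    return {int(idx): text.strip() for idx, text in zip(parts[1::2], parts[2::2])}


def accepted_text(block: str) -> str:
    """Block text with existing revisions accepted and comments dropped."""
    block = DELETED.sub("", block)       # deleted text disappears
    block = INSERTED.sub(r"\1", block)   # inserted text stays, markers removed
    return COMMENT.sub("", block).strip()
```

For example, split_blocks("<!-- b:0 --> Title\n<!-- b:1 --> Pay within {-30-}{+45+} days.") yields blocks 0 and 1, and accepted_text applied to block 1 gives "Pay within 45 days.".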

1. Read

  • Parse word/document.xml (w:body only) and word/comments.xml.
  • Output Markdown (default) or JSON. Markdown is compact and token-efficient.
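Because a docx is a ZIP archive of XML parts, the body text can be reached with the standard library alone. The sketch below only illustrates the layout read_docx.py works with; it is not the script's implementation. It collects the text of each top-level w:p in w:body, skips tables, emits no blockIndex markers, and does not read word/comments.xml.

```python
import zipfile
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def body_paragraphs(docx) -> list[str]:
    """Plain text of each top-level w:p in w:body (docx: path or binary file object)."""
    with zipfile.ZipFile(docx) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    body = root.find(f"{W}body")
    texts = []
    for p in body.findall(f"{W}p"):
        # Visible text lives in w:t (including runs inside w:ins);
        # deleted runs use w:delText, so they are left out here.
        texts.append("".join(t.text or "" for t in p.iter(f"{W}t")))
    return texts
```

Reading w:t but not w:delText gives the "as if accepted" view; a reviewer-facing reader such as the Markdown output instead keeps both and marks them.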

Script: scripts/read_docx.py

# Default: Markdown output (token-efficient)
python3 skills/docx-md/scripts/read_docx.py document.docx
python3 skills/docx-md/scripts/read_docx.py document.docx -o result.md

# JSON output (full structure)
python3 skills/docx-md/scripts/read_docx.py document.docx -f json -o result.json

Options:

  • -o, --output – Output path (default: stdout)
  • -f, --format – md (default) or json

2. Modify

  • Input: docx path + edits JSON { modifications: [{ blockIndex, originalContent, content, basis }] }, where blockIndex uses the same block numbering as the read output.
  • Flow: convert the JSON to OOXML (w:ins / w:del / comments), then write the result to a new docx.
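Model output is worth checking before it reaches the apply script. A minimal sketch, assuming only what is stated above (a top-level modifications list whose entries carry the four keys) plus the set of blockIndex values seen in the read output; it checks presence and indices only, since how the script interprets empty fields is not specified here. load_edits is a hypothetical helper.

```python
import json


REQUIRED = ("blockIndex", "originalContent", "content", "basis")


def load_edits(text: str, known: set[int]) -> list[dict]:
    """Parse {"modifications": [...]} and reject entries the apply step cannot target."""
    data = json.loads(text)
    mods = data.get("modifications") if isinstance(data, dict) else None
    if not isinstance(mods, list):
        raise ValueError('expected a top-level "modifications" list')
    for i, mod in enumerate(mods):
        missing = [k for k in REQUIRED if k not in mod]
        if missing:
            raise ValueError(f"modification {i} is missing {missing}")
        idx = mod["blockIndex"]
        # type() rather than isinstance(), so that true/false are not accepted as 1/0
        if type(idx) is not int or idx not in known:
            raise ValueError(f"modification {i}: blockIndex {idx!r} is not in the read output")
    return mods
```

The known set can come from the <!-- b:N --> markers of the Markdown that was sent to the model.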

Script: scripts/apply_edits_docx.py. Pass - as the edits file to read the JSON from stdin.

python3 skills/docx-md/scripts/apply_edits_docx.py document.docx edits.json -o output.docx
python3 skills/docx-md/scripts/apply_edits_docx.py document.docx - -o output.docx  # stdin

Options: --author (default: "Review")
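In WordprocessingML, a tracked replacement is a w:del wrapping the old run (its text in w:delText) next to a w:ins wrapping the new run, each carrying w:id, w:author and w:date. The sketch below builds that pair with ElementTree; it assumes, without confirmation from the script, that --author ends up in w:author. The simple counter for w:id is also an assumption: a real writer must choose ids that do not collide with revisions already in the document.

```python
import itertools
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = f"{{{W_NS}}}"
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
ET.register_namespace("w", W_NS)
_ids = itertools.count(1)  # naive id source, for illustration only


def tracked_replacement(old: str, new: str, author: str = "Review") -> list[ET.Element]:
    """Return [w:del, w:ins] marking `old` as deleted and `new` as inserted."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    elements = []
    for tag, text_tag, text in (("del", "delText", old), ("ins", "t", new)):
        change = ET.Element(
            f"{W}{tag}",
            {f"{W}id": str(next(_ids)), f"{W}author": author, f"{W}date": stamp},
        )
        run = ET.SubElement(change, f"{W}r")
        t = ET.SubElement(run, f"{W}{text_tag}")
        t.text = text
        t.set(XML_SPACE, "preserve")  # keep leading/trailing spaces
        elements.append(change)
    return elements
```

Both elements are meant to sit inside the same w:p, in place of the original run; ET.tostring(element, encoding="unicode") shows the serialized markup.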

3. Finalize

  • Accept all revisions (flatten to final text), remove all comments. Save as new docx.
  • Uses docx-revisions to accept revisions (preserves encoding), then removes comment markup via regex on raw bytes.
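What "accept all revisions, remove all comments" means at the XML level can be shown on a parsed document.xml tree. This is a swapped-in technique for illustration (ElementTree rather than docx-revisions plus regex) and is deliberately simplified: it drops w:del, unwraps w:ins, and removes comment anchors, but ignores moves, formatting changes and deleted paragraph marks, and does not touch word/comments.xml or its relationship entry.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = f"{{{W_NS}}}"
ET.register_namespace("w", W_NS)

# Comment markup that stays in document.xml until it is stripped.
COMMENT_TAGS = {f"{W}commentRangeStart", f"{W}commentRangeEnd", f"{W}commentReference"}


def accept_and_strip(root: ET.Element) -> None:
    """Accept insertions/deletions and drop comment anchors, in place (simplified)."""
    # Snapshot first, because children are rewritten while we walk.
    for parent in list(root.iter()):
        new_children = []
        for child in list(parent):
            if child.tag == f"{W}del" or child.tag in COMMENT_TAGS:
                continue  # deleted text and comment anchors disappear
            if child.tag == f"{W}ins":
                new_children.extend(list(child))  # keep inserted runs, drop the wrapper
            else:
                new_children.append(child)
        parent[:] = new_children
```

Working on a parsed tree is easier to read, but re-serializing with ElementTree can change namespace prefixes and declarations, which is one reason a byte-level approach like the one the script takes can be preferable.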

Script: scripts/finalize_docx.py

Requires: pip install docx-revisions (see requirements.txt)

python3 skills/docx-md/scripts/finalize_docx.py input.docx -o output.docx

Resources

scripts/

  • read_docx.py – Read: python3 scripts/read_docx.py document.docx [-o out.md] [-f md|json]
  • apply_edits_docx.py – Modify: python3 scripts/apply_edits_docx.py document.docx edits.json -o output.docx
  • finalize_docx.py – Finalize: python3 scripts/finalize_docx.py input.docx -o output.docx

references/

  • ooxml.md – OOXML layout (document.xml, comments.xml, revisions, comments)
  • llm-pipeline.md – Pipeline: read → Markdown/JSON → model edits → modify; defines Markdown format, JSON shape (blockIndex, originalContent, content, basis)

