docx-to-html

Use this skill whenever the user has a DOCX file (.docx) and wants to convert, read, view, extract content from, or process it in any way — including summarization, displaying in a browser, extracting tables or lists, or feeding into AI pipelines. Always use this skill for any task involving .docx files, even if the request seems simple. Triggers include: 'convert docx', 'open word file', 'read word document', 'extract tables from docx', or any mention of a .docx filename.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "docx-to-html" with this command: npx skills add bibekyess/docx-to-html

DOCX to HTML Converter

This skill converts Microsoft Word (.docx) documents into clean, semantic HTML that can be displayed in a browser, embedded in web applications, or passed to AI pipelines.

Compatibility

  • Python 3 (for the conversion wrapper)
  • Node.js with mammoth installed (core conversion engine)

To install Node.js dependencies, run once from the scripts/ directory:

npm install

Use Cases

  • Browser-Based Viewing: Convert DOCX documents for display in web browsers without requiring Microsoft Word.
  • AI-Ready Content: Prepare DOCX content for LLMs for tasks like summarization, Q&A, and semantic search.
  • Web Integration: Integrate Word document content into web applications, CMS, or online editors.
  • Data Extraction: Extract structured data (tables, lists, headings) from DOCX files for automated reporting and analysis.
  • Search and Indexing: Enable full-text and vector search by converting DOCX content into easily indexable HTML.
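As an illustration of the data-extraction use case, the sketch below pulls table rows out of already-converted HTML using only the Python standard library. `TableExtractor` and `extract_tables` are hypothetical helper names, not part of this skill, and the sketch assumes simple, non-nested tables.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect each <table> as a list of rows, each row a list of cell strings.

    Hypothetical helper for illustration; nested tables are not handled.
    """

    def __init__(self):
        super().__init__()
        self.tables = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None
            self._cell = None


def extract_tables(html_text):
    """Return every table found in html_text as nested lists of strings."""
    parser = TableExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.tables
```

Calling `extract_tables(Path("report.html").read_text(encoding="utf-8"))` on a converted file (the file name is only an example) yields one list per table, which can then be written to CSV or loaded into a reporting tool.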

Workflow

  1. Locate DOCX File: Identify the path to the .docx file to convert.

  2. Run Conversion Script: Execute the Python wrapper from the skill's scripts/ directory:

    python3 <skill-dir>/scripts/convert.py <input_path.docx> <output_path.html>
    

    Replace <skill-dir> with the actual path where this skill is installed.

  3. Verify Output: Open the generated .html file in a browser and check:

    • Headings (<h1>, <h2>, etc.) appear at the correct hierarchy levels
    • Tables render with the expected rows and columns
    • Lists appear as bullet or numbered items (not plain text)
    • Bold, italic, and inline formatting are preserved
    • Images are visible (embedded as base64 by default)
  4. Process HTML: Use the resulting HTML for further tasks like summarization, indexing, or display.
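When the conversion needs to be triggered from another Python program, step 2 can be sketched roughly as follows. The function names are hypothetical, and the sketch assumes that convert.py takes exactly the two positional arguments shown above and exits with a non-zero status on failure, which should be confirmed against the script itself.

```python
import subprocess
from pathlib import Path


def build_convert_command(skill_dir, input_path, output_path, python="python3"):
    """Return the argv list for the documented convert.py invocation."""
    script = Path(skill_dir) / "scripts" / "convert.py"
    return [python, str(script), str(input_path), str(output_path)]


def convert_docx(skill_dir, input_path, output_path):
    """Run the wrapper and return the output path.

    Assumption: the wrapper signals failure through its exit status.
    """
    if Path(input_path).suffix.lower() != ".docx":
        raise ValueError(f"expected a .docx file, got: {input_path}")
    cmd = build_convert_command(skill_dir, input_path, output_path)
    # check=True raises CalledProcessError if the command exits non-zero.
    subprocess.run(cmd, check=True)
    return Path(output_path)
```

Passing the arguments as a list rather than a shell string avoids quoting problems with file names that contain spaces.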

Bundled Resources

  • scripts/docx-converter.js: Core Node.js conversion logic using mammoth.js.
  • scripts/convert.py: Python wrapper for invoking the Node.js converter.
  • scripts/package.json: Node.js dependency manifest (includes mammoth).

Technical Details

The conversion leverages mammoth.js, which prioritizes semantic meaning over visual replication:

  • Semantic Conversion: Document structure maps to proper HTML — headings become <h1>/<h2>, lists become <ul>/<ol>, etc.
  • Basic Styling: Bold, italics, and common paragraph styles are preserved.
  • Image Embedding: Images are extracted and embedded as base64 data URIs in the HTML output.
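Because the output is plain semantic HTML, the mapping described above can be spot-checked programmatically instead of by eye. The following is a minimal sketch using Python's standard `html.parser`. `StructureCounter` and `summarize_structure` are hypothetical names, and counting `<strong>` and `<em>` assumes the converter keeps mammoth's default mapping for bold and italic.

```python
from collections import Counter
from html.parser import HTMLParser


class StructureCounter(HTMLParser):
    """Count the structural elements that matter when checking a conversion."""

    TRACKED = {
        "h1", "h2", "h3", "h4", "h5", "h6",
        "table", "tr", "ul", "ol", "li",
        "strong", "em", "img",  # strong/em: assumed default bold/italic mapping
    }

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        if tag in self.TRACKED:
            self.counts[tag] += 1
        if tag == "img":
            # Embedded images carry their bytes in a data: URI.
            src = dict(attrs).get("src") or ""
            if src.startswith("data:"):
                self.counts["img_data_uri"] += 1


def summarize_structure(html_text):
    """Return a dict mapping each tracked tag to how often it occurs."""
    parser = StructureCounter()
    parser.feed(html_text)
    parser.close()
    return dict(parser.counts)
```

A summary with no heading or list entries for a document that clearly contains them suggests the source file used direct formatting rather than Word's built-in styles.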

Troubleshooting

| Problem | Likely Cause | Fix |
|---|---|---|
| node: command not found | Node.js not installed | Install Node.js (v16+) |
| Cannot find module 'mammoth' | npm deps missing | Run npm install in scripts/ |
| Empty or garbled output | Corrupted or password-protected DOCX | Try re-saving the file from Microsoft Word |
| Missing images | Large embedded images | Check mammoth.js image size limits in docx-converter.js |
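The first three problems can be caught before a conversion is attempted. The sketch below is one possible preflight check, not part of the bundled scripts. All function names are hypothetical, and it assumes that npm installed mammoth into scripts/node_modules and that the main document part uses its usual name, word/document.xml.

```python
import shutil
import zipfile
from pathlib import Path


def node_available():
    """True if a `node` executable is on PATH."""
    return shutil.which("node") is not None


def mammoth_installed(scripts_dir):
    """True if npm has placed mammoth under scripts/node_modules (assumed layout)."""
    return (Path(scripts_dir) / "node_modules" / "mammoth").is_dir()


def looks_like_docx(path):
    """True if the file is a ZIP archive containing word/document.xml.

    Corrupted files fail this check; password-protected files are, as far
    as I know, stored in a different container and fail it too.
    """
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as zf:
        return "word/document.xml" in zf.namelist()


def preflight(scripts_dir, docx_path):
    """Return a list of problems; an empty list means nothing obvious is wrong."""
    problems = []
    if not node_available():
        problems.append("node not found on PATH: install Node.js")
    if not mammoth_installed(scripts_dir):
        problems.append("mammoth missing: run npm install in scripts/")
    if not looks_like_docx(docx_path):
        problems.append("input is not a readable DOCX: try re-saving it from Microsoft Word")
    return problems
```

Each message mirrors the corresponding fix in the table, so the returned list can be shown to the user directly.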

Limitations

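  • Styling beyond basic formatting (for example custom fonts, colors, exact spacing, and page layout) is generally not reproduced in the HTML output, because mammoth.js maps document structure rather than appearance.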
  • Features like tracked changes, comments, or complex layout elements may be simplified or omitted.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
