doc-to-markdown

Use when converting Word documents (.doc/.docx) to clean Markdown, with images extracted to a separate folder for readability and AI compatibility.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "doc-to-markdown" with this command: npx skills add sheepmao/doc-to-markdown-skill/sheepmao-doc-to-markdown-skill-doc-to-markdown

Doc-to-Markdown (Word → Markdown)

Convert Microsoft Word .doc / .docx into:

  • a clean Markdown file (.md)
  • plus an optional images folder (*_images/) with relative image links

The default output keeps the Markdown file small (good for humans and LLMs) while preserving diagrams as linked image files.

Quickstart (copy/paste)

# 1) Convert a single file (.docx or .doc)
python3 convert_word_to_markdown.py "path/to/document.docx"

# 2) Embedded mode (single self-contained .md; can be very large because images are inlined as base64)
python3 convert_word_to_markdown.py --embedded "path/to/document.docx"

# 3) If anything fails, run a dependency check
python3 convert_word_to_markdown.py --check

Batch convert (current folder)

for f in *.doc *.docx; do
  [ -e "$f" ] || continue
  python3 convert_word_to_markdown.py "$f"
done
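
For callers who prefer Python over a shell loop, the batch step above can be sketched as follows. This is a minimal illustration, not part of the skill: the helper names (select_word_files, build_command, convert_folder) are hypothetical, and it assumes convert_word_to_markdown.py sits in the current working directory and accepts a single path argument, as in the Quickstart.

```python
import subprocess
import sys
from pathlib import Path

WORD_SUFFIXES = {".doc", ".docx"}


def select_word_files(paths):
    """Keep only .doc/.docx paths (case-insensitive), sorted by path string."""
    candidates = (Path(p) for p in paths)
    return sorted(
        (p for p in candidates if p.suffix.lower() in WORD_SUFFIXES),
        key=str,
    )


def build_command(doc_path, script="convert_word_to_markdown.py"):
    """Build the command line that the shell loop runs for one file."""
    return [sys.executable, script, str(doc_path)]


def convert_folder(folder="."):
    """Convert every Word file directly inside folder; return the failures."""
    failed = []
    for doc in select_word_files(Path(folder).iterdir()):
        result = subprocess.run(build_command(doc))
        if result.returncode != 0:
            failed.append(doc)
    return failed
```

Unlike the shell loop, convert_folder() collects the files whose conversion exited with a non-zero status, so a long batch can be retried selectively.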

Outputs

Default (external images):

document.docx
document.md
document_images/
  image1.png
  image2.png
  ...
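
The naming pattern in this listing can be expressed as a small path helper. This is a sketch under the assumption that outputs are written next to the input file, as the listing suggests; expected_outputs is a hypothetical name, not a function of the skill.

```python
from pathlib import Path


def expected_outputs(doc_path):
    """Return (markdown_path, images_dir) for default, external-image mode."""
    doc = Path(doc_path)
    markdown_path = doc.with_suffix(".md")               # document.docx -> document.md
    images_dir = doc.with_name(doc.stem + "_images")     # document.docx -> document_images/
    return markdown_path, images_dir
```

A helper like this is handy for checking after a batch run that every input produced its .md file.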

Embedded mode:

document.docx
document.md   # contains base64 images
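
To illustrate why embedded output grows so quickly, here is a minimal sketch of an image inlined as a base64 data URI. The exact Markdown the script emits is not shown in this document, so the link format below is an assumption, and the function names are hypothetical.

```python
import base64


def image_data_uri(image_bytes, mime_type="image/png"):
    """Encode raw image bytes as a data URI."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def embedded_image_markdown(alt_text, image_bytes, mime_type="image/png"):
    """Build a Markdown image link whose target is the inlined data URI."""
    return f"![{alt_text}]({image_data_uri(image_bytes, mime_type)})"


def base64_length(byte_count):
    """Base64 emits 4 characters for every 3 input bytes, rounded up."""
    return 4 * ((byte_count + 2) // 3)
```

Base64 adds roughly one third to each image's size, and all of that text lands inside the .md file, which is why the default mode keeps images external.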

Requirements

  • Recommended (most reliable): install markitdown into a local virtualenv in this repo
    • bash setup_venv.sh
    • (manual) python3.11 -m venv .venv + .venv/bin/python -m pip install 'markitdown[all]'
  • Alternative: install markitdown globally
    • python3 -m pip install 'markitdown[all]' (requires Python 3.10+ and markitdown on PATH)
  • Fallback: uv (provides uvx) so the scripts can run markitdown without pip installs
    • macOS: brew install uv
  • For .doc (legacy) support: LibreOffice (brew install --cask libreoffice)
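
The requirements above can be probed with a short script. This is only a sketch of such a check and not the implementation behind --check: check_dependencies and can_convert_docx are hypothetical names, and looking for a soffice executable on PATH is an assumption that can miss LibreOffice installs located elsewhere.

```python
import shutil
import sys
from pathlib import Path


def check_dependencies(repo_dir="."):
    """Report which of the documented ways to run markitdown are available."""
    venv_python = Path(repo_dir) / ".venv" / "bin" / "python"
    return {
        "python_3_10_plus": sys.version_info >= (3, 10),
        "local_venv": venv_python.exists(),
        "markitdown_on_path": shutil.which("markitdown") is not None,
        "uvx_on_path": shutil.which("uvx") is not None,
        "libreoffice_on_path": shutil.which("soffice") is not None,
    }


def can_convert_docx(report):
    """True if any documented route (venv, global install, uvx) is present."""
    return (
        report["local_venv"]
        or report["markitdown_on_path"]
        or report["uvx_on_path"]
    )
```

The libreoffice_on_path entry matters only for legacy .doc input; .docx conversion needs just one of the three markitdown routes.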

Environment Overrides (for reliability)

  • MARKITDOWN_UVX_PYTHON=3.11 (default): the Python version uvx uses; set a different value to change it
  • MARKITDOWN_UVX_OFFLINE=0: allow uvx to use the network (by default uvx runs offline)
  • MARKITDOWN_CMD="... markitdown": full command override (advanced)
  • UV_CACHE_DIR=/tmp/uv-cache: set this if uvx can't write to its cache directory (default: ./.uv-cache/)
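
When the converter is launched from another Python program, these overrides can be passed through the child process environment. The sketch below assumes the variables are read from the environment exactly as named above; build_env is a hypothetical helper.

```python
import os


def build_env(base_env=None, uvx_python=None, allow_network=False, uv_cache_dir=None):
    """Copy an environment and apply the documented overrides on top of it."""
    env = dict(os.environ if base_env is None else base_env)
    if uvx_python is not None:
        env["MARKITDOWN_UVX_PYTHON"] = uvx_python
    if allow_network:
        env["MARKITDOWN_UVX_OFFLINE"] = "0"   # 0 lets uvx reach the network
    if uv_cache_dir is not None:
        env["UV_CACHE_DIR"] = uv_cache_dir
    return env
```

The result can be handed to subprocess.run(..., env=build_env(allow_network=True)), which leaves the parent process environment untouched.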

Common Failure Modes

  • .doc conversion fails:
    • LibreOffice GUI running → quit LibreOffice (or killall soffice) and retry
    • If you see Abort trap: 6 / exit 134 (134 = 128 + signal 6, SIGABRT) in a sandboxed tool runner → pre-convert .doc to .docx outside the sandbox, then convert the .docx
  • WMF/EMF diagrams don’t display: in sandboxed environments the WMF/EMF → PNG step may be skipped; convert those images to PNG outside the sandbox if needed
  • markitdown not found: create ./.venv/ with bash setup_venv.sh (recommended) or install markitdown globally
  • Failed to initialize cache at ~/.cache/uv: set UV_CACHE_DIR=/tmp/uv-cache and retry
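
The pre-convert workaround for .doc files can be done with LibreOffice's headless converter. The sketch below only builds the command line; it assumes a soffice executable is on PATH, and soffice_convert_command is a hypothetical helper, not something the skill's scripts are known to expose.

```python
from pathlib import Path


def soffice_convert_command(doc_path, out_dir=None, soffice="soffice"):
    """Build a headless LibreOffice command that converts one .doc to .docx."""
    doc = Path(doc_path)
    # Write next to the input unless an output directory is given.
    target_dir = Path(out_dir) if out_dir is not None else doc.parent
    return [
        soffice,
        "--headless",
        "--convert-to", "docx",
        "--outdir", str(target_dir),
        str(doc),
    ]
```

Run the command outside the sandbox, for example with subprocess.run(soffice_convert_command("legacy.doc"), check=True), after quitting any open LibreOffice window, then pass the resulting .docx to convert_word_to_markdown.py.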

Notes

  • convert_word_to_markdown.py is the entrypoint (handles both .doc and .docx).
  • convert_with_images.py is an internal helper and only supports .docx.
