hwp-reader

# 🐧 HWP Reader — Read & Analyze Korean HWP/HWPX Documents

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.


Install skill "hwp-reader" with this command: npx skills add mupengi-bot/hwp-reader


Author: 무펭이 🐧 | v1.0.0

Description

Read and extract text content from Korean HWP (한글) and HWPX files. Supports both legacy HWP format (via pyhwp) and modern HWPX format (ZIP-based XML).
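The legacy and modern formats use different containers, so a reader has to choose the right extraction path before it starts. Below is a minimal sketch of that choice. It assumes the file extension may be wrong and checks magic bytes instead: HWP 5.x is an OLE2 compound file, and HWPX is a ZIP archive. `detect_hwp_format` is a hypothetical helper and is not part of pyhwp. Any ZIP file, such as a .docx, also reports "hwpx", so this only routes input and does not validate it.

```python
# Minimal sketch (assumption: route by magic bytes only, not by file extension).
# HWP 5.x files are OLE2 / Compound File Binary containers; HWPX files are ZIP archives.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # Compound File Binary signature
ZIP_MAGIC = b"PK\x03\x04"                        # ZIP local file header signature


def detect_hwp_format(path: str) -> str:
    """Return 'hwp', 'hwpx', or 'unknown' based on the file's first bytes."""
    with open(path, "rb") as f:
        head = f.read(8)
    if head.startswith(OLE2_MAGIC):
        return "hwp"    # legacy binary format: read with pyhwp (hwp5txt)
    if head.startswith(ZIP_MAGIC):
        return "hwpx"   # ZIP + XML: read with zipfile / xml.etree.ElementTree
    return "unknown"    # anything that is neither an OLE2 nor a ZIP container
```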

When to Use

  • User asks to read/analyze a .hwp or .hwpx file
  • Government support application forms (정부지원사업 신청서)
  • Any Korean document in Hangul Word Processor format

How It Works

HWP Files (Legacy Format)

```bash
python3 -c "
import sys
from hwp5.hwp5txt import main

# Run pyhwp's hwp5txt entry point on FILE_PATH; the plain text is printed to stdout
sys.argv = ['hwp5txt', 'FILE_PATH']
main()
"
```

HWPX Files (Modern XML Format)

```bash
python3 -c "
import re
import zipfile
import xml.etree.ElementTree as ET

with zipfile.ZipFile('FILE_PATH') as z:
    names = z.namelist()

    # Quick preview text (may be truncated)
    if 'Preview/PrvText.txt' in names:
        print(z.read('Preview/PrvText.txt').decode('utf-8'))

    # Full content from section XMLs, in numeric order (section2 before section10)
    sections = [n for n in names if re.fullmatch(r'Contents/section\d+\.xml', n)]
    sections.sort(key=lambda n: int(re.search(r'\d+', n).group()))
    for name in sections:
        root = ET.fromstring(z.read(name))
        for elem in root.iter():
            if elem.text and elem.text.strip():
                print(elem.text.strip())
"
```

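The HWPX snippet above prints every XML text node on its own line. A sentence that is split across formatting runs therefore comes out in fragments, and any text that follows an inline child element (its `.tail`) is skipped. Below is a paragraph-aware sketch. It assumes that section XML stores paragraphs in `p` elements and text in `t` elements (the `hp:p` / `hp:t` tags). It matches on local tag names, so it does not hard-code namespace URIs. `hwpx_paragraphs` and its helpers are hypothetical names, not part of any library.

```python
import re
import zipfile
import xml.etree.ElementTree as ET


def _local(tag: str) -> str:
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def _collect(node, out, parts=None):
    """Walk node; emit one string per <p> into out, gathering <t> text into parts."""
    for child in node:
        name = _local(child.tag)
        if name == "p":
            slot = len(out)
            out.append("")  # reserve a slot so this paragraph precedes nested ones (e.g. table cells)
            para_parts = []
            _collect(child, out, para_parts)
            out[slot] = "".join(para_parts).strip()
        elif name == "t" and parts is not None:
            parts.append("".join(child.itertext()))  # runs are joined without separators
        else:
            _collect(child, out, parts)


def hwpx_paragraphs(path: str) -> list:
    """Return non-empty paragraph strings from all Contents/sectionN.xml parts."""
    with zipfile.ZipFile(path) as z:
        sections = [n for n in z.namelist() if re.fullmatch(r"Contents/section\d+\.xml", n)]
        # Numeric order, so section10.xml comes after section2.xml
        sections.sort(key=lambda n: int(re.search(r"\d+", n).group()))
        out = []
        for name in sections:
            _collect(ET.fromstring(z.read(name)), out)
    return [p for p in out if p]
```

A paragraph that contains a table is emitted first, followed by one line per paragraph inside the table's cells.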
Capabilities

| Feature | HWP | HWPX |
|---|---|---|
| Text extraction | ✅ pyhwp | ✅ ZIP+XML |
| Table detection | ⚠️ <표> markers | ✅ XML tags |
| Image extraction | | ✅ from BinData/ |
| Metadata | ✅ via hwp5 | ✅ from version.xml |
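The HWPX "Table detection" and "Image extraction" rows could be implemented roughly as follows. This is a sketch under two assumptions. First, tables use the element layout `tbl` > `tr` > `tc`, with cell text in `t` elements (matched by local name). Second, embedded binaries sit under `BinData/` in the ZIP. `hwpx_tables` and `extract_bindata` are hypothetical helpers.

```python
import zipfile
import xml.etree.ElementTree as ET


def _local(tag: str) -> str:
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def hwpx_tables(section_xml: bytes) -> list:
    """Return every table in one section XML as a list of rows of cell strings.

    Assumed layout: <tbl> -> <tr> -> <tc>, with cell text in <t> elements.
    Simplification: all text in a cell (including any nested table) is concatenated,
    and a nested table is also returned as a separate table.
    """
    tables = []
    for tbl in ET.fromstring(section_xml).iter():
        if _local(tbl.tag) != "tbl":
            continue
        rows = []
        for tr in (c for c in tbl if _local(c.tag) == "tr"):
            row = []
            for tc in (c for c in tr if _local(c.tag) == "tc"):
                texts = ("".join(t.itertext()) for t in tc.iter() if _local(t.tag) == "t")
                row.append("".join(texts).strip())
            rows.append(row)
        tables.append(rows)
    return tables


def extract_bindata(path: str, dest: str) -> list:
    """Copy every embedded file under BinData/ into dest and return the written paths."""
    with zipfile.ZipFile(path) as z:
        members = [n for n in z.namelist() if n.startswith("BinData/") and not n.endswith("/")]
        return [z.extract(n, dest) for n in members]
```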

Dependencies

  • pyhwp (pip install pyhwp); on the author's machine it is installed at /Users/mupeng/Library/Python/3.9/lib/python/site-packages/hwp5/
  • Python 3.9+; the HWPX path needs only the standard library (e.g. zipfile, xml.etree.ElementTree)

Limitations

  • HWP text extraction loses table structure; each table appears only as a <표> ("table") placeholder
  • HWPX Preview/PrvText.txt is truncated to ~1KB; use section XMLs for full content
  • Complex formatting (colors, fonts, page layout) not preserved in text mode
  • Encrypted/password-protected HWP files not supported

Usage Examples

Read a government application form

"이 HWP 파일 읽어줘: /path/to/신청서.hwp" ("Read this HWP file for me: /path/to/신청서.hwp")
→ Extract text → Analyze structure → Summarize sections
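The "Analyze structure" step could begin by cutting the extracted lines into sections at numbered headings. Below is a minimal sketch. It assumes headings look like `1. ...`, `2) ...`, or use Korean ordinals such as `가. ...`. Both the pattern and `split_sections` are hypothetical heuristics, not part of the skill. The pattern will misfire on text such as dates written `2024. 1. 1.`.

```python
import re

# Hypothetical heading heuristic: "1. ...", "2) ...", or Korean ordinals "가. ...", "나) ...".
HEADING = re.compile(r"^\s*(?:\d+|[가나다라마바사아자차카타파하])[.)]\s+\S")


def split_sections(lines):
    """Group lines into (heading, body_lines) pairs; text before the first heading gets heading None."""
    sections = []
    title, body = None, []
    for line in lines:
        if HEADING.match(line):
            if title is not None or body:
                sections.append((title, body))
            title, body = line.strip(), []
        elif line.strip():
            body.append(line.strip())  # blank lines are dropped
    if title is not None or body:
        sections.append((title, body))
    return sections
```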

Compare two versions

"v1.hwp와 v2.hwp 차이점 분석해줘" ("Analyze the differences between v1.hwp and v2.hwp")
→ Extract both → Diff content → Report changes
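After both files have been extracted to lists of lines, for example with the HWP or HWPX commands under How It Works, the "Diff content" step can use Python's standard `difflib`. The sketch below produces a unified diff plus a small change count for the report. `diff_versions` and `change_counts` are illustrative names, not part of the skill.

```python
import difflib


def diff_versions(old_lines, new_lines, old_name="v1.hwp", new_name="v2.hwp"):
    """Unified diff of two extracted texts, one list entry per output line."""
    return list(difflib.unified_diff(old_lines, new_lines,
                                     fromfile=old_name, tofile=new_name, lineterm=""))


def change_counts(diff_lines):
    """Count (added, removed) lines; the first two lines are the ---/+++ file headers."""
    body = diff_lines[2:]
    added = sum(1 for line in body if line.startswith("+"))
    removed = sum(1 for line in body if line.startswith("-"))
    return added, removed
```

When the two inputs are identical, `diff_versions` returns an empty list and `change_counts` returns `(0, 0)`.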

Fill in a template

"이 양식에 우리 사업 내용 채워줘" ("Fill in this form with our business details")
→ Read template → Identify blanks → Generate content suggestions
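The "Identify blanks" step could be approximated by scanning the extracted lines for typical empty-field markers. The marker list below is a hypothetical heuristic (empty brackets, underscore runs, unchecked boxes, and labels ending in a colon). It is not a rule from the skill or from the HWP format, and real forms may need more patterns. `find_blanks` is an illustrative name.

```python
import re

# Hypothetical "still to fill in" markers often seen in Korean forms (an assumption, not a spec)
BLANK_PATTERNS = [
    re.compile(r"\(\s*\)"),    # empty parentheses: ( )
    re.compile(r"\[\s*\]"),    # empty brackets: [ ]
    re.compile(r"_{3,}"),      # underscore runs: ____
    re.compile(r"□"),          # unchecked box
    re.compile(r"[:：]\s*$"),  # label ending in a colon with nothing after it
]


def find_blanks(lines):
    """Return (line_number, text) pairs for lines that look like unfilled fields."""
    return [
        (i, line.strip())
        for i, line in enumerate(lines, start=1)
        if any(p.search(line) for p in BLANK_PATTERNS)
    ]
```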

🐧 무펭이 — Making Korean documents accessible to AI agents


hwp-reader | V50.AI