hwp-reader

# 🐧 HWP Reader: Read & Analyze Korean HWP/HWPX Documents


Install skill "hwp-reader" with this command: npx skills add mupengi-bot/hwp-reader

Author: 무펭이 🐧 | v1.0.0

Description

Read and extract text content from Korean HWP (한글, Hangul) and HWPX files. Supports both the legacy HWP format (via pyhwp) and the modern HWPX format (ZIP-based XML).

When to Use

  • User asks to read/analyze a .hwp or .hwpx file
  • Government support program application forms (정부지원사업 신청서)
  • Any Korean document in Hangul Word Processor format

How It Works

HWP Files (Legacy Format)

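python3 -c "
import sys
from hwp5.hwp5txt import main

# hwp5txt reads its arguments from sys.argv; replace FILE_PATH with the .hwp path
sys.argv = ['hwp5txt', 'FILE_PATH']
main()  # prints the extracted text to stdout
"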

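When the extracted text is needed inside a larger Python program rather than on the terminal, the same extraction can be wrapped in a function. The following is a minimal sketch, assuming the hwp5txt command-line script that pyhwp installs is on PATH and writes UTF-8 to stdout. The function name extract_hwp_text and its command parameter are illustrative and not part of pyhwp.

```python
import subprocess


def extract_hwp_text(path, command=("hwp5txt",)):
    """Run a text-extraction command on `path` and return its stdout as str.

    `command` defaults to the hwp5txt script installed by pyhwp. It is a
    parameter (hypothetical, for illustration) so the helper can be tried
    with a stand-in command when pyhwp is not installed.
    """
    result = subprocess.run(
        [*command, str(path)],
        capture_output=True,  # collect stdout/stderr instead of printing them
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    # Assumption: the tool emits UTF-8; undecodable bytes are replaced.
    return result.stdout.decode("utf-8", errors="replace")
```

Calling extract_hwp_text('FILE_PATH') returns the text as one string, and a failed extraction surfaces as subprocess.CalledProcessError instead of a silent empty result.
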
HWPX Files (Modern XML Format)

python3 -c "
import zipfile
import xml.etree.ElementTree as ET

with zipfile.ZipFile('FILE_PATH') as z:
    names = z.namelist()

    # Quick preview text (truncated; see Limitations)
    if 'Preview/PrvText.txt' in names:
        print(z.read('Preview/PrvText.txt').decode('utf-8'))

    # Full content from section XMLs, in numeric order (section2 before section10)
    sections = [n for n in names if n.startswith('Contents/section') and n.endswith('.xml')]
    for name in sorted(sections, key=lambda n: (len(n), n)):
        root = ET.fromstring(z.read(name))
        # itertext() also yields text that follows a child element (tail text)
        for text in root.itertext():
            if text.strip():
                print(text.strip())
"

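Because the two formats need different extraction paths, it can help to pick the path from the file contents rather than trusting the extension. The sketch below assumes HWP 5.x files, which are OLE2 compound files, and treats a ZIP archive as HWPX only if it contains the Contents/section entries used above. The function name detect_hwp_format is illustrative, and a match on the OLE2 signature alone does not prove the file is an HWP document.

```python
import zipfile

OLE_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")  # OLE2 compound-file signature
ZIP_MAGIC = b"PK\x03\x04"  # ZIP local-file-header signature


def detect_hwp_format(path):
    """Return 'hwp', 'hwpx', or 'unknown' based on the file's contents."""
    with open(path, "rb") as f:
        head = f.read(8)
    if head.startswith(OLE_MAGIC):
        # Heuristic: other OLE2 files (e.g. old Office documents) match too.
        return "hwp"
    if head.startswith(ZIP_MAGIC) and zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as z:
            if any(n.startswith("Contents/section") for n in z.namelist()):
                return "hwpx"
    return "unknown"
```

A caller can then route 'hwp' results to the pyhwp path and 'hwpx' results to the ZIP+XML path, and report 'unknown' files instead of failing midway.
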
Capabilities

| Feature | HWP | HWPX |
|---|---|---|
| Text extraction | ✅ pyhwp | ✅ ZIP+XML |
| Table detection | ⚠️ <표> markers | ✅ XML tags |
| Image extraction | ❌ | ✅ from BinData/ |
| Metadata | ✅ via hwp5 | ✅ from version.xml |
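
The image-extraction entry for HWPX can be illustrated with the standard library alone. This is a minimal sketch, assuming embedded images are stored as ordinary archive members under BinData/ as the table states. The function name extract_bindata is illustrative and not part of the skill.

```python
import zipfile
from pathlib import Path


def extract_bindata(hwpx_path, out_dir):
    """Copy every archive member under BinData/ into out_dir; return the new paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(hwpx_path) as z:
        for info in z.infolist():
            if info.is_dir() or not info.filename.startswith("BinData/"):
                continue
            # Keep only the base name so a crafted member path cannot escape
            # out_dir. Members that share a base name overwrite each other.
            target = out_dir / Path(info.filename).name
            target.write_bytes(z.read(info))
            written.append(target)
    return written
```

The returned list of paths can be passed to whatever image tooling the agent has available. The legacy HWP format has no equivalent here, matching the ❌ in the table.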

Dependencies

  • pyhwp (pip install pyhwp): provides the hwp5 package; in the author's environment it is installed at /Users/mupeng/Library/Python/3.9/lib/python/site-packages/hwp5/
  • Python 3.9+: uses the standard library modules zipfile and xml.etree.ElementTree

Limitations

  • HWP text extraction loses table structure (each table appears as a <표> placeholder)
  • HWPX Preview/PrvText.txt is truncated to ~1KB; use the section XMLs for full content
  • Complex formatting (colors, fonts, page layout) is not preserved in text mode
  • Encrypted/password-protected HWP files are not supported

Usage Examples

Read a government application form

"Read this HWP file: /path/to/신청서.hwp"
→ Extract text → Analyze structure → Summarize sections

Compare two versions

"Analyze the differences between v1.hwp and v2.hwp"
→ Extract both → Diff content → Report changes
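
The "Diff content" step can be sketched with the standard library once both files have been reduced to plain text. This assumes the two texts were already extracted by one of the methods described earlier. The function name diff_texts and its default labels are illustrative.

```python
import difflib


def diff_texts(old_text, new_text, old_name="v1.hwp", new_name="v2.hwp"):
    """Return a unified diff (as a list of lines) between two extracted texts."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile=old_name,
            tofile=new_name,
            lineterm="",  # input lines carry no newline, so add none to headers
        )
    )
```

An empty list means the extracted text is identical. Since text extraction drops formatting and table structure, this only reports changes in wording, not in layout.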

Fill in a template

"Fill in this form with our business details"
→ Read template → Identify blanks → Generate content suggestions

🐧 무펭이: Making Korean documents accessible to AI agents
