hwp-reader

# 🐧 HWP Reader: Read & Analyze Korean HWP/HWPX Documents

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "hwp-reader" with this command: npx skills add mupengi-bot/hwp-reader

Author: 무펭이 🐧 | v1.0.0

Description

Read and extract text content from Korean HWP (한글) and HWPX files. Supports both the legacy HWP format (via pyhwp) and the modern HWPX format (ZIP-based XML).
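
Because the two formats take different extraction paths, a caller first has to know which one it is holding. The following is a minimal sketch, not part of the skill itself: it assumes legacy HWP 5.x files are OLE2 compound files (recognizable by their 8-byte signature) and that an HWPX file is a ZIP archive containing `Contents/section*.xml` entries. The helper name `detect_hwp_format` is hypothetical.

```python
import zipfile

# Signature of an OLE2 compound file, the container assumed for legacy HWP 5.x.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def detect_hwp_format(path):
    """Return 'hwp', 'hwpx', or 'unknown' based on file contents (hypothetical helper)."""
    with open(path, "rb") as f:
        header = f.read(len(OLE_MAGIC))
    if header == OLE_MAGIC:
        return "hwp"
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as z:
            # Assumption: an HWPX archive always carries at least one section XML.
            if any(n.startswith("Contents/section") for n in z.namelist()):
                return "hwpx"
    return "unknown"
```

Checking the content rather than the file extension guards against renamed files, at the cost of opening the file once before extraction.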

When to Use

  • User asks to read/analyze a .hwp or .hwpx file
  • Government support application forms (정부지원사업 신청서)
  • Any Korean document in Hangul Word Processor format

How It Works

HWP Files (Legacy Format)

python3 -c "
from hwp5.hwp5txt import main
import sys
sys.argv = ['hwp5txt', 'FILE_PATH']
main()
"

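When the extracted text is needed inside a program rather than on the terminal, the legacy path can be wrapped in a function. This is a sketch under stated assumptions: it calls the `hwp5txt` command-line tool that pyhwp installs, assumes the tool writes UTF-8 text to standard output, and uses a hypothetical function name `hwp_to_text`. The `run` parameter is an illustrative hook so the wrapper can be exercised without pyhwp present.

```python
import subprocess


def hwp_to_text(path, run=subprocess.run):
    """Return the text of a legacy .hwp file by invoking the hwp5txt CLI.

    `run` defaults to subprocess.run and can be swapped out for testing.
    """
    try:
        result = run(
            ["hwp5txt", path],
            capture_output=True,  # collect stdout instead of printing it
            text=True,
            encoding="utf-8",     # assumption: hwp5txt emits UTF-8
            check=True,           # raise CalledProcessError on a non-zero exit
        )
    except FileNotFoundError as exc:
        raise RuntimeError("hwp5txt not found; install it with: pip install pyhwp") from exc
    return result.stdout
```

A call such as `hwp_to_text('FILE_PATH')` returns the document text as one string; extraction failures surface as exceptions rather than as empty output.
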
HWPX Files (Modern XML Format)

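python3 -c "
import zipfile
import xml.etree.ElementTree as ET

def section_number(name):
    # Sort key: the N in Contents/sectionN.xml, so section10 follows section2
    digits = name[len('Contents/section'):-len('.xml')]
    return int(digits) if digits.isdigit() else 0

with zipfile.ZipFile('FILE_PATH') as z:
    names = z.namelist()

    # Quick preview text
    if 'Preview/PrvText.txt' in names:
        print(z.read('Preview/PrvText.txt').decode('utf-8'))

    # Full content from section XMLs, in document order
    sections = [n for n in names
                if n.startswith('Contents/section') and n.endswith('.xml')]
    for name in sorted(sections, key=section_number):
        root = ET.fromstring(z.read(name))
        for elem in root.iter():
            if elem.text and elem.text.strip():
                print(elem.text.strip())
"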

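The HWPX one-liner prints every text node on its own line, which splits a paragraph wherever its formatting changes. A paragraph-level variant might look like the sketch below. It assumes that paragraphs are `p` elements whose direct `run` children hold `t` text elements (matched by local name, ignoring the XML namespace); that layout is an assumption about the HWPX schema, not something the skill states. The function names are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def section_index(name):
    """Numeric part of 'Contents/sectionN.xml', so section10 sorts after section2."""
    digits = name[len("Contents/section"):-len(".xml")]
    return int(digits) if digits.isdigit() else 0


def hwpx_paragraphs(source):
    """Return one string per non-empty paragraph (source: path or file-like object)."""
    paragraphs = []
    with zipfile.ZipFile(source) as z:
        names = [n for n in z.namelist()
                 if n.startswith("Contents/section") and n.endswith(".xml")]
        for name in sorted(names, key=section_index):
            root = ET.fromstring(z.read(name))
            for p in root.iter():
                if local_name(p.tag) != "p":
                    continue
                parts = []
                # Only direct run/t children, so text of paragraphs nested
                # deeper (e.g. inside table cells) is not counted twice.
                for run in p:
                    if local_name(run.tag) != "run":
                        continue
                    for t in run:
                        if local_name(t.tag) == "t":
                            parts.append("".join(t.itertext()))
                text = "".join(parts).strip()
                if text:
                    paragraphs.append(text)
    return paragraphs
```

Nested paragraphs are still returned, each as its own entry, because `root.iter()` visits them separately.
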
Capabilities

| Feature | HWP | HWPX |
|---|---|---|
| Text extraction | ✅ pyhwp | ✅ ZIP+XML |
| Table detection | ⚠️ <표> markers | ✅ XML tags |
| Image extraction | ❌ | ✅ from BinData/ |
| Metadata | ✅ via hwp5 | ✅ from version.xml |
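
As an illustration of the HWPX column, the sketch below pulls tables out of one section XML and lists the embedded files under `BinData/`. It is not the skill's own code. It assumes tables are `tbl` elements with `tr` rows and `tc` cells as direct children, and that cell text sits in `t` elements; those tag names are assumptions about the HWPX schema, matched by local name only. Both function names are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def hwpx_tables(section_xml):
    """Return every table in a section as a list of rows of cell strings."""
    root = ET.fromstring(section_xml)
    tables = []
    for tbl in root.iter():
        if local_name(tbl.tag) != "tbl":
            continue
        rows = []
        for tr in tbl:
            if local_name(tr.tag) != "tr":
                continue
            row = []
            for tc in tr:
                if local_name(tc.tag) != "tc":
                    continue
                # Concatenate all text elements in the cell; text of a table
                # nested inside the cell is included in the outer cell too.
                cell = "".join(
                    "".join(t.itertext())
                    for t in tc.iter() if local_name(t.tag) == "t"
                )
                row.append(cell.strip())
            rows.append(row)
        tables.append(rows)
    return tables


def hwpx_image_names(source):
    """List archive entries stored under BinData/ (source: path or file-like)."""
    with zipfile.ZipFile(source) as z:
        return [n for n in z.namelist()
                if n.startswith("BinData/") and not n.endswith("/")]
```

Each name returned by `hwpx_image_names` can be passed to `ZipFile.read` or `ZipFile.extract` to obtain the file's bytes.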

Dependencies

  • pyhwp (pip install pyhwp); on the author's machine it is installed at /Users/mupeng/Library/Python/3.9/lib/python/site-packages/hwp5/
  • Python 3.9+, using the standard library modules zipfile and xml.etree.ElementTree

Limitations

  • HWP text extraction loses table structure (shows a <표> placeholder)
  • HWPX Preview/PrvText.txt is truncated to ~1KB; use section XMLs for full content
  • Complex formatting (colors, fonts, page layout) not preserved in text mode
  • Encrypted/password-protected HWP files not supported

Usage Examples

Read a government application form

"Read this HWP file: /path/to/신청서.hwp"
→ Extract text → Analyze structure → Summarize sections

Compare two versions

"Analyze the differences between v1.hwp and v2.hwp"
→ Extract both → Diff content → Report changes
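
The "Diff content" step could be sketched with Python's standard `difflib`, once both files have been extracted to plain text. This is one possible approach rather than the skill's documented method; the function name `diff_text` and the default labels are hypothetical.

```python
import difflib


def diff_text(old_text, new_text, old_name="v1.hwp", new_name="v2.hwp"):
    """Return a unified diff between two extracted texts as a list of lines."""
    return list(difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile=old_name,
        tofile=new_name,
        lineterm="",  # inputs have no trailing newlines, so add none to headers
    ))
```

Lines starting with `-` appear only in the first version and lines starting with `+` only in the second; an empty list means the extracted texts are identical.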

Fill in a template

"Fill in this form with our business details"
→ Read template → Identify blanks → Generate content suggestions

🐧 무펭이: Making Korean documents accessible to AI agents
