Email Summarizer + Contact Profiler
Three independent stages — run any subset depending on what you need:
[Data Source] [Analysis] [Delivery]
fetch_imap.py ──→ ──→
parse_file.py ──→ build_report.py ──→ send_report.py
→ .html + .xlsx
Each stage reads from / writes to files (or stdin/stdout), so they compose freely.
Installation
# Python dependencies
pip install -r requirements.txt
# Node.js dependency (only for native .pst parsing)
cd scripts && npm install && cd ..
Script Reference
Stage 1 — Data Source
fetch_imap.py — Fetch emails from a live IMAP mailbox
export EMAIL_USER=you@163.com
export EMAIL_PASS=your-app-password
# Fetch last 7 days (inbox only) → emails.json
python3 scripts/fetch_imap.py --days 7 --output emails.json
# Fetch a date range, inbox + sent
python3 scripts/fetch_imap.py --since 2026-03-01 --until 2026-03-28 \
--with-sent --preset 163 --output emails.json
# Pipe directly to build_report.py (no intermediate file)
python3 scripts/fetch_imap.py --days 30 | python3 scripts/build_report.py --output report
| Flag | Default | Description |
|---|---|---|
--days N | 7 | Fetch last N days. Ignored if --since is set. |
--since DATE | — | Start date inclusive (YYYY-MM-DD) |
--until DATE | — | End date exclusive (YYYY-MM-DD, default: today) |
--max N | 200 | Max emails per folder |
--folder NAME | INBOX | Inbox folder name |
--preset NAME | 163 | Mailbox preset: 163 | qq | exmail | gmail | outlook |
--with-sent | off | Also fetch Sent folder (recommended for relationship analysis) |
--output FILE | stdout | Write JSON to file; prints to stdout if omitted |
Supported presets:
| Preset | IMAP Server | Sent folder |
|---|---|---|
163 | imap.163.com:993 | 已发送 |
qq | imap.qq.com:993 | Sent Messages |
exmail | imap.exmail.qq.com:993 | Sent Messages |
gmail | imap.gmail.com:993 | [Gmail]/Sent Mail |
outlook | outlook.office365.com:993 | Sent Items |
parse_file.py — Parse a local email export
# Parse a .pst file (native engine, no system install needed)
python3 scripts/parse_file.py --pst ~/Downloads/archive.pst --output emails.json
# Parse a .mbox file (inbox + sent)
python3 scripts/parse_file.py --mbox Inbox.mbox --sent-mbox Sent.mbox --output emails.json
# Parse a folder of .msg files, filter by date
python3 scripts/parse_file.py --msg-dir ./exported/ --since 2026-01-01 --output emails.json
# Pipe to build_report.py
python3 scripts/parse_file.py --pst archive.pst | python3 scripts/build_report.py --output report
| Flag | Default | Description |
|---|---|---|
--pst FILE | — | Outlook .pst archive |
--mbox FILE | — | Unix mbox file |
--msg-dir DIR | — | Folder of .msg files |
--sent-mbox FILE | — | Additional sent-items mbox (with --mbox) |
--pst-engine | auto | auto | native | readpst |
--days N | all | Only include last N days |
--since DATE | — | Start date inclusive (YYYY-MM-DD) |
--until DATE | — | End date exclusive (YYYY-MM-DD) |
--max N | 500 | Max emails to load |
--output FILE | stdout | Write JSON to file; prints to stdout if omitted |
PST engines:
| Engine | Flag | Requires |
|---|---|---|
| Native (default) | --pst-engine native | cd scripts && npm install |
| Readpst | --pst-engine readpst | apt install pst-utils or brew install libpst |
| Auto | --pst-engine auto | Tries native first, falls back to readpst |
Stage 2 — Analysis
build_report.py — Analyse emails → HTML + Excel report
# From a file
python3 scripts/build_report.py --input emails.json --output report
# Specify owner explicitly (when analysing someone else's PST)
python3 scripts/build_report.py --input emails.json --output report \
--owner xiang-xiang.hu@connect.polyu.hk
# From stdin (piped from Stage 1)
python3 scripts/fetch_imap.py --days 30 | python3 scripts/build_report.py --output report
| Flag | Default | Description |
|---|---|---|
--input FILE | stdin | Path to emails JSON from Stage 1 |
--output PREFIX | contact_report | Output path prefix. Writes <prefix>.html and <prefix>.xlsx |
--owner EMAIL | auto-inferred | Mailbox owner address. Auto-detected if omitted. |
Output files:
<prefix>.html— self-contained HTML report (open in browser or attach to email)<prefix>.xlsx— Excel spreadsheet with the same data
Report columns: # / Preferred Name / Email / Company / Position / Subject Summary / Source / Emails (Recv/Sent)
Stage 3 — Delivery
send_report.py — Send report files via SMTP
export EMAIL_USER=you@163.com
export EMAIL_PASS=your-app-password
# Send HTML + Excel to a recipient
python3 scripts/send_report.py \
--html report.html --xlsx report.xlsx --to friend@example.com
# Send to yourself (EMAIL_USER)
python3 scripts/send_report.py --html report.html --xlsx report.xlsx
# HTML only (no attachment)
python3 scripts/send_report.py --html report.html --to friend@example.com
# Custom subject
python3 scripts/send_report.py --html report.html --subject "March Contact Report"
| Flag | Default | Description |
|---|---|---|
--html FILE | required | HTML file to use as email body |
--xlsx FILE | — | Excel file to attach (optional) |
--to ADDR | EMAIL_USER | Recipient address |
--subject STR | auto | Email subject (auto-generated if omitted) |
--preset NAME | 163 | SMTP preset: 163 | qq | exmail | gmail | outlook |
Private library modules (not standalone scripts)
| File | Provides |
|---|---|
_core.py | decode_header, parse_addr, get_domain, strip_html, html_esc, get_body |
_analyze.py | infer_owner, build_contacts, domain/company/position/name inference |
_render.py | build_report_html, build_excel, SMTP_MAP |
Complete workflow examples
A — IMAP mailbox → report → send to self
export EMAIL_USER=you@163.com
export EMAIL_PASS=your-app-password
python3 scripts/fetch_imap.py --days 30 --with-sent --output /tmp/emails.json
python3 scripts/build_report.py --input /tmp/emails.json --output /tmp/report
python3 scripts/send_report.py --html /tmp/report.html --xlsx /tmp/report.xlsx --preset 163
B — PST file → report → send to someone
python3 scripts/parse_file.py --pst ~/Downloads/archive.pst --output /tmp/emails.json
python3 scripts/build_report.py --input /tmp/emails.json --output /tmp/report
EMAIL_USER=sender@163.com EMAIL_PASS=xxx \
python3 scripts/send_report.py --html /tmp/report.html --xlsx /tmp/report.xlsx \
--to recipient@example.com
C — One-liner (pipe, no intermediate files)
python3 scripts/parse_file.py --pst archive.pst \
| python3 scripts/build_report.py --output /tmp/report
D — Report only, no email (open locally)
python3 scripts/parse_file.py --mbox Inbox.mbox --output /tmp/emails.json
python3 scripts/build_report.py --input /tmp/emails.json --output /tmp/report
# Open /tmp/report.html in a browser
AI analysis templates
After loading the email JSON into the AI context, use the following templates:
Part A: 4-Dimension Email Summary
🔥 Part 1 — Important & Action Items
🚨 [URGENT] Subject — Sender — Date | Summary | Action | Deadline
⚡ [IMPORTANT] Subject — Sender — Date | Summary | Action
📌 [NOTE] Subject — Sender — Date | Summary
📊 Part 2 — Grouped by Sender / Topic
✅ Part 3 — To-Do List
📅 Part 4 — Timeline (YYYY-MM-DD Sender → Subject: summary)
Part B: Contact Profile Analysis
Sort by total interactions. For each contact:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
👤 Rank #N | Name <email> Total: N (Recv: N / Sent: N)
🧑 Gender M/F/Unknown Confidence: H/M/L Basis: …
💼 Role … Basis: domain / signature / keywords
🔗 Relationship Colleague / Client / Institution / Stranger
Direction Mutual / Owner-initiated / Contact-initiated
📝 Topics • subject 1 • subject 2
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Heat scale: 🔥 Heavy (≥10) · ⚡ Active (5–9) · 💬 Moderate (2–4) · 🌙 Light (1)
Notes
- Credentials are passed via environment variables — never hardcoded
- IMAP connections use readonly mode — emails are never modified or deleted
- Body text is truncated to 2000 characters per email
fetch_imap.pysends an RFC 2971 ID command required by 163/188 servers; harmless on others