Email IMAP Full Fetch
Core Goal
- Fetch one target email by stable message reference from IMAP.
- Enforce lookup order: HEADER Message-Id exact match first, then uid fallback.
- Download full raw MIME via BODY.PEEK[].
- Parse and return headers, full text body, html body, and attachment metadata.
- Save .eml and attachment files to disk with filename safety and idempotent indexing.
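The parse step can be sketched with the standard library email package. This is a minimal illustration, not the script's actual code: the function name parse_raw_mime and the rule "a part with a filename counts as an attachment" are assumptions made for the example.

```python
from email import message_from_bytes, policy


def parse_raw_mime(raw: bytes) -> dict:
    """Split a raw RFC 5322 message into headers, text bodies and attachment metadata."""
    msg = message_from_bytes(raw, policy=policy.default)
    text_plain, text_html, attachments = [], [], []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no payload of their own
        disposition = part.get_content_disposition()  # "attachment", "inline" or None
        filename = part.get_filename()
        content_type = part.get_content_type()
        if disposition == "attachment" or filename:
            payload = part.get_payload(decode=True) or b""
            attachments.append({
                "filename": filename,
                "content_type": content_type,
                "bytes": len(payload),
                "disposition": disposition,
            })
        elif content_type == "text/plain":
            text_plain.append(part.get_content())
        elif content_type == "text/html":
            text_html.append(part.get_content())
    return {
        # Repeated headers (for example Received) collapse to the last value here.
        "headers": {name: str(value) for name, value in msg.items()},
        "text_plain": "\n".join(text_plain),
        "text_html": "\n".join(text_html),
        "attachments": attachments,
    }
```

Only metadata is collected for attachments, which matches the goal of returning attachment metadata rather than binary content.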
Standard Flow
- Input must include message_id_norm from stage-1 routing output (mail_ref.message_id_norm).
- Use fetch --message-id "<message_id_norm or raw Message-Id>" as the default path.
- Use fetch --uid "<uid>" only when no usable message-id is available.
- Keep mailbox selection consistent with stage-1 (--mailbox or IMAP_MAILBOX).
- Read JSON output and continue downstream processing with returned mail_ref.
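A caller could drive this flow from Python roughly as follows. The helper names build_fetch_argv and run_fetch are hypothetical, and the sketch assumes the script writes its JSON object to stdout, which this document does not state explicitly.

```python
import json
import subprocess


def build_fetch_argv(message_id=None, uid=None, mailbox=None,
                     script="scripts/imap_full_fetch.py"):
    """Build the fetch command line; message-id is the default path, uid the fallback."""
    if not message_id and not uid:
        raise ValueError("need a message-id or a uid")
    argv = ["python3", script, "fetch"]
    if message_id:
        argv += ["--message-id", message_id]
    if uid:
        argv += ["--uid", str(uid)]
    if mailbox:
        argv += ["--mailbox", mailbox]  # keep this equal to the stage-1 mailbox
    return argv


def run_fetch(argv):
    """Run the script and decode its output (assumed: one JSON object on stdout)."""
    proc = subprocess.run(argv, capture_output=True, text=True, check=True)
    return json.loads(proc.stdout)
```

A downstream step would then read run_fetch(...)["mail_ref"] and pass it along unchanged.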
Commands
Fetch by Message-Id (preferred):
python3 scripts/imap_full_fetch.py fetch --message-id "<caa123@example.com>"
Fetch by UID (fallback only):
python3 scripts/imap_full_fetch.py fetch --uid "123456"
Use both when needed (message-id lookup first, uid fallback second):
python3 scripts/imap_full_fetch.py fetch --message-id "<caa123@example.com>" --uid "123456"
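The lookup order behind these commands can be sketched with imaplib. This is an illustration under assumptions, not the script's implementation: conn is taken to be an imaplib.IMAP4 or IMAP4_SSL object that is already logged in with the mailbox selected, the function names are hypothetical, and the first search hit is used when several messages match. IMAP HEADER search matches substrings, so a strict exact match would additionally compare the fetched Message-Id header.

```python
def resolve_uid(conn, message_id=None, uid=None):
    """Return the UID to fetch: Message-Id search hit first, explicit uid second."""
    if message_id:
        # imaplib does not quote arguments for us, so build a quoted string by hand.
        quoted = '"' + message_id.replace("\\", "\\\\").replace('"', '\\"') + '"'
        typ, data = conn.uid("SEARCH", None, "HEADER", "Message-Id", quoted)
        if typ == "OK" and data and data[0]:
            return data[0].split()[0].decode("ascii")
    return str(uid) if uid else None


def fetch_raw(conn, uid):
    """Download the full raw MIME without setting the \\Seen flag."""
    typ, data = conn.uid("FETCH", uid, "(BODY.PEEK[])")
    if typ != "OK":
        raise LookupError(f"FETCH failed for uid {uid}")
    for item in data:
        if isinstance(item, tuple):  # (response prefix, literal bytes)
            return item[1]
    raise LookupError(f"no message body returned for uid {uid}")
```

Using BODY.PEEK[] rather than BODY[] keeps the fetch from marking the message as read.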
Output Contract
- Output is a single JSON object.
- Required top-level fields: mail_ref, headers, text_plain, text_html, attachments, saved_eml_path
- mail_ref contains: account, mailbox, uid, message_id_raw, message_id_norm, date
- attachments[] contains per-file metadata and persistence result: filename, content_type, bytes, disposition, saved_path, skipped_reason
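A consumer may want to check a returned record against this contract before continuing. The following is a small sketch; the function name missing_fields is hypothetical, and the field lists are copied from the contract above.

```python
REQUIRED_TOP_LEVEL = ("mail_ref", "headers", "text_plain", "text_html",
                      "attachments", "saved_eml_path")
MAIL_REF_FIELDS = ("account", "mailbox", "uid", "message_id_raw",
                   "message_id_norm", "date")
ATTACHMENT_FIELDS = ("filename", "content_type", "bytes", "disposition",
                     "saved_path", "skipped_reason")


def missing_fields(record: dict) -> list:
    """Return dotted paths of contract fields that are absent from the record."""
    missing = [key for key in REQUIRED_TOP_LEVEL if key not in record]
    mail_ref = record.get("mail_ref") or {}
    missing += [f"mail_ref.{key}" for key in MAIL_REF_FIELDS if key not in mail_ref]
    for i, att in enumerate(record.get("attachments") or []):
        missing += [f"attachments[{i}].{key}" for key in ATTACHMENT_FIELDS
                    if key not in att]
    return missing
```

An empty list means the record carries every field named in the contract; the check looks only at key presence, not at value types.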
Storage And Idempotency
- saved_eml_path points to local .eml file saved from BODY.PEEK[].
- Attachments are saved without returning attachment binary content in JSON.
- Filenames are sanitized to remove path separators and unsafe characters.
- Duplicate attachment names are deduped with content-hash suffix.
- Repeated requests are idempotent by message_id_norm index and return existing persisted JSON record directly.
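The storage rules above can be sketched as follows. Everything here is an assumption for illustration: the allowed character set, the 8-character SHA-256 suffix, and the one-JSON-file-per-hashed-key index layout are not taken from the script.

```python
import hashlib
import json
import re
from pathlib import Path


def sanitize_filename(name, fallback="attachment"):
    """Drop directory components and replace characters outside a safe set."""
    name = (name or "").replace("\\", "/").split("/")[-1]
    name = re.sub(r"[^A-Za-z0-9._-]", "_", name).lstrip(".")
    return name or fallback


def dedupe_name(name, content, taken):
    """Append a short content hash when the name is already used in this message."""
    if name not in taken:
        return name
    digest = hashlib.sha256(content).hexdigest()[:8]
    stem, dot, ext = name.rpartition(".")
    return f"{stem}-{digest}.{ext}" if dot and stem else f"{name}-{digest}"


def index_path(index_dir, message_id_norm):
    """Map a normalized Message-Id to its index file (hypothetical layout)."""
    key = hashlib.sha256(message_id_norm.encode("utf-8")).hexdigest()
    return Path(index_dir) / f"{key}.json"


def load_cached(index_dir, message_id_norm):
    """Return the persisted record for this message, or None on first request."""
    path = index_path(index_dir, message_id_norm)
    if path.is_file():
        return json.loads(path.read_text(encoding="utf-8"))
    return None


def store_record(index_dir, message_id_norm, record):
    """Persist the record so a repeated request can be answered from disk."""
    path = index_path(index_dir, message_id_norm)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(record), encoding="utf-8")
```

Hashing the Message-Id for the index filename avoids putting angle brackets or slashes from the id into a path.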
Parameters
- --message-id: primary lookup key.
- --uid: fallback lookup key.
- --mailbox: mailbox to query (default IMAP_MAILBOX or INBOX).
- --save-eml-dir: target dir for .eml files (env IMAP_FULL_SAVE_EML_DIR).
- --index-dir: target dir for idempotency index JSON files (env IMAP_FULL_INDEX_DIR, default <save-eml-dir>/.index).
- --save-attachments-dir: target dir for attachments (env IMAP_FULL_SAVE_ATTACHMENTS_DIR).
- --max-attachment-bytes: max saved attachment size (env IMAP_FULL_MAX_ATTACHMENT_BYTES).
- --allow-ext: allowed attachment extensions, comma-separated (env IMAP_FULL_ALLOW_EXT).
- --connect-timeout: IMAP connect timeout seconds (default from IMAP_CONNECT_TIMEOUT).
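One way such flags with environment defaults could be wired up is shown below with argparse. This is a sketch of the documented interface, not the script's source: the helper names, the extension normalization (lowercase, leading dot stripped), and the numeric types chosen for the size and timeout flags are assumptions.

```python
import argparse
import os


def split_exts(value):
    """Turn "pdf, .PNG" into ["pdf", "png"] (normalization is an assumption)."""
    return [e.strip().lower().lstrip(".") for e in value.split(",") if e.strip()]


def parse_args(argv, env=os.environ):
    parser = argparse.ArgumentParser(prog="imap_full_fetch.py")
    sub = parser.add_subparsers(dest="command", required=True)
    fetch = sub.add_parser("fetch")
    fetch.add_argument("--message-id")
    fetch.add_argument("--uid")
    fetch.add_argument("--mailbox", default=env.get("IMAP_MAILBOX", "INBOX"))
    fetch.add_argument("--save-eml-dir", default=env.get("IMAP_FULL_SAVE_EML_DIR"))
    fetch.add_argument("--index-dir", default=env.get("IMAP_FULL_INDEX_DIR"))
    fetch.add_argument("--save-attachments-dir",
                       default=env.get("IMAP_FULL_SAVE_ATTACHMENTS_DIR"))
    # String defaults taken from the environment are converted by `type` as well.
    fetch.add_argument("--max-attachment-bytes", type=int,
                       default=env.get("IMAP_FULL_MAX_ATTACHMENT_BYTES"))
    fetch.add_argument("--allow-ext", type=split_exts,
                       default=env.get("IMAP_FULL_ALLOW_EXT"))
    fetch.add_argument("--connect-timeout", type=float,
                       default=env.get("IMAP_CONNECT_TIMEOUT"))
    args = parser.parse_args(argv)
    if not args.message_id and not args.uid:
        parser.error("provide --message-id and/or --uid")
    if args.index_dir is None and args.save_eml_dir:
        # Documented default: <save-eml-dir>/.index
        args.index_dir = os.path.join(args.save_eml_dir, ".index")
    return args
```

Passing env explicitly keeps the parser easy to exercise without touching the real process environment.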
Required Environment
- IMAP_HOST
- IMAP_USERNAME
- IMAP_PASSWORD
Optional account defaults:
- IMAP_NAME
- IMAP_PORT
- IMAP_SSL
- IMAP_MAILBOX
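A wrapper could fail fast when a required variable is missing. The sketch below uses a hypothetical load_account helper and passes optional values through as raw strings, since their formats and defaults are not specified here.

```python
import os

REQUIRED_ENV = ("IMAP_HOST", "IMAP_USERNAME", "IMAP_PASSWORD")
OPTIONAL_ENV = ("IMAP_NAME", "IMAP_PORT", "IMAP_SSL", "IMAP_MAILBOX")


def load_account(env=os.environ):
    """Collect account settings, raising if any required variable is unset or empty."""
    missing = [key for key in REQUIRED_ENV if not env.get(key)]
    if missing:
        raise RuntimeError("missing required environment variables: "
                           + ", ".join(missing))
    account = {key: env[key] for key in REQUIRED_ENV}
    # Optional values are kept as strings; interpreting them is left to the caller.
    account.update({key: env[key] for key in OPTIONAL_ENV if env.get(key)})
    return account
```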
Scripts
scripts/imap_full_fetch.py