iMessage Database Query
Query the macOS iMessage SQLite database (~/Library/Messages/chat.db ) to retrieve conversation history, decode messages stored in binary format, and build sourced timelines with precise timestamps.
When to Use
-
Retrieving iMessage conversation history for a specific contact
-
Building sourced timelines with timestamps from text messages
-
Searching for keywords across all conversations
-
Debugging messages that appear empty but contain recoverable text
-
Extracting message content that iOS stored in binary attributedBody format
Prerequisites
-
macOS only — chat.db is a macOS-specific database
-
Full Disk Access — The terminal running Claude Code must have FDA granted in System Settings > Privacy & Security > Full Disk Access
-
Read-only — Never write to chat.db . Always use read-only SQLite access.
-
Optional: pip install pytypedstream — Enables tier 1 decoder (proper typedstream deserialization). Script works without it (falls through to pure-binary tiers 2/3).
Critical Knowledge - The text vs attributedBody Problem
IMPORTANT: Many iMessage messages have a NULL or empty text column but contain valid, recoverable text in the attributedBody column. This is NOT because they are voice messages — iOS stores dictated messages, messages with rich formatting, and some regular messages in attributedBody as an NSAttributedString binary blob.
How to detect
-- Messages with attributedBody but no text (these are NOT necessarily voice messages) SELECT COUNT(*) as hidden_messages FROM message m JOIN chat_message_join cmj ON m.ROWID = cmj.message_id JOIN chat c ON cmj.chat_id = c.ROWID WHERE c.chat_identifier = '<CHAT_IDENTIFIER>' AND (m.text IS NULL OR length(m.text) = 0) AND m.attributedBody IS NOT NULL AND length(m.attributedBody) > 100 AND m.associated_message_type = 0 AND m.cache_has_attachments = 0;
How to distinguish message types when text is NULL
cache_has_attachments
attributedBody length Likely type
0
100 bytes Dictated/rich text — recoverable via decode script
1 any Attachment (image, file, voice memo) — text may be in attributedBody too
0 < 50 bytes Tapback reaction or system message — usually noise
How to decode
Use the bundled decode script for reliable extraction (v4 — 3-tier decoder + native pitfall protections):
python3 <skill-path>/scripts/decode_attributed_body.py --chat "<CHAT_IDENTIFIER>" --limit 50
The decoder uses a 3-tier strategy:
-
Tier 1: pytypedstream Unarchiver — proper Apple typedstream deserialization (requires pip install pytypedstream )
-
Tier 2: Multi-format binary — 0x2B/0x4F/0x49 length-prefix parsing (zero deps, ported from macos-messages)
-
Tier 3: NSString marker + length-prefix — v2 legacy approach (zero deps, last resort)
Falls through tiers on failure. Works without pytypedstream installed (skips tier 1). See Cross-Repo Analysis for decoder comparison.
Date Formula
iMessage stores dates as nanoseconds since Apple epoch (2001-01-01 00:00:00 UTC).
datetime(m.date/1000000000 + 978307200, 'unixepoch', 'localtime') as timestamp
-
m.date / 1000000000 — Convert nanoseconds to seconds
-
- 978307200 — Add offset from Unix epoch (1970) to Apple epoch (2001)
-
'unixepoch' — Tell SQLite this is a Unix timestamp
-
'localtime' — Convert to local timezone (CRITICAL — omitting this gives UTC)
Quick Start Queries
- List all conversations
sqlite3 ~/Library/Messages/chat.db
"SELECT c.chat_identifier, c.display_name, COUNT(cmj.message_id) as msg_count
FROM chat c
JOIN chat_message_join cmj ON c.ROWID = cmj.chat_id
GROUP BY c.ROWID
ORDER BY msg_count DESC
LIMIT 20"
- Get conversation thread (text column only)
sqlite3 ~/Library/Messages/chat.db
"SELECT datetime(m.date/1000000000 + 978307200, 'unixepoch', 'localtime') as ts,
CASE WHEN m.is_from_me = 1 THEN 'Me' ELSE 'Them' END as sender,
m.text
FROM message m
JOIN chat_message_join cmj ON m.ROWID = cmj.message_id
JOIN chat c ON cmj.chat_id = c.ROWID
WHERE c.chat_identifier = '<CHAT_IDENTIFIER>'
AND length(m.text) > 0
AND m.associated_message_type = 0
ORDER BY m.date DESC
LIMIT 50"
- Get ALL messages including attributedBody (use decode script)
python3 <skill-path>/scripts/decode_attributed_body.py
--chat "<CHAT_IDENTIFIER>"
--after "2026-01-01"
--limit 100
Filtering Noise
Tapback reactions
Tapback reactions (likes, loves, emphasis, etc.) are stored as separate message rows with associated_message_type != 0 . Always filter:
AND m.associated_message_type = 0
Shell escaping in zsh
The != operator can cause issues in zsh. Use positive assertions instead:
-- BAD (breaks in zsh) AND m.text != ''
-- GOOD (works everywhere) AND length(m.text) > 0
Using the Decode Script
The bundled decode_attributed_body.py handles all edge cases:
Basic usage - get last 50 messages from a contact
python3 <skill-path>/scripts/decode_attributed_body.py --chat "+1234567890" --limit 50
Search for keyword
python3 <skill-path>/scripts/decode_attributed_body.py --chat "+1234567890" --search "meeting"
Search with surrounding context (3 messages before and after each match)
python3 <skill-path>/scripts/decode_attributed_body.py --chat "+1234567890" --search "meeting" --context 3
Date range
python3 <skill-path>/scripts/decode_attributed_body.py --chat "+1234567890" --after "2026-01-01" --before "2026-02-01"
Only messages from the other party
python3 <skill-path>/scripts/decode_attributed_body.py --chat "+1234567890" --sender them
Only messages from me
python3 <skill-path>/scripts/decode_attributed_body.py --chat "+1234567890" --sender me
Export conversation to NDJSON for offline analysis
python3 <skill-path>/scripts/decode_attributed_body.py --chat "+1234567890" --after "2026-02-01" --export thread.jsonl
Output format: timestamp|sender|text (pipe-delimited, one message per line)
Context Search (--context N )
When --search is combined with --context N , the script shows N messages before and after each match:
-
Matches are prefixed with [match]
-
Non-contiguous context groups are separated by --- context ---
-
Overlapping context windows are deduplicated
NDJSON Export (--export )
Exports messages to a NDJSON (.jsonl) file for offline analysis:
{ "ts": "2026-02-13 18:30:17", "sender": "them", "is_from_me": false, "text": "Message text here", "decoded": true, "type": "text", "edited": true, "service": "SMS", "effect": "slam", "reply_to": { "ts": "2026-02-13 18:00:00", "sender": "me", "text": "Original message..." } }
Fields edited , service , effect , reply_to are optional — only present when applicable. The type field is always present ("text" , "audio" , or "attachment" ).
Retracted messages are NEVER exported — they are deterministically excluded (see Native Protections below).
Export-first workflow (recommended for multi-query analysis):
Step 1: Export once
python3 <skill-path>/scripts/decode_attributed_body.py --chat "+1234567890"
--after "2026-02-01" --export thread.jsonl
Step 2: Analyze many times without re-querying SQLite
grep -i "keyword" thread.jsonl jq 'select(.text | test("reference"; "i"))' thread.jsonl jq 'select(.sender == "them")' thread.jsonl
Native Protections (v4)
The decode script natively handles these pitfalls — no manual SQL workarounds needed:
Protection Column Used Behavior
Retracted messages (Undo Send) date_retracted , date_edited
Excluded from output — content wiped by iOS, not admissible
Edited messages date_edited
Flagged with [edited] / "edited": true
Audio/voice messages is_audio_message
Identified as [audio message] — not misclassified as empty
Inline quotes (swipe-to-reply) thread_originator_guid
Resolved to quoted message text via GUID index
Attachments without text cache_has_attachments , attachment table Surfaced as [attachment: filename] instead of silently dropped
Message effects expressive_send_style_id
Decoded to human-readable names (slam, loud, gentle, invisible_ink)
Service type service
Flagged when SMS instead of iMessage
Tapback reactions associated_message_type
Filtered (only = 0 included)
Anti-Patterns to Avoid
-
Searching multiple chat identifiers blindly — Always run --stats first to confirm the right chat identifier has messages in the expected date range
-
Keyword search without context — Always use --context 5 (or more) with --search to understand conversational meaning around matches
-
Repeated narrow-window SQLite queries — Export the full date range to NDJSON first, then grep/jq the file for all subsequent analysis
Note: Replace <skill-path> with the actual installed skill path. To find it:
find ~/.claude -path "*/imessage-query/scripts/decode_attributed_body.py" 2>/dev/null
Reference Documentation
-
Schema Reference — Tables, columns, relationships
-
Query Patterns — Reusable SQL templates for common operations
-
Known Pitfalls — Every gotcha discovered and how to handle it
-
Cross-Repo Analysis — Comparison of 5 OSS decoder implementations and what we adopted
TodoWrite Task Templates
Template A - Retrieve Conversation Thread
- Identify chat_identifier for the contact (phone number or email)
- Run decode script with --chat and appropriate date range
- Review output for attributedBody-decoded messages (marked with [decoded])
- If searching for specific topic, add --search flag
- Format results as needed for the task
Template B - Debug Empty Messages
- Query messages where text IS NULL but attributedBody IS NOT NULL
- Check cache_has_attachments to distinguish voice/file from dictated text
- Run decode script to extract hidden text content
- Verify decoded content makes sense in conversation context
- Document any new decode patterns in known-pitfalls.md
Template C - Build Sourced Timeline
- Identify all relevant chat_identifiers
- Run decode script for each contact with date range
- Merge and sort by timestamp
- Format as sourced quotes with timestamps for documentation
- Verify no messages were missed (compare total count vs decoded count)
Template D - Export-First Deep Analysis
- Run --stats to confirm chat_identifier and date range
- Export full date range to NDJSON: --export thread.jsonl
- Use grep/jq on the NDJSON file for all keyword searches
- Use --search with --context 5 for contextual understanding of specific matches
- All subsequent analysis reads from the NDJSON file (no more SQLite queries)
Post-Change Checklist
After modifying this skill:
-
YAML frontmatter valid (name, description with triggers)
-
No private data (phone numbers, names, emails) in any file
-
All SQL uses parameterized placeholders
-
Decode script works with python3 (pytypedstream optional, tiers 2/3 are stdlib-only)
-
All reference links are relative paths
-
Append changes to evolution-log.md