Slack thread export
Use this skill when Slack data must be exported from a real logged-in browser tab instead of a bot token or admin API.
Scope and non-goals
This skill is intentionally optimized for logged-in Slack Web exports through an attached Chrome Browser Relay tab.
It is meant to be good at:
- exporting thread replies for one user across selected channels
- building CSV/JSONL archives from Slack search
- narrowing results by channel/date and optionally reducing non-work noise
- surviving partial failures with resumable raw output and failed-channel retries
It is not meant to be:
- a generic Slack admin/export replacement
- a bot-token or SCIM-based export tool
- a guaranteed complete legal/compliance archive
- a universal classifier for "work-related" messages
Workflow
- Ensure the user has attached the Slack tab with Browser Relay and use profile="chrome".
- Use the browser tool or openclaw browser --browser-profile chrome to confirm the attached tab is the correct Slack workspace.
- Read localStorage.localConfig_v2 from the Slack tab to get the active team metadata and xoxc token used by the web client.
- Query Slack's internal search.messages endpoint from inside the page context using fetch('/api/search.messages', ...) so the logged-in browser session supplies the right cookies/session context.
- Export in small batches:
  - Prefer channel-by-channel queries over one huge query.
  - Prefer page-by-page loops (count=100, increment page).
  - Back off on ratelimited with sleep/retry.
  - Stop when count == 0 or < 100 for a page.
- Exclude unwanted records before writing files:
  - Drop IM/MPIM results.
  - Keep only records whose permalink includes thread_ts= when the user wants thread replies.
  - Apply user-requested business filters (channel whitelist, keyword filter, date range, etc.).
- Save both:
  - raw JSONL for audit/debugging
  - cleaned CSV for the user's actual deliverable
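The record-exclusion step can be sketched roughly as follows. This is a minimal illustration, not the exporter's actual code: the function names are hypothetical, and the sketch assumes each search match is a dict with a permalink string and a channel object carrying is_im / is_mpim flags.

```python
from typing import Iterable, Iterator


def is_direct_message(match: dict) -> bool:
    """True for IM/MPIM results (assumed flags on the match's channel object)."""
    channel = match.get("channel") or {}
    return bool(channel.get("is_im") or channel.get("is_mpim"))


def is_thread_reply(match: dict) -> bool:
    """True when the permalink carries a thread_ts= query parameter."""
    return "thread_ts=" in (match.get("permalink") or "")


def filter_matches(matches: Iterable[dict], threads_only: bool = True) -> Iterator[dict]:
    """Drop IM/MPIM results and, optionally, anything that is not a thread reply."""
    for match in matches:
        if is_direct_message(match):
            continue
        if threads_only and not is_thread_reply(match):
            continue
        yield match
```

Business filters (channel whitelist, keywords, date range) would be applied as further predicates in the same loop.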
How it works
This skill does not scrape visible Slack DOM rows as the primary path. Instead it piggybacks on the same internal search flow used by Slack Web:
- Attach to the user's already logged-in Slack browser tab.
- Read the active workspace metadata from localStorage.localConfig_v2.
- Reuse the web client's xoxc token inside the page context only.
- Call fetch('/api/search.messages', ...) from the attached Slack page so the request inherits the browser session cookies and client context.
- Page through Slack search results and transform them into exportable rows.
This is more reliable than host-side requests because Slack search often expects both the token and the live logged-in browser session.
Operational numbers and limits
Use these defaults unless there is a clear reason not to:
- count=100 per search.messages request
  - This is the practical max batch size used by the web client path here.
- sleep-seconds=0.5 between normal page requests
- On ratelimited, sleep at least 3-5 seconds before retrying the same page
- Start with max-pages=60 per channel as a practical ceiling
- Prefer 1 channel at a time or a short whitelist over a whole-workspace sweep
Practical guidance:
- Usually safe: 1 user + 1 channel + 1-20 pages
- Still reasonable: 1 user + 3-10 channels + paging with 0.5s delay
- Risky: one giant all-history query across the whole workspace
- Very risky: one huge page-context evaluate that loops many channels and many pages in one call
Why this matters:
- Slack will return ratelimited if requests come too quickly or the query is too broad.
- Browser evaluation can time out if too much work is packed into one evaluate call.
- The safest shape is many short calls, not one giant call.
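The "many short calls" shape might look like the loop below. It is a sketch under stated assumptions: fetch_page is a hypothetical callable standing in for one short page-context evaluate, and the response is assumed to look like {"ok": ..., "error": ..., "messages": {"matches": [...]}}. The defaults mirror the numbers listed above.

```python
import time
from typing import Callable


def export_channel_pages(
    fetch_page: Callable[[int], dict],
    max_pages: int = 60,
    page_size: int = 100,
    sleep_seconds: float = 0.5,
    ratelimit_sleep: float = 5.0,
    max_retries: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> list:
    """Fetch one channel page by page, retrying the same page when rate limited."""
    rows = []
    page = 1
    retries = 0
    while page <= max_pages:
        response = fetch_page(page)  # one short call per page
        if not response.get("ok"):
            if response.get("error") == "ratelimited" and retries < max_retries:
                retries += 1
                sleep(ratelimit_sleep)
                continue  # retry the same page after backing off
            raise RuntimeError(f"page {page} failed: {response.get('error')}")
        retries = 0
        matches = response.get("messages", {}).get("matches", [])
        rows.extend(matches)
        if len(matches) < page_size:
            break  # an empty or short page means there is nothing further to fetch
        page += 1
        sleep(sleep_seconds)
    return rows
```

Injecting the sleep function keeps the loop testable without real delays; raising on any other error lets the caller record the channel as failed.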
What was actually validated
These claims are not purely theoretical; the workflow was exercised against a real logged-in Slack tab.
Small-range validation
Test shape:
- channels: tech-team, moonlight, dev-part
- range: after=2026-03-01, before=2026-03-15
- mode: strict
Observed result:
- tech-team: 9 rows
- moonlight: 29 rows
- dev-part: 0 rows
- total raw: 38
- unique raw: 38
- failed channels: 0
- total requests: 3
- duration: ~5.05s
This validated:
- --before
- --after
- --channel-file
- --summary-json
- --failed-channels-out
- --resume-from-jsonl
- --preflight
- strict mode
Heuristic filtering validation
To prove that heuristic mode does more than strict, a second test included edge channels:
- tech-team
- moonlight
- random
- music
- ai서비스활용-lounge
Observed result:
- strict: 50 rows kept
- heuristic: 40 rows kept
- removed by heuristic: 10 rows
  - random: 9
  - music: 1
The removed examples were casual/non-work items such as chatty thread replies, celebration posts, and a YouTube link. This confirmed that heuristic mode can reduce obvious non-work noise when mixed channels are included.
Stress validation
A broader run used these channels:
- moonlight
- dev-part
- tech-team
- bobcms
Stress settings:
- max-pages=20
- sleep-seconds=0.05
- mode: heuristic
Observed result:
- total raw: 4580
- filtered keep: 2763
- total requests: 48
- duration: ~85.52s
- ratelimit retries: 0
- failed channels: 0
Interpretation:
- The channel-by-channel/page-by-page design was stable under a moderately aggressive run.
- This does not guarantee immunity from rate limiting in all workspaces.
- It does show that the design is more robust than one giant search/evaluate call.
Important constraints
- Do not assume the Slack API is callable directly from host Python with only the xoxc token. In practice the browser session context matters; call from the page context first.
- Do not try one giant all-history query first; Slack rate-limits broad queries, and long browser evaluations time out.
- Do not claim the export is complete if channel-level calls failed. Report partial coverage clearly.
- Treat the xoxc token and any cookies as sensitive. Do not echo them back to the user or store them in exported files.
- Do not silently widen "work-related" to mean everything. If the user wants "all work-related threads," state clearly that this is a heuristic unless they also provide a channel whitelist, date range, or keyword rule.
- Do not present heuristic filtering as moderation-, compliance-, HR-, or legal-grade classification.
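One way to keep tokens out of logs and exported files is to scrub any text before it is written. The helper below is a hypothetical sketch, not part of the shipped scripts, and it assumes that web-client tokens look like "xoxc-" followed by letters, digits, and hyphens.

```python
import re

# Assumed token shape: "xoxc-" followed by letters, digits, and hyphens.
XOXC_PATTERN = re.compile(r"xoxc-[A-Za-z0-9-]+")


def redact_tokens(text: str) -> str:
    """Replace anything that looks like an xoxc token with a fixed placeholder."""
    return XOXC_PATTERN.sub("xoxc-REDACTED", text)
```

Running every log line and serialized record through a helper like this is cheap insurance; cookies would need a similar, separately written rule.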
Publish-ready status
At this stage, the skill is suitable for beta/publication-candidate use when these caveats are respected:
- Requires an attached, logged-in Slack web tab via Browser Relay
- Relies on Slack web client behavior that may change over time
- Uses heuristic filtering for "work-like" exports unless strict or raw mode is chosen
- Is strongest for operator-led exports, not unattended compliance archiving
If publishing publicly, describe it as an advanced/browser-relay Slack export skill rather than a universal Slack export solution.
Failure semantics
Handle failures explicitly:
- invalid_auth
  - Meaning: wrong tab, logged-out tab, or a host-side request that lacks the browser session context.
  - Fix: confirm the attached Slack tab is logged in and re-run from page context.
- ratelimited
  - Meaning: too many requests too quickly or query scope too broad.
  - Fix: sleep and retry the same page; reduce channel count; add after: date filters.
- browser evaluate timeout
  - Meaning: too much work packed into one evaluate call.
  - Fix: split by channel and page; keep each evaluate short.
- partial export
  - Meaning: some channels completed and some failed.
  - Fix: save raw progress, report completed channels, and retry only failed channels.
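The partial-export case can be handled by tracking the outcome per channel. The sketch below is illustrative only: run_channels and export_channel are hypothetical names, and export_channel is assumed to return a list of rows or raise on failure.

```python
from typing import Callable, Iterable


def run_channels(channels: Iterable[str], export_channel: Callable[[str], list]) -> dict:
    """Export each channel independently so one failure does not lose the others."""
    summary = {"completed": [], "failed": {}, "rows": []}
    for channel in channels:
        try:
            rows = export_channel(channel)
        except Exception as exc:  # record the reason and continue with the next channel
            summary["failed"][channel] = str(exc)
            continue
        summary["completed"].append(channel)
        summary["rows"].extend(rows)
    # Only report full coverage when no channel failed.
    summary["complete"] = not summary["failed"]
    return summary
```

The failed mapping is what would be written out for a targeted retry, and the complete flag keeps a partial run from being reported as a full export.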
Recommended query patterns
Use these building blocks:
- User filter: from:<@USER_ID>
- Thread-only: is:thread
- Channel filter: in:#channel-name
- Time filter if needed: after:YYYY-MM-DD
Typical query:
from:<@UXXXX> is:thread in:#tech-team after:2026-01-01
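Assembling the query from these building blocks can be sketched as a small helper. build_query is a hypothetical name; the before: modifier is assumed to mirror after: in the same way the scripts' --before flag mirrors --after.

```python
from typing import Optional


def build_query(
    user_id: str,
    channel: str,
    after: Optional[str] = None,
    before: Optional[str] = None,
    thread_only: bool = True,
) -> str:
    """Join the search modifiers into a single query string for one channel."""
    parts = [f"from:<@{user_id}>"]
    if thread_only:
        parts.append("is:thread")
    parts.append(f"in:#{channel}")
    if after:
        parts.append(f"after:{after}")
    if before:
        parts.append(f"before:{before}")
    return " ".join(parts)
```

Calling build_query("UXXXX", "tech-team", after="2026-01-01") reproduces the typical query shown above.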
Preferred output columns
Use these columns unless the user asks otherwise:
- datetime_utc
- channel_name
- channel_id
- thread_ts
- username
- text
- permalink
- source_query_channel
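Mapping a search match onto these columns could look like the sketch below. The input field names (ts, channel.id, channel.name, username, text, permalink) are assumptions about the match shape, and to_row / rows_to_csv are hypothetical helpers rather than the exporter's real functions.

```python
import csv
import io
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlparse

COLUMNS = [
    "datetime_utc", "channel_name", "channel_id", "thread_ts",
    "username", "text", "permalink", "source_query_channel",
]


def to_row(match: dict, source_query_channel: str) -> dict:
    """Flatten one search match into the preferred output columns."""
    channel = match.get("channel") or {}
    permalink = match.get("permalink") or ""
    # thread_ts is taken from the permalink's query string when present.
    thread_ts = parse_qs(urlparse(permalink).query).get("thread_ts", [""])[0]
    posted = datetime.fromtimestamp(float(match["ts"]), tz=timezone.utc)
    return {
        "datetime_utc": posted.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "channel_name": channel.get("name", ""),
        "channel_id": channel.get("id", ""),
        "thread_ts": thread_ts,
        "username": match.get("username", ""),
        "text": match.get("text", ""),
        "permalink": permalink,
        "source_query_channel": source_query_channel,
    }


def rows_to_csv(rows: list) -> str:
    """Serialize rows with a header line in the preferred column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Keeping source_query_channel separate from channel_name records which per-channel query produced each row, which helps when auditing a partial run.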
Scripts
- scripts/export_slack_threads.py: channel-by-channel Slack thread exporter driven through the attached Chrome relay tab.
- scripts/retry_failed_channels.py: rerun export using the failed-channels-out file from a previous run.
Run scripts with explicit parameters instead of editing code in place.
Example:
python3 skills/slack-thread-export/scripts/export_slack_threads.py \
--target-id <attached-tab-id> \
--user-id U04SFF458BC \
--team-id T01AY79ELUC \
--channels tech-team moonlight dev-part \
--after 2026-01-01 \
--before 2026-03-15 \
--mode heuristic \
--failed-channels-out exports/slack/failed.txt \
--summary-json exports/slack/summary.json \
--out-csv exports/slack/out.csv \
--out-jsonl exports/slack/out.jsonl
If coverage is too broad
Narrow the export before retrying:
- Use a shorter date range.
- Use a channel whitelist first.
- Export raw first, then run a second-pass filter for “work-like” heuristics.
- Sample channel distribution first, then choose high-signal channels before doing the full export.
Retry workflow
When a run is partial:
- Keep the raw JSONL from the first run.
- Save failed channels with --failed-channels-out.
- Retry only those channels using scripts/retry_failed_channels.py.
- If needed, pass --resume-from-jsonl so the retry output starts from the previous raw capture instead of rebuilding from zero.
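Resuming from an earlier raw capture essentially means skipping records that were already saved. How the real script implements --resume-from-jsonl is not documented here, so the following is only a sketch of one plausible approach; it assumes a record is uniquely identified by its channel id and ts, and all function names are hypothetical.

```python
import json
from typing import Iterable, Iterator


def record_key(record: dict) -> tuple:
    """Identity of a raw record: (channel id, message ts). This is an assumption."""
    channel = record.get("channel") or {}
    return (channel.get("id"), record.get("ts"))


def load_seen_keys(jsonl_lines: Iterable[str]) -> set:
    """Collect the keys already present in a previous raw JSONL capture."""
    seen = set()
    for line in jsonl_lines:
        line = line.strip()
        if line:
            seen.add(record_key(json.loads(line)))
    return seen


def only_new(records: Iterable[dict], seen: set) -> Iterator[dict]:
    """Yield records not captured before, updating the seen set as it goes."""
    for record in records:
        key = record_key(record)
        if key not in seen:
            seen.add(key)
            yield record
```

In practice the lines would come from the open file named by --resume-from-jsonl, and only the new records would be appended to the retry output.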
Retry example:
python3 skills/slack-thread-export/scripts/retry_failed_channels.py \
--target-id <attached-tab-id> \
--user-id U04SFF458BC \
--team-id T01AY79ELUC \
--failed-channels-file exports/slack/failed.txt \
--resume-from-jsonl exports/slack/out.jsonl \
--out-csv exports/slack/retry.csv \
--out-jsonl exports/slack/retry.jsonl \
--summary-json exports/slack/retry-summary.json \
--failed-channels-out exports/slack/retry-failed.txt
Supported options
The script now supports these operational controls:
- --channel-file: read channel names from a file
- --after / --before: date windowing
- --resume-from-jsonl: continue from a previous raw export
- --failed-channels-out: save failed channels for targeted retry
- --summary-json: save structured run summary
- --preflight: sample page-1 volume per channel before full export
- --mode strict|raw|heuristic
  - strict: export only the requested channel/date slice, no work-like heuristic filtering
  - raw: keep everything collected after the query constraints; useful for audit/review datasets
  - heuristic: apply work-related channel/text filtering after collection
- --keyword / --channel-hint: extend heuristic filtering for a specific workspace
Filtering guidance
For “work-related” exports, prefer this order:
- Channel whitelist / channel-name hints
- Explicit date range
- Keyword heuristic on message text
- Manual review for edge channels
If the user says "all work-related threads", explain that this is heuristic unless they provide a strict channel whitelist or date range.
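To make the filtering order concrete, here is a deliberately simple sketch of a channel-hint-then-keyword check. The script's real heuristics are not described in this document, so the matching rules, function names, and row fields below are all assumptions; the sketch treats strict and raw identically because neither applies the heuristic, and it does not model any other differences between them.

```python
from typing import Iterable, List


def is_work_like(row: dict, channel_hints: Iterable[str], keywords: Iterable[str]) -> bool:
    """Keep a row if its channel name matches a hint, else if its text has a keyword."""
    channel = (row.get("channel_name") or "").lower()
    text = (row.get("text") or "").lower()
    if any(hint.lower() in channel for hint in channel_hints):
        return True
    return any(word.lower() in text for word in keywords)


def apply_mode(
    rows: Iterable[dict],
    mode: str,
    channel_hints: Iterable[str] = (),
    keywords: Iterable[str] = (),
) -> List[dict]:
    """Apply the selected mode to rows that already satisfy the query constraints."""
    if mode in ("strict", "raw"):
        return list(rows)  # no work-like filtering in this sketch
    if mode == "heuristic":
        hints = list(channel_hints)
        words = list(keywords)
        return [row for row in rows if is_work_like(row, hints, words)]
    raise ValueError(f"unknown mode: {mode}")
```

A substring match like this is easy to explain to the user but will misclassify edge cases, which is why manual review of edge channels stays in the list above.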
Mode selection guide
Use strict when:
- the user already knows which channels matter
- the goal is faithful export within explicit channel/date constraints
- the user does not want extra classifier behavior
Use raw when:
- the user wants a review dataset first
- a human or later pipeline will do the filtering
- auditability matters more than noise reduction
Use heuristic when:
- the user asks for "work-related" rather than a strict channel list
- there are mixed-signal channels in scope
- reducing obvious non-work chatter is valuable
If unsure, start with strict or raw, then derive a second-pass heuristic export for comparison.