red-crawler-ops
Use this skill when you need to operate the installed red-crawler CLI from an OpenClaw workflow. It is the portable wrapper for the crawler runtime, not a separate crawler implementation.
When To Use
Use red-crawler-ops for:
- preparing a local working directory for `red-crawler`
- saving a login session into Playwright storage state
- crawling a seed Xiaohongshu profile
- running nightly collection against a workspace database
- exporting a weekly report
- listing contactable creators from the SQLite database
red-crawler CLI Commands
All crawling tasks must use the native red-crawler CLI commands:
1. crawl-seed
Crawl a specific Xiaohongshu user profile and extract contact information.
```shell
red-crawler crawl-seed \
  --seed-url "https://www.xiaohongshu.com/user/profile/USER_ID" \
  --storage-state "./state.json" \
  --max-accounts 5 \
  --max-depth 2 \
  --db-path "./data/red_crawler.db" \
  --output-dir "./output"
```
Parameters:
- `--seed-url` (required): Target user profile URL
- `--storage-state` (required): Path to Playwright storage state file
- `--max-accounts`: Maximum accounts to crawl (default: 20)
- `--max-depth`: Crawl depth for related accounts (default: 2)
- `--include-note-recommendations`: Include note recommendations
- `--safe-mode`: Enable safe mode
- `--cache-dir`: Cache directory path
- `--cache-ttl-days`: Cache TTL in days (default: 7)
- `--db-path`: SQLite database path (default: ./data/red_crawler.db)
- `--output-dir`: Output directory (default: ./output)
Outputs:
- `accounts.csv`: Crawled account information
- `contact_leads.csv`: Extracted contact information (emails, etc.)
- `run_report.json`: Execution report
2. login
Interactive login to save browser session.
```shell
red-crawler login --save-state "./state.json"
```
Parameters:
- `--save-state` (required): Path to save storage state
- `--login-url`: Login page URL (default: https://www.xiaohongshu.com)
3. login-qr-start / login-qr-finish
QR code-based login for headless environments.
```shell
# Start QR login (generates a QR code image)
red-crawler login-qr-start \
  --save-state "./state.json" \
  --qr-path "./login-qr.png" \
  --session-path "./login-session.json" \
  --timeout 180

# Finish QR login after the user scans
red-crawler login-qr-finish \
  --save-state "./state.json" \
  --session-path "./login-session.json"
```
4. collect-nightly
Run scheduled nightly data collection.
```shell
red-crawler collect-nightly \
  --storage-state "./state.json" \
  --db-path "./data/red_crawler.db" \
  --report-dir "./reports" \
  --crawl-budget 30 \
  --search-term-limit 4
```
Parameters:
- `--storage-state` (required): Path to storage state file
- `--db-path`: Database path (default: ./data/red_crawler.db)
- `--report-dir`: Report directory (default: ./reports)
- `--cache-dir`: Cache directory
- `--cache-ttl-days`: Cache TTL in days (default: 7)
- `--crawl-budget`: Crawl budget (default: 30)
- `--search-term-limit`: Search term limit (default: 4)
- `--startup-jitter-minutes`: Random startup delay in minutes
- `--slot-name`: Slot name for scheduling
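The `--startup-jitter-minutes` and `--slot-name` flags are intended for scheduled runs. A hypothetical cron entry (the 02:30 time, the `/srv/red-crawler-ws` path, and the log redirection are illustrative, not part of the CLI):

```
30 2 * * * cd /srv/red-crawler-ws && red-crawler collect-nightly --storage-state ./state.json --db-path ./data/red_crawler.db --report-dir ./reports --crawl-budget 30 --startup-jitter-minutes 15 --slot-name nightly >> logs/nightly.log 2>&1
```

Prefer workspace-relative paths inside the entry, per the safety limits below; only the `cd` target is machine-specific.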
5. report-weekly
Export weekly reports from database.
```shell
red-crawler report-weekly \
  --db-path "./data/red_crawler.db" \
  --report-dir "./reports" \
  --days 7
```
Parameters:
- `--db-path`: Database path (default: ./data/red_crawler.db)
- `--report-dir`: Report directory (default: ./reports)
- `--days`: Report period in days (default: 7)
Outputs:
- `weekly-growth-report.json`
- `contactable_creators.csv`
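A minimal post-processing sketch: counting rows in the contactable-creators export. The CSV written here is sample data for illustration only; the real file is produced by `red-crawler report-weekly`.

```shell
# Sample data standing in for the real export (illustrative column names)
mkdir -p reports
printf 'name,email\nalice,a@example.com\nbob,b@example.com\n' > reports/contactable_creators.csv

# Skip the header row and count contactable creators
tail -n +2 reports/contactable_creators.csv | wc -l
```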
6. list-contactable
List contactable creators from database.
```shell
red-crawler list-contactable \
  --db-path "./data/red_crawler.db" \
  --lead-type "email" \
  --creator-segment "creator" \
  --min-relevance-score 0.5 \
  --limit 20 \
  --format csv
```
Parameters:
- `--db-path`: Database path (default: ./data/red_crawler.db)
- `--lead-type`: Lead type filter (default: email)
- `--creator-segment`: Creator segment filter (default: creator)
- `--min-relevance-score`: Minimum relevance score (default: 0.0)
- `--limit`: Result limit (default: 20)
- `--format`: Output format, table or csv (default: table)
7. open
Open Xiaohongshu in browser with saved session.
```shell
red-crawler open --storage-state "./state.json"
```
Supported Actions
`bootstrap`, `login`, `crawl_seed`, `collect_nightly`, `report_weekly`, `list_contactable`
Example Prompts
- "Help me prepare the local environment for the current Xiaohongshu crawler project" (automatically maps to `bootstrap` for an existing workspace)
- "I need to log in to the crawler" / "I want to log in to Xiaohongshu" (automatically maps to `login` to fetch/refresh the Playwright session state)
- "Start the nightly data collection" / "Run the automated collection task" (automatically maps to `collect_nightly` to continue crawling based on the database queue)
- "Generate this week's crawler data report for me" (automatically maps to `report_weekly` pointing at the workspace's DB)
Crawling New Data vs Querying Database:
- "Crawl 10 related beauty bloggers starting from this creator: https://www.xiaohongshu.com/..." (crawls NEW data: automatically maps to `crawl_seed` with `seed_url`, setting `max_accounts` to 10. Note: crawling new data requires a seed URL.)
- "From the database / already-crawled data, find contact details for 10 beauty/gaming/tech bloggers" (queries the EXISTING DB: automatically sets `action` to `list_contactable`, `limit` to 10, and `creator_segment` to "美妆" (beauty) to filter the local SQLite database)
(Also understands technical prompt variations:)
- "Bootstrap this workspace with `install_browser: true` after I have installed the CLI."
- "Crawl this seed profile with a depth of 2 and write outputs into `output/`."
- "Export this week's report and return the generated artifacts."
Environment Setup
Windows (WSL2)
On Windows, red-crawler runs inside WSL2. You need:
- WSL2 with Ubuntu (20.04 or 22.04 recommended)
- WSLg (built-in graphics support for WSL2) for the browser GUI
- Dependencies:
  ```shell
  sudo apt-get update
  sudo apt-get install -y git python3 python3-pip
  ```
- red-crawler CLI, installed from the published package.
Known Issues & Fixes:
- DISPLAY not set (WSLg)
  - Error: `Missing X server or $DISPLAY`
  - Fix: export `DISPLAY` before running:
    ```shell
    export DISPLAY=:0
    ```
- Headless vs headed browser
  - The `login` command requires a headed browser (GUI)
  - `crawl-seed` and the other commands also require a headed browser on WSL
  - Always set `DISPLAY=:0` before running any command that uses the browser
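A small guard sketch for WSL2 shells: default `DISPLAY` to `:0` (the WSLg display) only when it is not already set, so sessions with their own X server keep their value.

```shell
# Use an existing DISPLAY if set; otherwise fall back to WSLg's :0
export DISPLAY="${DISPLAY:-:0}"
echo "DISPLAY=$DISPLAY"
```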
Linux (Native)
- Dependencies:
  ```shell
  sudo apt-get update
  sudo apt-get install -y git python3 python3-pip
  ```
- red-crawler CLI, installed from the published package.
- X server (for the headed browser):
  ```shell
  sudo apt-get install -y xvfb
  export DISPLAY=:99
  Xvfb :99 -screen 0 1024x768x16 &
  ```
macOS
- red-crawler CLI:
  ```shell
  uv tool install red-crawler==0.1.2
  ```
- Playwright browser runtime: run `bootstrap` with `install_browser: true`.
Prerequisites
- This skill never clones a repository. Install `red-crawler` as a package, then point `workspace_path` at a local working directory.
- Set `require_local_checkout: true` only when you intentionally want to run from a source checkout.
- `uv` is only required when `sync_dependencies: true` is used for a local source checkout.
- `bootstrap` does not create a login session. Use `login` explicitly.
- `login` creates the Playwright storage state explicitly.
- `crawl_seed` and `collect_nightly` require an authenticated Playwright storage state file.
- `report_weekly` and `list_contactable` run from the database and do not require storage state.
- The workspace must contain `pyproject.toml`.
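The prerequisites above can be checked before launching a crawl. A hypothetical preflight sketch, run here against a scratch directory for illustration (a real run would point `ws` at `workspace_path`):

```shell
# Scratch workspace standing in for workspace_path
ws=$(mktemp -d)
touch "$ws/pyproject.toml"          # the workspace must contain pyproject.toml

missing=0
[ -f "$ws/pyproject.toml" ] || { echo "missing pyproject.toml"; missing=1; }
[ -f "$ws/state.json" ]     || { echo "no storage state: run red-crawler login first"; missing=1; }
echo "missing=$missing"
```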
Safety Limits
- Do not point this skill at a directory you do not control.
- Do not create login sessions silently; call `login` only when the user asks to authenticate.
- Keep the Playwright storage state file local and out of commits, logs, and shared artifacts.
- Do not point it at production data or unknown databases.
- Do not assume a browser session exists; create `state.json` with `login` first.
- Do not hard-code machine-specific paths in prompts or config.
- Prefer relative, workspace-scoped paths for outputs and reports.
Input Shape
Provide an object with `action` plus optional fields used by the selected action. Common fields include:
`workspace_path`, `require_local_checkout`, `runner_command`, `storage_state`, `db_path`, `report_dir`, `output_dir`, `cache_dir`
Action-specific fields include:
`sync_dependencies`, `install_browser`, `seed_url`, `login_url`, `max_accounts`, `max_depth`, `include_note_recommendations`, `safe_mode`, `cache_ttl_days`, `crawl_budget`, `search_term_limit`, `startup_jitter_minutes`, `slot_name`, `days`, `lead_type`, `creator_segment`, `min_relevance_score`, `limit`, `format`
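A hypothetical input object for the `crawl_seed` action, written to a file for illustration. Field names come from the lists above; all values are examples only.

```shell
cat > action.json <<'EOF'
{
  "action": "crawl_seed",
  "workspace_path": "./red-crawler-ws",
  "storage_state": "./state.json",
  "seed_url": "https://www.xiaohongshu.com/user/profile/USER_ID",
  "max_accounts": 10,
  "max_depth": 2,
  "db_path": "./data/red_crawler.db",
  "output_dir": "./output"
}
EOF
grep -c '"action"' action.json
```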
Output Shape
Successful runs return:
`status`, `action`, `command`, `summary`, `artifacts`, `metrics`, `next_step`, `stdout`, `stderr`
Error runs return:
- `status`, `action`, `error_type`, `message`, `suggested_fix`
- `action`, `command`, `stdout`, and `stderr` for execution-time failures
- Early validation or configuration failures may omit `action`, `command`, `stdout`, and `stderr`
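An illustrative success payload matching the output shape above, written to a file for inspection. Field names are the ones listed in this section; every value is invented for the example.

```shell
cat > result.json <<'EOF'
{
  "status": "success",
  "action": "report_weekly",
  "command": "red-crawler report-weekly --db-path ./data/red_crawler.db --report-dir ./reports --days 7",
  "summary": "Exported weekly report for the last 7 days",
  "artifacts": ["reports/weekly-growth-report.json", "reports/contactable_creators.csv"],
  "metrics": {"contactable_creators": 12},
  "next_step": "Review the generated artifacts under reports/",
  "stdout": "",
  "stderr": ""
}
EOF
grep -o '"status": "[a-z]*"' result.json
```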