xiaohongshu-collector

Work on Xiaohongshu post/comment collection, cookie handling, refresh flows, and browser plugin integration in the forbidden_company repo.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "xiaohongshu-collector" with this command: npx skills add pengluday/xiaohongshu-collector

Xiaohongshu Collector

Overview

Use this skill when working on Xiaohongshu collection in forbidden_company, especially for post bodies, comment pagination, cookie updates, single-URL refreshes, or browser-plugin integration.

What To Use

Prefer the existing repo implementation instead of inventing a new flow:

  • scripts/collect_xiaohongshu.py
  • scripts/admin_server.py
  • scripts/run_xiaohongshu_collection.sh
  • browser-extension/xhs-collector/
  • docs/xiaohongshu-collector.md
  • docs/xhs-plugin-api.md

Core Rules

  • Keep cookies private. Never repeat them in final output.
  • comment_limit=0 means collect all available comments.
  • Comment collection must paginate.
  • If the direct comment API returns a login/account error, use the browser-rendered fallback.
  • Do not rely on Firecrawl for comment pagination.
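The pagination and fallback rules above can be sketched as follows. This is a minimal illustration, not the repo's implementation: fetch_page, fallback, the LoginRequired exception, and the (comments, next_cursor) page shape are all hypothetical stand-ins for whatever scripts/collect_xiaohongshu.py actually uses.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical page shape: (comments, next_cursor); next_cursor is None on the last page.
Page = Tuple[List[Dict], Optional[str]]


class LoginRequired(Exception):
    """Hypothetical error raised when the direct API reports a login/account problem."""


def collect_comments(
    fetch_page: Callable[[Optional[str]], Page],
    fallback: Callable[[], List[Dict]],
    comment_limit: int = 0,
) -> List[Dict]:
    """Follow cursors page by page; comment_limit=0 means no cap."""
    collected: List[Dict] = []
    cursor: Optional[str] = None
    while True:
        try:
            comments, cursor = fetch_page(cursor)
        except LoginRequired:
            # Direct API refused the session: use the browser-rendered path instead.
            rendered = fallback()
            return rendered[:comment_limit] if comment_limit else rendered
        collected.extend(comments)
        if comment_limit and len(collected) >= comment_limit:
            return collected[:comment_limit]
        # Stop on the last page, or on an empty page to avoid looping forever.
        if cursor is None or not comments:
            return collected
```

Treating 0 as "no cap" keeps the default behaviour aligned with the rule that full collection is the norm, while any positive limit simply truncates the result.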

Workflow

  1. Confirm whether the task is batch collection or single-URL refresh.
  2. Load the saved cookie from data/xiaohongshu-cookie.txt unless a newer cookie is provided.
  3. Run or update scripts/collect_xiaohongshu.py with the requested URL(s) and --db; add --refresh-url for a single-URL refresh and --comment-limit 0 when full comments are needed.
  4. For browser plugin work, wire the popup/background scripts to the local backend endpoints in scripts/admin_server.py.
  5. Verify that post rows, comment rows, and exported artifacts are written correctly.
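Steps 2 and 3 might look like the sketch below. The flag names and the cookie path come from the workflow; the argument layout (URLs as positionals, --db and --refresh-url each taking one value) and both helper names are assumptions, so check the script's own argument parser before relying on them.

```python
from pathlib import Path
from typing import List, Optional, Sequence

COOKIE_PATH = Path("data/xiaohongshu-cookie.txt")  # path named in the workflow


def resolve_cookie(new_cookie: Optional[str] = None, path: Path = COOKIE_PATH) -> str:
    """Prefer a newly supplied cookie; otherwise read the saved one."""
    if new_cookie and new_cookie.strip():
        return new_cookie.strip()
    return path.read_text(encoding="utf-8").strip()


def build_collect_command(
    urls: Sequence[str],
    db: str,
    refresh_url: Optional[str] = None,
    full_comments: bool = False,
) -> List[str]:
    """Assemble argv for the collector (argument layout is assumed, not confirmed)."""
    cmd = ["python", "scripts/collect_xiaohongshu.py", *urls, "--db", db]
    if refresh_url:
        cmd += ["--refresh-url", refresh_url]
    if full_comments:
        cmd += ["--comment-limit", "0"]
    return cmd
```

The resulting list can be passed to subprocess.run. The cookie value is returned to the caller only and should never be printed or logged.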

Endpoint Map

Use these backend endpoints when integrating the browser plugin:

  • GET/POST /api/xhs-cookie
  • GET /api/xhs-plugin/status
  • POST /api/xhs-plugin/collect
  • POST /api/xhs-plugin/refresh
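For quick checks from a script rather than the extension, the endpoint map can be captured as a small request builder. The methods and paths are the ones listed above; the base URL, the JSON request body, and the "url" payload field used in the usage note are assumptions, since the source does not describe them (docs/xhs-plugin-api.md is the place to confirm).

```python
import json
from typing import Optional
from urllib.request import Request

BASE_URL = "http://127.0.0.1:8000"  # assumption: host and port are not given in the source

# Method/path pairs taken from the endpoint map.
ENDPOINTS = {
    "get_cookie": ("GET", "/api/xhs-cookie"),
    "set_cookie": ("POST", "/api/xhs-cookie"),
    "status": ("GET", "/api/xhs-plugin/status"),
    "collect": ("POST", "/api/xhs-plugin/collect"),
    "refresh": ("POST", "/api/xhs-plugin/refresh"),
}


def build_request(name: str, payload: Optional[dict] = None,
                  base_url: str = BASE_URL) -> Request:
    """Build (but do not send) a request for one of the named endpoints."""
    method, path = ENDPOINTS[name]
    data = None
    headers = {}
    if method == "POST":
        # JSON body is an assumed convention for the local backend.
        data = json.dumps(payload or {}).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return Request(base_url + path, data=data, headers=headers, method=method)
```

With the local backend running, a request such as build_request("refresh", {"url": note_url}) can be sent with urllib.request.urlopen.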

Validation Notes

  • Refresh mode must delete the old note rows before writing the new ones.
  • The plugin should expose downloadable CSV and JSON artifacts.
  • When debugging, first determine whether the failure is cookie-related, pagination-related, or page-structure-related.
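The delete-before-write rule for refresh mode is easiest to get right inside a single transaction, so a failed refresh never leaves a note half-replaced. The sketch below assumes a SQLite database and a hypothetical two-table schema; the real table and column names in the repo may differ.

```python
import sqlite3
from typing import Dict, List

# Hypothetical schema; the real tables live in the repo and may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS posts (url TEXT PRIMARY KEY, body TEXT, fetched_at TEXT);
CREATE TABLE IF NOT EXISTS comments (
    post_url TEXT, comment_id TEXT, content TEXT, fetched_at TEXT
);
"""


def refresh_note(conn: sqlite3.Connection, url: str, body: str,
                 comments: List[Dict[str, str]], fetched_at: str) -> None:
    """Replace one note's rows atomically: delete old rows, then insert new ones."""
    with conn:  # commits on success, rolls back if any statement raises
        conn.execute("DELETE FROM comments WHERE post_url = ?", (url,))
        conn.execute("DELETE FROM posts WHERE url = ?", (url,))
        conn.execute(
            "INSERT INTO posts (url, body, fetched_at) VALUES (?, ?, ?)",
            (url, body, fetched_at),
        )
        conn.executemany(
            "INSERT INTO comments (post_url, comment_id, content, fetched_at) "
            "VALUES (?, ?, ?, ?)",
            [(url, c["id"], c["content"], fetched_at) for c in comments],
        )
```

Storing the source URL and a fetch timestamp on every row also makes it straightforward to verify after a refresh that only the new rows remain.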

Safety Notes

  • Do not propose or implement shared-server mass scraping.
  • Keep the browser/plugin model user-driven and local-first.
  • Preserve source URLs and timestamps for traceability.

Reference

See collector-workflow.md for operational details.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
