media-crawler

基于 MediaCrawler 的多平台公开信息采集工具,支持安装、命令行运行、WebUI、结果定位与常用任务模板。

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "media-crawler" with this command: npx skills add excalibur9527/mediacrawler-skill

MediaCrawler

基于 MediaCrawler 的多平台公开信息采集工具。

支持平台

  • 小红书(xhs)
  • 抖音(dy)
  • 快手(ks)
  • B站(bili)
  • 微博(wb)
  • 贴吧(tieba)
  • 知乎(zhihu)

功能特性

  • 自动安装依赖
  • 关键词搜索采集
  • 指定帖子/内容 ID 采集
  • 创作者主页采集
  • 评论/二级评论抓取
  • 登录态缓存
  • WebUI 可视化操作
  • 多种数据存储(CSV, JSON, JSONL, Excel, SQLite, MySQL, MongoDB)
  • 结果文件快速定位

Usage

安装环境

bash scripts/setup.sh

查看帮助

cd "$PROJECT_PATH"
uv run main.py --help

运行采集

小红书 - 关键词搜索

uv run main.py --platform xhs --lt qrcode --type search --keywords "护肤" --headless false

抖音 - 关键词搜索

uv run main.py --platform dy --lt qrcode --type search --keywords "护肤" --headless false

指定帖子详情抓取

uv run main.py --platform xhs --lt qrcode --type detail --specified_id "帖子ID1,帖子ID2"

创作者主页抓取

uv run main.py --platform xhs --lt qrcode --type creator --creator_id "创作者ID1"

启动 WebUI

uv run uvicorn api.main:app --port 8080 --reload

启动后访问:

http://127.0.0.1:8080

数据存储

根据 config/base_config.py 中:

SAVE_DATA_OPTION = "jsonl"
SAVE_DATA_PATH = ""

默认结果保存到:

data/{平台}/{存储格式}/

例如抖音 JSONL:

data/douyin/jsonl/search_contents_YYYY-MM-DD.jsonl
data/douyin/jsonl/search_comments_YYYY-MM-DD.jsonl
data/douyin/jsonl/search_creators_YYYY-MM-DD.jsonl

例如小红书 JSONL:

data/xiaohongshu/jsonl/search_contents_YYYY-MM-DD.jsonl
data/xiaohongshu/jsonl/search_comments_YYYY-MM-DD.jsonl

如果你设置了:

--save_data_path "/your/custom/path"

则结果会写入你指定的目录。

快速查看结果

bash scripts/show_results.sh

该脚本会列出当前项目下 data/ 目录中的结果文件。

前置依赖

  • Git
  • uv(脚本可自动安装)
  • Playwright 浏览器驱动(脚本自动安装 Chromium)

注意事项

  • 仅供学习和研究使用
  • 禁止用于非法用途或侵犯他人合法权益
  • 禁止用于商业化违规爬取
  • 执行前应确认目标行为合法合规

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

Cclaw

Open-source comedy AI + video editing + poster generation. Create standup/sketch/manzai/scripts, edit videos via FFmpeg, and generate comedy posters via canv...

Registry SourceRecently Updated
General

Bird Recognition Tool | 鸟类识别工具

Identifies bird species in images/videos of target areas. Supports recognition of no less than 500 common bird species, supports customized model training, s...

Registry SourceRecently Updated
General

Image Amazon Product Image Suite

A professional product image generation skill purpose-built for the Amazon e-commerce platform. Outputs comply with Amazon's image guidelines while optimizin...

Registry SourceRecently Updated
General

SearchOnlineAssets

Online asset search tool: queries public stock libraries (Pixabay) for high-quality photos, illustrations, vectors and videos, returning result metadata and...

Registry SourceRecently Updated