Scrapling 网页抓取 Skill

步骤 0：检查版本

pip show scrapling

未安装 → 执行 pip install "scrapling[fetchers]" + scrapling install
有新版 → 执行 pip install --upgrade "scrapling[fetchers]" → 查 changelog 告知用户
已最新 → 继续

步骤 1：选择 Fetcher

目标网站 →
│
├─ 已有 HTML 字符串/文件，只需解析?
│   → Selector（纯解析，无网络请求）
│   → 模板: parse_only.py
│
├─ 静态页面，无 JS 渲染，无反爬?
│   → Fetcher（最快，基于 curl_cffi）
│   → 模板: basic_fetch.py
│
├─ 需要登录（HTTP 表单，非 JS 登录）?
│   → FetcherSession（保持会话 cookie）
│   → 模板: session_login.py
│
├─ 有 Cloudflare / WAF 保护?
│   → StealthyFetcher（Camoufox 浏览器，自动过 CF）
│   → 模板: stealth_cloudflare.py
│
├─ SPA 应用（React/Vue），需要 JS 渲染?
│   → DynamicFetcher（Playwright 浏览器）
│   → 基于模板即时生成
│
└─ 不确定?
    → 先用 Fetcher 试，403/空内容 → 升级到 StealthyFetcher

步骤 2：执行工作流

1. 检查版本（步骤 0）
2. 查阅 references/site-patterns.md — 匹配已有模式则直接复用
3. 无匹配 → 用决策树选择 Fetcher
4. 读取对应模板 → 替换参数 → 生成完整脚本
5. 执行脚本 → 返回结果
6. **沉淀经验（必做）**:
   - 新站点 → 追加到 site-patterns.md
   - 新 cookie / 用户提供了 cookie → 保存到 cookie-vault.md
   - **完成抓取后必须检查**：是否有新的 cookie 或 site pattern 需要保存

Cookie 格式速查

Fetcher 类型	Cookie 格式	示例
Fetcher / FetcherSession	`dict`	`{'name': 'value', 'token': 'abc'}`
StealthyFetcher / DynamicFetcher	`list[dict]`	`[{'name': 'n', 'value': 'v', 'domain': '.site.com', 'path': '/'}]`

浏览器 Fetcher cookie 必填字段: name, value, domain, path

超时单位速查

Fetcher 类型	超时单位	示例
Fetcher / FetcherSession	秒	`timeout=30`
StealthyFetcher / DynamicFetcher	毫秒	`timeout=60000`

模板索引

模板	文件	何时读取
基础 HTTP 抓取	`templates/basic_fetch.py`	目标为静态页面，无反爬
Cloudflare 绕过	`templates/stealth_cloudflare.py`	目标有 CF/WAF 保护
Session 登录	`templates/session_login.py`	需 HTTP 表单登录后抓取
纯 HTML 解析	`templates/parse_only.py`	已有 HTML 字符串，只需提取数据

References 索引

文件	何时读取
`references/site-patterns.md`	每次抓取前先查阅 — 检查目标站点是否有已记录的模式
`references/api-quick-ref.md`	生成脚本时查阅 — Fetcher/Selector 方法签名和参数
`references/troubleshooting.md`	执行报错时查阅 — 按错误信息查找原因和解决方案
`references/cookie-vault.md`	需要登录 cookie 时查阅 — 检查是否有历史记录可复用
`references/maintenance.md`	安装/升级/依赖问题时查阅 — 安装层级和验证命令

scrapling

Safety Notice

Copy this and send it to your AI assistant to learn

Scrapling 网页抓取 Skill

步骤 0：检查版本

步骤 1：选择 Fetcher

步骤 2：执行工作流

Cookie 格式速查

超时单位速查

模板索引

References 索引

Source Transparency

Related Skills

scrapling

scrapling

scrapling-official

scrapling