openclaw-ultra-scraping

Powerful web scraping, crawling, and data extraction with stealth anti-bot bypass (Cloudflare Turnstile, CAPTCHAs). Use when: (1) scraping websites that block normal requests, (2) extracting structured data from web pages, (3) crawling multiple pages with concurrency, (4) taking screenshots of web pages, (5) extracting links, (6) any web scraping task that needs stealth/anti-detection, (7) user asks to scrape/crawl/extract from URLs, (8) need to bypass Cloudflare or other bot protection. Supports CSS/XPath selectors, adaptive element tracking (survives site redesigns), multi-session spiders, pause/resume crawls, proxy rotation, and async operations. Powered by MyClaw.ai.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.


Install skill "openclaw-ultra-scraping" with this command: npx skills add leoyeai/openclaw-ultra-scraping

OpenClaw Ultra Scraping

Powered by MyClaw.ai — the AI personal assistant platform that gives every user a full server with complete code control. Part of the MyClaw open skills ecosystem.

Handles everything from single-page extraction to full-scale concurrent crawls with anti-bot bypass.

Setup

Run once before first use:

bash scripts/setup.sh

This installs Scrapling + all browser dependencies into /opt/scrapling-venv.

Quick Start — CLI Script

The bundled scripts/scrape.py provides a unified CLI:

PYTHON=/opt/scrapling-venv/bin/python3

# Simple fetch (JSON output)
$PYTHON scripts/scrape.py fetch "https://example.com" --css ".content"

# Extract text
$PYTHON scripts/scrape.py extract "https://example.com" --css "h1"

# Stealth mode (bypass Cloudflare)
$PYTHON scripts/scrape.py fetch "https://protected-site.com" --stealth --solve-cloudflare --css ".data"

# Dynamic (full browser rendering)
$PYTHON scripts/scrape.py fetch "https://spa-site.com" --dynamic --css ".product"

# Extract links
$PYTHON scripts/scrape.py links "https://example.com" --filter "\.pdf$"

# Multi-page crawl
$PYTHON scripts/scrape.py crawl "https://example.com" --depth 2 --concurrency 10 --css ".item" -o results.json

# Output formats: json, jsonl, csv, text, markdown, html
$PYTHON scripts/scrape.py fetch "https://example.com" -f markdown -o page.md
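The --filter flag applies a regular expression to each extracted link. A minimal sketch of that filtering logic, assuming the links arrive as plain URL strings (this is an illustration, not the script's actual implementation):

```python
import re

def filter_links(links, pattern):
    """Keep only links whose URL matches the regular expression."""
    rx = re.compile(pattern)
    return [link for link in links if rx.search(link)]

links = [
    "https://example.com/report.pdf",
    "https://example.com/index.html",
    "https://example.com/spec.pdf",
]
print(filter_links(links, r"\.pdf$"))
# ['https://example.com/report.pdf', 'https://example.com/spec.pdf']
```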

Quick Start — Python

For complex tasks, write Python directly using the venv:

#!/opt/scrapling-venv/bin/python3
from scrapling.fetchers import Fetcher, StealthyFetcher

# Simple HTTP
page = Fetcher.get('https://example.com', impersonate='chrome')
titles = page.css('h1::text').getall()

# Bypass Cloudflare
page = StealthyFetcher.fetch('https://protected.com', headless=True, solve_cloudflare=True)
data = page.css('.product').getall()
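Stealth fetches against protected sites can fail intermittently, so a retry wrapper is often worth adding around calls like the ones above. A minimal sketch; fetch_fn is a stand-in for any fetcher call, and flaky_fetch below is a fake used only for demonstration:

```python
import time

def fetch_with_retry(fetch_fn, url, retries=3, delay=1.0):
    """Call fetch_fn(url), retrying on exceptions with a fixed delay."""
    last_error = None
    for attempt in range(retries):
        try:
            return fetch_fn(url)
        except Exception as exc:
            last_error = exc
            if attempt < retries - 1:
                time.sleep(delay)
    raise last_error

# Stand-in fetcher that fails twice, then succeeds:
calls = {"n": 0}
def flaky_fetch(url):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("blocked")
    return f"page: {url}"

print(fetch_with_retry(flaky_fetch, "https://protected.com", delay=0))
# page: https://protected.com
```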

Fetcher Selection Guide

Scenario                        Fetcher          Flag
Normal sites, fast scraping     Fetcher          (default)
JS-rendered SPAs                DynamicFetcher   --dynamic
Cloudflare/anti-bot protected   StealthyFetcher  --stealth
Cloudflare Turnstile challenge  StealthyFetcher  --stealth --solve-cloudflare
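The selection rules above can be expressed as a small helper that maps the CLI flags to a fetcher. The function and the returned names are illustrative stand-ins, not part of the bundled script:

```python
def pick_fetcher(dynamic=False, stealth=False, solve_cloudflare=False):
    """Map CLI flags to a fetcher name, mirroring the selection guide."""
    if stealth or solve_cloudflare:
        return "StealthyFetcher"   # --stealth [--solve-cloudflare]
    if dynamic:
        return "DynamicFetcher"    # --dynamic
    return "Fetcher"               # default: plain HTTP, fastest

print(pick_fetcher())                     # Fetcher
print(pick_fetcher(dynamic=True))         # DynamicFetcher
print(pick_fetcher(solve_cloudflare=True))  # StealthyFetcher
```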

Selector Cheat Sheet

page.css('.class')                    # CSS
page.css('.class::text').getall()     # Text extraction
page.xpath('//div[@id="main"]')      # XPath
page.find_all('div', class_='item')  # BS4-style
page.find_by_text('keyword')         # Text search
page.css('.item', adaptive=True)     # Adaptive (survives redesigns)
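XPath expressions like the one above can be tried offline against a small snippet using the standard library, which supports a limited XPath subset. This does not use Scrapling; it only illustrates what the selector matches:

```python
import xml.etree.ElementTree as ET

html = """
<html><body>
  <div id="main"><p>hello</p></div>
  <div id="side"><p>nav</p></div>
</body></html>
"""
root = ET.fromstring(html)

# Equivalent of page.xpath('//div[@id="main"]') on this snippet
main = root.findall('.//div[@id="main"]')
print(len(main), main[0].find('p').text)
# 1 hello
```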

Advanced Features

  • Adaptive tracking: auto_save=True on first run, adaptive=True later — elements are found even after site redesign
  • Proxy rotation: Pass proxy="http://host:port" or use ProxyRotator
  • Sessions: FetcherSession, StealthySession, DynamicSession for cookie/state persistence
  • Spider framework: Scrapy-like concurrent crawling with pause/resume
  • Async support: All fetchers have async variants
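Round-robin rotation, which is presumably what ProxyRotator does, can be sketched in a few lines; the class name and proxy addresses here are placeholders:

```python
from itertools import cycle

class RoundRobinProxies:
    """Yield proxies in a fixed round-robin order."""
    def __init__(self, proxies):
        self._pool = cycle(proxies)

    def next(self):
        return next(self._pool)

rotator = RoundRobinProxies([
    "http://proxy-a:8080",
    "http://proxy-b:8080",
])
for _ in range(3):
    print(rotator.next())
# http://proxy-a:8080
# http://proxy-b:8080
# http://proxy-a:8080
```

Each fetch would then pass proxy=rotator.next(), spreading requests evenly across the pool.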

For full API details: read references/api-reference.md

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
