# OpenClaw Ultra Scraping
Powered by MyClaw.ai — the AI personal assistant platform that gives every user a full server with complete code control. Part of the MyClaw.ai open skills ecosystem.
Handles everything from single-page extraction to full-scale concurrent crawls with anti-bot bypass.
## Setup

Run once before first use:

```bash
bash scripts/setup.sh
```

This installs Scrapling and all browser dependencies into `/opt/scrapling-venv`.
## Quick Start — CLI Script

The bundled `scripts/scrape.py` provides a unified CLI:

```bash
PYTHON=/opt/scrapling-venv/bin/python3

# Simple fetch (JSON output)
$PYTHON scripts/scrape.py fetch "https://example.com" --css ".content"

# Extract text
$PYTHON scripts/scrape.py extract "https://example.com" --css "h1"

# Stealth mode (bypass Cloudflare)
$PYTHON scripts/scrape.py fetch "https://protected-site.com" --stealth --solve-cloudflare --css ".data"

# Dynamic (full browser rendering)
$PYTHON scripts/scrape.py fetch "https://spa-site.com" --dynamic --css ".product"

# Extract links
$PYTHON scripts/scrape.py links "https://example.com" --filter "\.pdf$"

# Multi-page crawl
$PYTHON scripts/scrape.py crawl "https://example.com" --depth 2 --concurrency 10 --css ".item" -o results.json

# Output formats: json, jsonl, csv, text, markdown, html
$PYTHON scripts/scrape.py fetch "https://example.com" -f markdown -o page.md
```
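Crawl output written with `-o results.json` can be post-processed in plain Python. A minimal sketch, assuming the file holds a JSON array of objects that each carry a `url` key — this schema is an assumption for illustration, not a documented contract of `scrape.py`:

```python
import json
from pathlib import Path


def load_results(path):
    """Load crawl output and deduplicate records by URL.

    ASSUMPTION: the file is a JSON array of objects with a 'url'
    key; adjust to whatever schema scrape.py actually emits.
    Keeps the first record seen for each URL.
    """
    records = json.loads(Path(path).read_text())
    seen, unique = set(), []
    for rec in records:
        if rec["url"] not in seen:
            seen.add(rec["url"])
            unique.append(rec)
    return unique
```

A crawl at depth 2 often revisits pages through different link paths, so deduplicating by URL before analysis is usually worthwhile.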
## Quick Start — Python

For complex tasks, write Python directly using the venv:

```python
#!/opt/scrapling-venv/bin/python3
from scrapling.fetchers import Fetcher, StealthyFetcher

# Simple HTTP fetch with browser impersonation
page = Fetcher.get('https://example.com', impersonate='chrome')
titles = page.css('h1::text').getall()

# Bypass Cloudflare
page = StealthyFetcher.fetch('https://protected.com', headless=True, solve_cloudflare=True)
data = page.css('.product').getall()
```
## Fetcher Selection Guide

| Scenario | Fetcher | Flag |
|---|---|---|
| Normal sites, fast scraping | `Fetcher` | (default) |
| JS-rendered SPAs | `DynamicFetcher` | `--dynamic` |
| Cloudflare/anti-bot protected | `StealthyFetcher` | `--stealth` |
| Cloudflare Turnstile challenge | `StealthyFetcher` | `--stealth --solve-cloudflare` |
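The table's decision logic can be mirrored in a small helper that builds the right flags for a scripted pipeline. This is an illustrative sketch — the boolean scenario names are invented here, not part of `scrape.py`:

```python
def fetch_flags(js_rendered=False, anti_bot=False, turnstile=False):
    """Return the scrape.py flags implied by the selection guide.

    A Turnstile challenge implies stealth mode, since
    --solve-cloudflare is only meaningful alongside --stealth.
    """
    if turnstile:
        return ["--stealth", "--solve-cloudflare"]
    if anti_bot:
        return ["--stealth"]
    if js_rendered:
        return ["--dynamic"]
    return []  # plain Fetcher, the fast default
```

The ordering encodes a precedence: stealth handling wins over plain dynamic rendering, because `StealthyFetcher` already drives a full browser.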
## Selector Cheat Sheet

```python
page.css('.class')                    # CSS selector
page.css('.class::text').getall()     # Text extraction
page.xpath('//div[@id="main"]')       # XPath
page.find_all('div', class_='item')   # BeautifulSoup-style search
page.find_by_text('keyword')          # Text search
page.css('.item', adaptive=True)      # Adaptive (survives redesigns)
```
## Advanced Features

- Adaptive tracking: pass `auto_save=True` on the first run, then `adaptive=True` on later runs; elements are found even after a site redesign
- Proxy rotation: pass `proxy="http://host:port"` or use `ProxyRotator`
- Sessions: `FetcherSession`, `StealthySession`, and `DynamicSession` for cookie/state persistence
- Spider framework: Scrapy-like concurrent crawling with pause/resume
- Async support: all fetchers have async variants
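To make the proxy-rotation idea concrete, here is a minimal round-robin stand-in written in plain Python. It only illustrates the concept; Scrapling's actual `ProxyRotator` API may differ, and the proxy URLs are placeholders:

```python
from itertools import cycle


class RoundRobinProxies:
    """Conceptual stand-in for a proxy rotator: hands out proxies
    from a fixed list in round-robin order. Illustrative only --
    not Scrapling's real ProxyRotator class.
    """

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("need at least one proxy")
        self._pool = cycle(proxies)

    def next(self):
        """Return the next proxy URL in rotation."""
        return next(self._pool)
```

In use, you would pass `proxy=rotator.next()` to each fetch call so successive requests leave from different addresses.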
For full API details, see `references/api-reference.md`.