Scrapling Web Scraping
Zero-bot-detection web scraping for OpenClaw. Bypass Cloudflare, handle JavaScript-heavy sites, and adapt to website changes automatically.
Quick Start
# Install Scrapling
pip install "scrapling[all]"
scrapling install
# Basic usage
python3 /root/.openclaw/skills/scrapling-web-scraping/scrapling_tool.py https://example.com
# Bypass Cloudflare
python3 /root/.openclaw/skills/scrapling-web-scraping/scrapling_tool.py https://protected-site.com --mode stealth --cloudflare
# Extract specific data
python3 /root/.openclaw/skills/scrapling-web-scraping/scrapling_tool.py https://example.com --selector ".product-title"
# JavaScript-heavy sites
python3 /root/.openclaw/skills/scrapling-web-scraping/scrapling_tool.py https://spa-app.com --mode dynamic --wait ".content-loaded"
Usage with OpenClaw
Natural Language Commands
Basic scraping:
"用Scrapling抓取 https://example.com 的标题和所有链接"
Bypass protection:
"用隐身模式抓取 https://protected-site.com,绕过Cloudflare"
Extract data:
"抓取 https://shop.com 的商品名称和价格,CSS选择器是 .product"
Dynamic content:
"抓取 https://spa-app.com,等待 .data-loaded 元素加载完成"
Python Code
# Basic scraping
from scrapling.fetchers import Fetcher
page = Fetcher.get('https://example.com')
title = page.css('title::text').get()
# Bypass Cloudflare
from scrapling.fetchers import StealthyFetcher
page = StealthyFetcher.fetch('https://protected.com',
headless=True,
solve_cloudflare=True)
# JavaScript sites
from scrapling.fetchers import DynamicFetcher
page = DynamicFetcher.fetch('https://spa-app.com',
headless=True,
network_idle=True)
Features
| Feature | Command | Description |
|---|---|---|
| Basic Scrape | --mode basic | Fast HTTP requests |
| Stealth Mode | --mode stealth | Bypass Cloudflare/anti-bot |
| Dynamic Mode | --mode dynamic | Handle JavaScript sites |
| CSS Selectors | --selector ".class" | Extract specific elements |
| JSON Output | --json | Machine-readable output |
Examples
1. Scrape with CSS Selector
python3 scrapling_tool.py https://quotes.toscrape.com --selector ".quote .text" --json
2. Bypass Cloudflare
python3 scrapling_tool.py https://nopecha.com/demo/cloudflare --mode stealth --cloudflare
3. Wait for Dynamic Content
python3 scrapling_tool.py https://spa-app.com --mode dynamic --wait ".loaded" --json
CLI Reference
python3 scrapling_tool.py URL [options]
Options:
--mode {basic,stealth,dynamic} Scraping mode (default: basic)
--selector, -s CSS_SELECTOR Extract specific elements
--cloudflare Solve Cloudflare (stealth mode only)
--wait SELECTOR Wait for element (dynamic mode only)
--json, -j Output as JSON
Advanced: Custom Scripts
Create custom scraping scripts in /root/.openclaw/skills/scrapling-web-scraping/:
from scrapling.fetchers import StealthyFetcher
# Your custom scraper
def scrape_products(url):
page = StealthyFetcher.fetch(url, headless=True)
products = []
for item in page.css('.product'):
products.append({
'name': item.css('.name::text').get(),
'price': item.css('.price::text').get(),
'link': item.css('a::attr(href)').get()
})
return products
Notes
- Requires Python 3.10+
- First run:
scrapling installto download browsers - Respect website Terms of Service
- Use responsibly
Created: 2026-03-05 by 老二 Source: https://github.com/D4Vinci/Scrapling