scrapling

Use Scrapling to scrape websites with adaptive parsing, Cloudflare bypass, and MCP support. It handles dynamic content and anti-bot detection, and provides clean HTML/JSON output.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "scrapling" with this command: npx skills add nanpaidashi/scrapling-ai

Scrapling Skill

Use the scrapling CLI to scrape websites with adaptive parsing and anti-bot bypass.

When to Use

Use this skill when you need to:

  • Scrape static or dynamic websites
  • Bypass Cloudflare, captcha, or bot detection
  • Extract structured data (HTML/JSON) from web pages
  • Handle JavaScript-rendered content
  • Get clean HTML without extra scripts/CSS

When NOT to Use

Don't use this skill for:

  • Simple HTTP requests → use web_fetch
  • Full browser automation → use the browser tool
  • API-based data → use direct API calls
  • Local file processing → use file tools

Setup

# Install CLI
pipx install scrapling
scrapling --version
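
pipx places executables in its own bin directory, which is not always on PATH in a fresh shell. A minimal sketch of a pre-flight check follows; `have_cli` is a hypothetical helper name introduced here, not part of Scrapling.

```shell
# Return 0 if the named command is on PATH, non-zero otherwise.
# have_cli is a hypothetical helper, not something Scrapling provides.
have_cli() {
  command -v "$1" >/dev/null 2>&1
}

if have_cli scrapling; then
  scrapling --version
else
  echo "scrapling not found on PATH; try: pipx install scrapling && pipx ensurepath" >&2
fi
```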

Common Commands

Basic Scrape

# Get clean HTML
scrapling https://example.com -o html

# Get JSON structure
scrapling https://example.com -o json

# Save to file (here -o is given a file name instead of a format)
scrapling https://example.com -o output.html

With Headers/Timeouts

# Custom headers
scrapling https://example.com --headers "User-Agent: Mozilla/5.0"

# Timeout (seconds)
scrapling https://slow-site.com --timeout 30
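
For slow sites, one option is to retry with a longer timeout each time. The sketch below assumes the `--timeout` and `-o html` flags behave as shown in this document and that the CLI exits non-zero on failure; `scrape_with_backoff` and `SCRAPLING_CMD` are hypothetical names introduced for illustration.

```shell
# SCRAPLING_CMD is a hypothetical override so the command can be swapped out.
SCRAPLING_CMD="${SCRAPLING_CMD:-scrapling}"

# Try a URL, doubling the timeout after each failed attempt.
# Usage: scrape_with_backoff URL [initial_timeout_seconds] [max_attempts]
# The body runs in a subshell, so its variables stay local.
scrape_with_backoff() (
  url=$1
  timeout=${2:-10}
  attempts=${3:-3}
  i=1
  while [ "$i" -le "$attempts" ]; do
    # Flags copied from the commands in this document (assumed, not verified).
    if "$SCRAPLING_CMD" "$url" --timeout "$timeout" -o html; then
      exit 0
    fi
    echo "attempt $i failed with --timeout $timeout" >&2
    timeout=$((timeout * 2))
    i=$((i + 1))
  done
  exit 1
)
```

For example, `scrape_with_backoff https://slow-site.com 10 3` would try timeouts of 10, 20, and 40 seconds before giving up.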

Extract Specific Elements

# XPath extraction
scrapling https://example.com -e "//div[@class='content']" -o html

# CSS selector
scrapling https://example.com -e "div.content" -o html

JSON Output with Fields

# Extract title, meta description
scrapling https://example.com \
  --fields 'title,meta_description' \
  -o json

MCP Integration

Scrapling supports MCP (Model Context Protocol) for AI agents:

# Start MCP server
scrapling mcp start

Then configure your agent to use the scrape tool via MCP.
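
Several MCP clients read a JSON file containing an `mcpServers` map of server names to launch commands. A minimal sketch of writing such an entry is below. The file path `./scrapling-mcp.json` and the `MCP_CONFIG` variable are hypothetical, the `mcp start` arguments are copied from the command above, and the layout assumes a client that launches the server itself over stdio; check your client's documentation for the actual location and format.

```shell
# Hypothetical output path; each MCP client documents where its config lives.
MCP_CONFIG="${MCP_CONFIG:-./scrapling-mcp.json}"

# Write a server entry in the mcpServers layout (assumed to match your client).
cat > "$MCP_CONFIG" <<'JSON'
{
  "mcpServers": {
    "scrapling": {
      "command": "scrapling",
      "args": ["mcp", "start"]
    }
  }
}
JSON

echo "wrote $MCP_CONFIG"
```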

Examples

Scrape News Article

scrapling https://example.com/news/article-123 \
  --fields 'title,author,publish_date,content' \
  -o json

Extract Product Data

scrapling https://shop.example.com/products \
  -e "//div[@class='product']" \
  -o html

Handle Cloudflare

# Scrapling auto-bypasses most protections
scrapling https://protected-site.com -o html

Notes

  • Default timeout: 10 seconds
  • Auto-detects best output format (html/json/text)
  • Handles dynamic content via headless browser when needed
  • Be rate-limit friendly: add delays between requests
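
The advice about delays can be sketched as a small loop. `scrape_politely`, `SCRAPLING_CMD`, and `DELAY_SECONDS` are hypothetical names introduced here, and the `-o html` invocation is copied from the commands in this document rather than verified against the CLI.

```shell
# Hypothetical knobs: which command to run and how long to pause between URLs.
SCRAPLING_CMD="${SCRAPLING_CMD:-scrapling}"
DELAY_SECONDS="${DELAY_SECONDS:-2}"

# Scrape each URL passed as an argument, sleeping between requests.
# The body runs in a subshell, so its variables stay local.
scrape_politely() (
  first=1
  for url in "$@"; do
    # Pause before every request except the first.
    if [ "$first" -eq 0 ]; then
      sleep "$DELAY_SECONDS"
    fi
    first=0
    # A failure on one URL is reported but does not stop the loop.
    "$SCRAPLING_CMD" "$url" -o html || echo "failed: $url" >&2
  done
)
```

Calling `scrape_politely https://example.com/a https://example.com/b` would fetch the two pages with a two-second pause in between; a fixed delay is the simplest choice, and a randomized one may suit stricter sites better.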

JSON Output Format

{
  "title": "Page Title",
  "meta_description": "Description text",
  "content": "<clean HTML>",
  "links": ["http://...", "..."],
  "images": [{"src": "...", "alt": "..."}]
}

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
