cf-crawl

Crawl websites with the Cloudflare Browser Rendering /crawl API: async multi-page crawling with markdown/HTML/JSON output, link following, pattern filtering, and AI-powered structured data extraction. Use it when crawling entire sites or multiple pages, building knowledge bases, extracting structured data from websites, or when web_fetch is insufficient (JS rendering, multi-page, authenticated crawls).

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.


Install skill "cf-crawl" with this command: npx skills add bill492/cf-crawl

Cloudflare /crawl

Async site crawler via CF Browser Rendering API. Start a job → poll for results → get markdown/HTML/JSON per page.
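The start → poll lifecycle can be sketched in shell. The two network calls are stubbed with local functions here (`start_job` and `get_status` are hypothetical names; the real scripts call the Cloudflare API), so this only illustrates the shape of the flow:

```shell
# Stub for the POST that starts a crawl job and returns its id.
start_job() { echo "job-123"; }
# Stub for the GET that reports job status ("running" or "completed").
get_status() { echo "completed"; }

JOB_ID=$(start_job "https://example.com")
STATUS=$(get_status "$JOB_ID")
if [ "$STATUS" = "completed" ]; then
  echo "job $JOB_ID finished; fetch per-page results"
fi
```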

Quick Start

# Crawl a site (5 pages, markdown, no JS rendering = fast + free)
bash ~/clawd/skills/cf-crawl/scripts/crawl.sh "https://example.com" --limit 5 --format markdown

# With JS rendering (for SPAs, dynamic content)
bash ~/clawd/skills/cf-crawl/scripts/crawl.sh "https://example.com" --render --limit 10

# Start only (get job ID, poll later)
bash ~/clawd/skills/cf-crawl/scripts/crawl.sh "https://example.com" --limit 100 --start-only

# Poll existing job
bash ~/clawd/skills/cf-crawl/scripts/poll.sh <job-id>

Credentials

Stored at ~/.clawdbot/secrets/cloudflare-crawl.env:

CF_ACCOUNT_ID=<account_id>
CF_CRAWL_API_TOKEN=<token_with_read_and_edit>
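A script can load this file with plain `.` sourcing; `set -a` exports everything the file defines so child processes see it. A minimal sketch, using a throwaway file so the snippet is self-contained (in practice the path is the one documented above):

```shell
# Write a throwaway env file in the documented KEY=value shape.
ENV_FILE="$(mktemp)"
cat > "$ENV_FILE" <<'EOF'
CF_ACCOUNT_ID=abc123
CF_CRAWL_API_TOKEN=example-token
EOF

# Source it with auto-export, then fail fast if either var is missing.
set -a; . "$ENV_FILE"; set +a
: "${CF_ACCOUNT_ID:?missing}" "${CF_CRAWL_API_TOKEN:?missing}"
echo "loaded account $CF_ACCOUNT_ID"
rm -f "$ENV_FILE"
```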

Key Options

Option                         Description
--limit N                      Max pages (default 10)
--depth N                      Max link depth (default 10)
--format markdown|html|json    Output format (default markdown)
--render                       Enable headless browser (default: off = fast fetch, free during beta)
--include PAT                  Wildcard URL pattern to include (repeatable)
--exclude PAT                  Wildcard URL pattern to exclude (repeatable)
--external                     Follow external domain links
--subdomains                   Follow subdomain links
--source all|sitemaps|links    URL discovery method
--json-prompt "..."            AI extraction prompt (with --format json)
--json-schema file.json        JSON schema for structured extraction
--timeout SEC                  Max poll wait (default 300s)
--output FILE                  Write full results to file
--raw                          Output raw API response
--start-only                   Print job ID without polling

Common Patterns

Crawl docs site for knowledge base

bash ~/clawd/skills/cf-crawl/scripts/crawl.sh "https://docs.example.com/" \
  --limit 50 --depth 3 --format markdown --output docs.json

Crawl with URL filtering

bash ~/clawd/skills/cf-crawl/scripts/crawl.sh "https://example.com/" \
  --include "/docs/**" --exclude "/docs/archive/**" --limit 20

AI-powered structured extraction

bash ~/clawd/skills/cf-crawl/scripts/crawl.sh "https://example.com/products" \
  --format json --render \
  --json-prompt "Extract product name, price, and description" \
  --json-schema schema.json
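The file passed to `--json-schema` is an ordinary JSON Schema document. A hypothetical `schema.json` matching the prompt above (the field names are assumptions for illustration, not something the skill prescribes):

```json
{
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "price": { "type": "string" },
    "description": { "type": "string" }
  },
  "required": ["name", "price"]
}
```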

Long-running crawl (background)

JOB_ID=$(bash ~/clawd/skills/cf-crawl/scripts/crawl.sh "https://big-site.com" \
  --limit 1000 --start-only)
# Check later:
bash ~/clawd/skills/cf-crawl/scripts/poll.sh "$JOB_ID"
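The waiting side of this pattern is a bounded poll loop. A sketch of what `poll.sh` presumably does, with the status check stubbed so the snippet runs standalone (a real implementation would replace the stub with the API call and `sleep` between attempts):

```shell
MAX_TRIES=10
tries=0
status="running"
while [ "$status" != "completed" ] && [ "$tries" -lt "$MAX_TRIES" ]; do
  tries=$((tries + 1))
  # Real script: fetch status from the API here, then sleep before retrying.
  # Stub: pretend the job completes on the third poll.
  if [ "$tries" -ge 3 ]; then status="completed"; fi
done
echo "status=$status after $tries polls"
```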

Cost Notes

  • render: false (default) — fast HTML fetch, free during beta
  • render: true — uses Browser Rendering minutes (paid)
  • format json — uses Workers AI tokens for extraction (paid)
  • Results cached in R2 with --max-age (default 24hr)

API Details

See references/api-reference.md for full parameter documentation, response schema, and lifecycle details.

