cloudflare-crawl

Crawl websites using Cloudflare's Browser Rendering API. Use when you need to scrape entire sites, build knowledge bases, extract content from multiple pages, or create RAG datasets. Works on Cloudflare-protected sites. Returns HTML, Markdown, or AI-extracted JSON.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "cloudflare-crawl" with this command: npx skills add wirelessjoe/cloudflare-crawl-skill

Cloudflare Crawl

Crawl entire websites using Cloudflare's Browser Rendering /crawl API. Async job-based crawling with JS rendering.

When to Use

  • Scrape entire sites (not just single pages)
  • Build knowledge bases or RAG datasets
  • Research across multiple pages
  • Sites protected by Cloudflare (CF won't block itself)
  • Need Markdown or structured JSON output

Prerequisites

Get Cloudflare API credentials:

  1. Go to https://dash.cloudflare.com/profile/api-tokens
  2. Create token with Account.Browser Rendering permission
  3. Get your Account ID from dashboard URL

Set environment variables:

export CLOUDFLARE_API_TOKEN="your_token"
export CLOUDFLARE_ACCOUNT_ID="your_account_id"

Quick Start

# Start a crawl job
node scripts/crawl.js start https://example.com --limit 50

# Check status
node scripts/crawl.js status <job_id>

# Get results as markdown
node scripts/crawl.js results <job_id> --format markdown

API Overview

1. Start Crawl Job

curl -X POST "https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/browser-rendering/crawl" \
  -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "limit": 50,
    "depth": 3,
    "formats": ["markdown"]
  }'

Returns: { "success": true, "result": "job_id_here" }

2. Poll for Completion

curl "https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/browser-rendering/crawl/$JOB_ID?limit=1" \
  -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN"

Status values: running, completed, errored, cancelled_due_to_timeout

3. Get Results

curl "https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/browser-rendering/crawl/$JOB_ID" \
  -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN"

Parameters

ParameterTypeDefaultDescription
urlstringrequiredStarting URL
limitnumber10Max pages to crawl (max 100,000)
depthnumber100000Max link depth from start URL
sourcestring"all"URL discovery: all, sitemaps, links
formatsarray["html"]Output: html, markdown, json
renderbooleantrueExecute JS (false = fast HTML only)

Output Formats

Markdown (best for AI)

{
  "url": "https://example.com/page",
  "status": "completed",
  "markdown": "# Page Title\n\nContent here...",
  "metadata": { "title": "Page Title", "status": 200 }
}

JSON (AI-extracted)

Uses Workers AI to extract structured data. Requires jsonOptions.prompt:

{
  "formats": ["json"],
  "jsonOptions": {
    "prompt": "Extract product name, price, and description"
  }
}

Pricing

PlanFree TierOverage
Workers Free10 min/dayN/A
Workers Paid10 hrs/month$0.09/hour

Limits

  • Max 100,000 pages per crawl
  • 7 day max runtime
  • Results available 14 days
  • Free plan: 10 concurrent, 100 pages max

Example: Crawl for RAG

// Crawl docs site for knowledge base
const job = await startCrawl({
  url: 'https://docs.example.com',
  limit: 500,
  formats: ['markdown'],
  source: 'sitemaps' // Use sitemap for efficiency
});

// Wait for completion
const results = await waitForCrawl(job.id);

// Save markdown files for RAG
for (const page of results.records) {
  if (page.status === 'completed') {
    fs.writeFileSync(`docs/${slugify(page.url)}.md`, page.markdown);
  }
}

vs Browserbase/Stagehand

Use CaseCloudflare CrawlBrowserbase
Full site scrape✅ Best❌ Manual
Interactive (forms)❌ No✅ Best
CF-protected sites✅ Native⚠️ Cloud bypass
AI extraction✅ Built-in✅ Stagehand
Session management✅ Async jobs❌ Manual
Cost$0.09/hrCredits

Use Cloudflare Crawl for bulk content extraction. Use Browserbase for interactive automation.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Research

3dgs Experiment Planner

Design rigorous experiments for 3D Gaussian Splatting research papers. Recommends datasets, baselines, metrics, ablation matrices, and visualization plans ta...

Registry SourceRecently Updated
Research

LinkFoxAgent

Cross-border e-commerce AI Agent with 67 specialized tools for Amazon/TikTok/eBay/Walmart/Shopee product research, competitor analysis, keyword tracking, rev...

Registry SourceRecently Updated
Research

3dgs Paper Reader

Read and summarize 3D Gaussian Splatting research papers. Extracts method architecture, core innovations, experimental results, and key findings from arXiv p...

Registry SourceRecently Updated
Research

Mental Health Analysis Tool | 心理健康分析工具

Analyzes human mental health and psychological behavior, supports identifying common psychological problem tendencies through video analysis, and provides st...

Registry SourceRecently Updated