AnyCrawl-API

# AnyCrawl Skill

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "AnyCrawl-API" with this command: npx skills add techlaai/anycrawl

AnyCrawl Skill

AnyCrawl API integration for OpenClaw - Scrape, Crawl, and Search web content with high-performance multi-threaded crawling.

Setup

Method 1: Environment variable (Recommended)

export ANYCRAWL_API_KEY="your-api-key"

Make it permanent by adding to ~/.bashrc or ~/.zshrc:

echo 'export ANYCRAWL_API_KEY="your-api-key"' >> ~/.bashrc
source ~/.bashrc

Get your API key at: https://anycrawl.dev

Method 2: OpenClaw gateway config

openclaw config.patch --set ANYCRAWL_API_KEY="your-api-key"

Functions

1. anycrawl_scrape

Scrape a single URL and convert to LLM-ready structured data.

Parameters:

  • url (string, required): URL to scrape
  • engine (string, optional): Scraping engine - "cheerio" (default), "playwright", "puppeteer"
  • formats (array, optional): Output formats - ["markdown"], ["html"], ["text"], ["json"], ["screenshot"]
  • timeout (number, optional): Timeout in milliseconds (default: 30000)
  • wait_for (number, optional): Delay before extraction in ms (browser engines only)
  • wait_for_selector (string/object/array, optional): Wait for CSS selectors
  • include_tags (array, optional): Include only these HTML tags (e.g., ["h1", "p", "article"])
  • exclude_tags (array, optional): Exclude these HTML tags
  • proxy (string, optional): Proxy URL (e.g., "http://proxy:port")
  • json_options (object, optional): JSON extraction with schema/prompt
  • extract_source (string, optional): "markdown" (default) or "html"

Examples:

// Basic scrape with default cheerio
anycrawl_scrape({ url: "https://example.com" })

// Scrape SPA with Playwright
anycrawl_scrape({ 
  url: "https://spa-example.com",
  engine: "playwright",
  formats: ["markdown", "screenshot"]
})

// Extract structured JSON
anycrawl_scrape({
  url: "https://product-page.com",
  engine: "cheerio",
  json_options: {
    schema: {
      type: "object",
      properties: {
        product_name: { type: "string" },
        price: { type: "number" },
        description: { type: "string" }
      },
      required: ["product_name", "price"]
    },
    user_prompt: "Extract product details from this page"
  }
})

2. anycrawl_search

Search Google and return structured results.

Parameters:

  • query (string, required): Search query
  • engine (string, optional): Search engine - "google" (default)
  • limit (number, optional): Max results per page (default: 10)
  • offset (number, optional): Number of results to skip (default: 0)
  • pages (number, optional): Number of pages to retrieve (default: 1, max: 20)
  • lang (string, optional): Language locale (e.g., "en", "zh", "vi")
  • safe_search (number, optional): 0 (off), 1 (medium), 2 (high)
  • scrape_options (object, optional): Scrape each result URL with these options

Examples:

// Basic search
anycrawl_search({ query: "OpenAI ChatGPT" })

// Multi-page search in Vietnamese
anycrawl_search({ 
  query: "hướng dẫn Node.js",
  pages: 3,
  lang: "vi"
})

// Search and auto-scrape results
anycrawl_search({
  query: "best AI tools 2026",
  limit: 5,
  scrape_options: {
    engine: "cheerio",
    formats: ["markdown"]
  }
})

3. anycrawl_crawl_start

Start crawling an entire website (async job).

Parameters:

  • url (string, required): Seed URL to start crawling
  • engine (string, optional): "cheerio" (default), "playwright", "puppeteer"
  • strategy (string, optional): "all", "same-domain" (default), "same-hostname", "same-origin"
  • max_depth (number, optional): Max depth from seed URL (default: 10)
  • limit (number, optional): Max pages to crawl (default: 100)
  • include_paths (array, optional): Path patterns to include (e.g., ["/blog/*"])
  • exclude_paths (array, optional): Path patterns to exclude (e.g., ["/admin/*"])
  • scrape_paths (array, optional): Only scrape URLs matching these patterns
  • scrape_options (object, optional): Per-page scrape options

Examples:

// Crawl entire website
anycrawl_crawl_start({ 
  url: "https://docs.example.com",
  engine: "cheerio",
  max_depth: 5,
  limit: 50
})

// Crawl only blog posts
anycrawl_crawl_start({
  url: "https://example.com",
  strategy: "same-domain",
  include_paths: ["/blog/*"],
  exclude_paths: ["/blog/tags/*"],
  scrape_options: {
    formats: ["markdown"]
  }
})

// Crawl product pages only
anycrawl_crawl_start({
  url: "https://shop.example.com",
  strategy: "same-domain",
  scrape_paths: ["/products/*"],
  limit: 200
})

4. anycrawl_crawl_status

Check crawl job status.

Parameters:

  • job_id (string, required): Crawl job ID

Example:

anycrawl_crawl_status({ job_id: "7a2e165d-8f81-4be6-9ef7-23222330a396" })

5. anycrawl_crawl_results

Get crawl results (paginated).

Parameters:

  • job_id (string, required): Crawl job ID
  • skip (number, optional): Number of results to skip (default: 0)

Example:

// Get first 100 results
anycrawl_crawl_results({ job_id: "xxx", skip: 0 })

// Get next 100 results
anycrawl_crawl_results({ job_id: "xxx", skip: 100 })

6. anycrawl_crawl_cancel

Cancel a running crawl job.

Parameters:

  • job_id (string, required): Crawl job ID

7. anycrawl_search_and_scrape

Quick helper: Search Google then scrape top results.

Parameters:

  • query (string, required): Search query
  • max_results (number, optional): Max results to scrape (default: 3)
  • scrape_engine (string, optional): Engine for scraping (default: "cheerio")
  • formats (array, optional): Output formats (default: ["markdown"])
  • lang (string, optional): Search language

Example:

anycrawl_search_and_scrape({
  query: "latest AI news",
  max_results: 5,
  formats: ["markdown"]
})

Engine Selection Guide

EngineBest ForSpeedJS Rendering
cheerioStatic HTML, news, blogs⚡ Fastest❌ No
playwrightSPAs, complex web apps🐢 Slower✅ Yes
puppeteerChrome-specific, metrics🐢 Slower✅ Yes

Response Format

All responses follow this structure:

{
  "success": true,
  "data": { ... },
  "message": "Optional message"
}

Error response:

{
  "success": false,
  "error": "Error type",
  "message": "Human-readable message"
}

Common Error Codes

  • 400 - Bad Request (validation errors)
  • 401 - Unauthorized (invalid API key)
  • 402 - Payment Required (insufficient credits)
  • 404 - Not Found
  • 429 - Rate limit exceeded
  • 500 - Internal server error

API Limits

  • Rate limits apply based on your plan
  • Crawl jobs expire after 24 hours
  • Max crawl limit: depends on credits

Links

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

Partycraft

Provides customized event planning for weddings, birthdays, and corporate events, including budgets, checklists, invitations, timelines, and vendor lists.

Registry SourceRecently Updated
General

Logbook

Logbook - command-line tool for everyday use

Registry SourceRecently Updated
General

task-plan-generator_cn

为复杂任务生成多方案任务计划,并在执行过程中持续优化。适用于:1) 用户提出需要多步骤完成的任务;2) 需要协调多个工具或skill的复杂工作流;3) 需要评估不同执行策略的场景。此skill会在每次任务完成后生成简报归档,分析任务复杂度并提供效率/成功率/资源消耗等多种方案供选择。

Registry SourceRecently Updated
General

Vision

Vision - command-line tool for everyday use

Registry SourceRecently Updated