AnyCrawl-API

# AnyCrawl Skill

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "AnyCrawl-API" with this command: npx skills add techlaai/anycrawl

AnyCrawl Skill

AnyCrawl API integration for OpenClaw - Scrape, Crawl, and Search web content with high-performance multi-threaded crawling.

Setup

Method 1: Environment variable (Recommended)

export ANYCRAWL_API_KEY="your-api-key"

Make it permanent by adding to ~/.bashrc or ~/.zshrc:

echo 'export ANYCRAWL_API_KEY="your-api-key"' >> ~/.bashrc
source ~/.bashrc

Get your API key at: https://anycrawl.dev

Method 2: OpenClaw gateway config

openclaw config.patch --set ANYCRAWL_API_KEY="your-api-key"

Functions

1. anycrawl_scrape

Scrape a single URL and convert to LLM-ready structured data.

Parameters:

  • url (string, required): URL to scrape
  • engine (string, optional): Scraping engine - "cheerio" (default), "playwright", "puppeteer"
  • formats (array, optional): Output formats - ["markdown"], ["html"], ["text"], ["json"], ["screenshot"]
  • timeout (number, optional): Timeout in milliseconds (default: 30000)
  • wait_for (number, optional): Delay before extraction in ms (browser engines only)
  • wait_for_selector (string/object/array, optional): Wait for CSS selectors
  • include_tags (array, optional): Include only these HTML tags (e.g., ["h1", "p", "article"])
  • exclude_tags (array, optional): Exclude these HTML tags
  • proxy (string, optional): Proxy URL (e.g., "http://proxy:port")
  • json_options (object, optional): JSON extraction with schema/prompt
  • extract_source (string, optional): "markdown" (default) or "html"

Examples:

// Basic scrape with default cheerio
anycrawl_scrape({ url: "https://example.com" })

// Scrape SPA with Playwright
anycrawl_scrape({ 
  url: "https://spa-example.com",
  engine: "playwright",
  formats: ["markdown", "screenshot"]
})

// Extract structured JSON
anycrawl_scrape({
  url: "https://product-page.com",
  engine: "cheerio",
  json_options: {
    schema: {
      type: "object",
      properties: {
        product_name: { type: "string" },
        price: { type: "number" },
        description: { type: "string" }
      },
      required: ["product_name", "price"]
    },
    user_prompt: "Extract product details from this page"
  }
})

2. anycrawl_search

Search Google and return structured results.

Parameters:

  • query (string, required): Search query
  • engine (string, optional): Search engine - "google" (default)
  • limit (number, optional): Max results per page (default: 10)
  • offset (number, optional): Number of results to skip (default: 0)
  • pages (number, optional): Number of pages to retrieve (default: 1, max: 20)
  • lang (string, optional): Language locale (e.g., "en", "zh", "vi")
  • safe_search (number, optional): 0 (off), 1 (medium), 2 (high)
  • scrape_options (object, optional): Scrape each result URL with these options

Examples:

// Basic search
anycrawl_search({ query: "OpenAI ChatGPT" })

// Multi-page search in Vietnamese
anycrawl_search({ 
  query: "hướng dẫn Node.js",
  pages: 3,
  lang: "vi"
})

// Search and auto-scrape results
anycrawl_search({
  query: "best AI tools 2026",
  limit: 5,
  scrape_options: {
    engine: "cheerio",
    formats: ["markdown"]
  }
})

3. anycrawl_crawl_start

Start crawling an entire website (async job).

Parameters:

  • url (string, required): Seed URL to start crawling
  • engine (string, optional): "cheerio" (default), "playwright", "puppeteer"
  • strategy (string, optional): "all", "same-domain" (default), "same-hostname", "same-origin"
  • max_depth (number, optional): Max depth from seed URL (default: 10)
  • limit (number, optional): Max pages to crawl (default: 100)
  • include_paths (array, optional): Path patterns to include (e.g., ["/blog/*"])
  • exclude_paths (array, optional): Path patterns to exclude (e.g., ["/admin/*"])
  • scrape_paths (array, optional): Only scrape URLs matching these patterns
  • scrape_options (object, optional): Per-page scrape options

Examples:

// Crawl entire website
anycrawl_crawl_start({ 
  url: "https://docs.example.com",
  engine: "cheerio",
  max_depth: 5,
  limit: 50
})

// Crawl only blog posts
anycrawl_crawl_start({
  url: "https://example.com",
  strategy: "same-domain",
  include_paths: ["/blog/*"],
  exclude_paths: ["/blog/tags/*"],
  scrape_options: {
    formats: ["markdown"]
  }
})

// Crawl product pages only
anycrawl_crawl_start({
  url: "https://shop.example.com",
  strategy: "same-domain",
  scrape_paths: ["/products/*"],
  limit: 200
})

4. anycrawl_crawl_status

Check crawl job status.

Parameters:

  • job_id (string, required): Crawl job ID

Example:

anycrawl_crawl_status({ job_id: "7a2e165d-8f81-4be6-9ef7-23222330a396" })

5. anycrawl_crawl_results

Get crawl results (paginated).

Parameters:

  • job_id (string, required): Crawl job ID
  • skip (number, optional): Number of results to skip (default: 0)

Example:

// Get first 100 results
anycrawl_crawl_results({ job_id: "xxx", skip: 0 })

// Get next 100 results
anycrawl_crawl_results({ job_id: "xxx", skip: 100 })

6. anycrawl_crawl_cancel

Cancel a running crawl job.

Parameters:

  • job_id (string, required): Crawl job ID

7. anycrawl_search_and_scrape

Quick helper: Search Google then scrape top results.

Parameters:

  • query (string, required): Search query
  • max_results (number, optional): Max results to scrape (default: 3)
  • scrape_engine (string, optional): Engine for scraping (default: "cheerio")
  • formats (array, optional): Output formats (default: ["markdown"])
  • lang (string, optional): Search language

Example:

anycrawl_search_and_scrape({
  query: "latest AI news",
  max_results: 5,
  formats: ["markdown"]
})

Engine Selection Guide

EngineBest ForSpeedJS Rendering
cheerioStatic HTML, news, blogs⚡ Fastest❌ No
playwrightSPAs, complex web apps🐢 Slower✅ Yes
puppeteerChrome-specific, metrics🐢 Slower✅ Yes

Response Format

All responses follow this structure:

{
  "success": true,
  "data": { ... },
  "message": "Optional message"
}

Error response:

{
  "success": false,
  "error": "Error type",
  "message": "Human-readable message"
}

Common Error Codes

  • 400 - Bad Request (validation errors)
  • 401 - Unauthorized (invalid API key)
  • 402 - Payment Required (insufficient credits)
  • 404 - Not Found
  • 429 - Rate limit exceeded
  • 500 - Internal server error

API Limits

  • Rate limits apply based on your plan
  • Crawl jobs expire after 24 hours
  • Max crawl limit: depends on credits

Links

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

Expedy

Expedy integration. Manage Organizations, Pipelines, Users, Filters. Use when the user wants to interact with Expedy data.

Registry SourceRecently Updated
General

Evenium

Evenium integration. Manage Events, Users, Roles. Use when the user wants to interact with Evenium data.

Registry SourceRecently Updated
General

Exhibitday

ExhibitDay integration. Manage Organizations. Use when the user wants to interact with ExhibitDay data.

Registry SourceRecently Updated
General

Enigma

Enigma integration. Manage Deals, Persons, Organizations, Leads, Projects, Activities and more. Use when the user wants to interact with Enigma data.

Registry SourceRecently Updated