tavily-crawl

Crawl any website and save pages as local markdown files. Ideal for downloading documentation, knowledge bases, or web content for offline access or analysis.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.


Install skill "tavily-crawl" with this command: npx skills add matthew77/liang-tavily-crawl

Tavily Crawl

Crawl websites to extract content from multiple pages. Ideal for documentation, knowledge bases, and site-wide content extraction.

Authentication

Get your API key at https://tavily.com and add it to your OpenClaw config:

{
  "skills": {
    "entries": {
      "tavily-crawl": {
        "enabled": true,
        "apiKey": "tvly-YOUR_API_KEY_HERE"
      }
    }
  }
}

Or set it as an environment variable:

export TAVILY_API_KEY="tvly-YOUR_API_KEY_HERE"
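A script in this skill might resolve the key from either source roughly as follows. This is a hedged sketch, not the actual `crawl.mjs` logic: the config file path (`openclaw.json`) and the precedence of the environment variable over the config file are assumptions; only the config shape comes from the snippet above.

```javascript
// Sketch: resolve the Tavily API key from the environment or an OpenClaw config.
// ASSUMPTIONS: config lives at "openclaw.json" and the env var wins; the real
// crawl.mjs may differ.
import fs from "node:fs";

function resolveApiKey(env = process.env, configPath = "openclaw.json") {
  if (env.TAVILY_API_KEY) return env.TAVILY_API_KEY;
  try {
    const config = JSON.parse(fs.readFileSync(configPath, "utf8"));
    return config?.skills?.entries?.["tavily-crawl"]?.apiKey ?? null;
  } catch {
    return null; // no config file, or unreadable JSON
  }
}
```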

Quick Start

Using the Script

node {baseDir}/scripts/crawl.mjs "https://docs.example.com"
node {baseDir}/scripts/crawl.mjs "https://docs.example.com" --output ./docs
node {baseDir}/scripts/crawl.mjs "https://example.com" --depth 2 --limit 50

Examples

# Basic crawl
node {baseDir}/scripts/crawl.mjs "https://docs.example.com"

# Deeper crawl with limits
node {baseDir}/scripts/crawl.mjs "https://docs.example.com" --depth 2 --limit 50

# Save to files
node {baseDir}/scripts/crawl.mjs "https://docs.example.com" --depth 2 --output ./docs

# Focused crawl with path filters
node {baseDir}/scripts/crawl.mjs "https://example.com" --depth 2 \
  --select "/docs/.*" --exclude "/blog/.*"

# With semantic instructions
node {baseDir}/scripts/crawl.mjs "https://docs.example.com" \
  --instructions "Find API documentation" --chunks 3
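The `--select` and `--exclude` flags take regex patterns matched against URL paths. A minimal sketch of how such filtering could work (the function name and exclude-wins ordering are illustrative assumptions, not the script's actual implementation):

```javascript
// Sketch: decide whether a discovered path passes --select / --exclude filters.
// ASSUMPTION: exclude is checked first, then select; the real script may differ.
function shouldCrawl(path, selectPattern, excludePattern) {
  if (excludePattern && new RegExp(excludePattern).test(path)) return false;
  if (selectPattern && !new RegExp(selectPattern).test(path)) return false;
  return true;
}
```

With `--select "/docs/.*" --exclude "/blog/.*"`, a page under `/docs/` passes, `/blog/` pages are dropped, and unrelated paths like `/about` are skipped because they never match the select pattern.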

Options

Option                  Description                                     Default
--depth <n>             Crawl depth (1-5)                               1
--breadth <n>           Links per page                                  20
--limit <n>             Total pages cap                                 50
--output <dir>          Save pages to directory                         -
--instructions <text>   Natural language guidance                       -
--chunks <n>            Chunks per page (1-5, requires instructions)    -
--depth-mode <mode>     Extract depth: basic or advanced                basic
--select <pattern>      Regex pattern to include                        -
--exclude <pattern>     Regex pattern to exclude                        -
--timeout <sec>         Max wait time (10-150 seconds)                  150
--json                  Output raw JSON                                 false
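The flags above could be parsed with Node's built-in `util.parseArgs`. This is a hedged sketch under the assumption that the defaults in the table apply; the real `crawl.mjs` may parse arguments differently:

```javascript
// Sketch: parse the CLI flags from the options table with node:util parseArgs.
// ASSUMPTION: the script treats the URL as the sole positional argument.
import { parseArgs } from "node:util";

function parseCrawlArgs(argv) {
  const { values, positionals } = parseArgs({
    args: argv,
    allowPositionals: true, // the target URL
    options: {
      depth: { type: "string", default: "1" },
      breadth: { type: "string", default: "20" },
      limit: { type: "string", default: "50" },
      output: { type: "string" },
      instructions: { type: "string" },
      chunks: { type: "string" },
      "depth-mode": { type: "string", default: "basic" },
      select: { type: "string" },
      exclude: { type: "string" },
      timeout: { type: "string", default: "150" },
      json: { type: "boolean", default: false },
    },
  });
  return { url: positionals[0], ...values, depth: Number(values.depth) };
}
```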

Depth vs Performance

Depth   Typical Pages   Time
1       10-50           Seconds
2       50-500          Minutes
3       500-5000        Many minutes

Start with --depth 1 and increase only if needed.
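The growth in the table above is roughly geometric: with about `b` links followed per page, a crawl of depth `d` can touch up to `1 + b + b^2 + ... + b^d` pages before `--limit` caps it. A back-of-envelope sketch (an estimate only, not how the crawler counts pages):

```javascript
// Sketch: upper-bound estimate of pages a crawl can reach before --limit kicks in.
function estimateMaxPages(depth, breadth, limit) {
  let total = 0;
  for (let d = 0; d <= depth; d++) total += breadth ** d; // 1 + b + b^2 + ...
  return Math.min(total, limit);
}
```

At the defaults (`--breadth 20`, `--limit 50`), depth 1 stays within the limit, but depth 2 already hits the cap, which is why the limit matters.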

Crawl for Context vs Data Collection

For agentic use (feeding results into context): Always use --instructions + --chunks. This returns only relevant chunks instead of full pages, preventing context window explosion.

For data collection (saving to files): Omit --chunks to get full page content.
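The two modes differ only in which options are passed. A minimal sketch of assembling the options object for each mode; option names mirror the CLI flags, and how they map onto Tavily's API is an assumption:

```javascript
// Sketch: build crawl options for the two modes described above.
// With instructions set, request chunked, relevant excerpts (agentic use);
// without them, request full pages (data collection).
function buildCrawlOptions(url, { instructions, chunks = 3 } = {}) {
  const base = { url, depth: 1, limit: 50 };
  if (instructions) {
    return { ...base, instructions, chunks }; // relevant chunks only
  }
  return base; // full page content, suitable for saving to files
}
```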

Tips

  • Always use --chunks for agentic workflows; it prevents context explosion when feeding results to LLMs
  • Omit --chunks only for data collection, when saving full pages to files
  • Start conservative (--depth 1, --limit 20) and scale up
  • Use path patterns to focus on relevant sections
  • Always set a --limit to prevent runaway crawls

