
URL Crawler

Crawls websites using Tavily Crawl API and saves each page as a separate markdown file in a flat directory structure.

Prerequisites

Tavily API Key Required - Get your key at https://tavily.com

Add to ~/.claude/settings.json:

{
  "env": {
    "TAVILY_API_KEY": "tvly-your-api-key-here"
  }
}

Restart Claude Code after adding your API key.
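As a rough sketch, the key lookup can check the environment first and fall back to a .env file, as the Important Notes section describes; the actual crawl_url.py may implement this differently:

```python
import os
from pathlib import Path


def resolve_api_key(env_file: str = ".env") -> str:
    """Return TAVILY_API_KEY from the environment, falling back to a .env file.

    Sketch of the lookup described in the docs; the real script's
    behavior may differ.
    """
    key = os.environ.get("TAVILY_API_KEY")
    if key:
        return key
    path = Path(env_file)
    if path.is_file():
        for line in path.read_text().splitlines():
            line = line.strip()
            if line.startswith("TAVILY_API_KEY="):
                # Strip optional surrounding quotes from the value.
                return line.split("=", 1)[1].strip().strip('"').strip("'")
    raise RuntimeError("TAVILY_API_KEY not set; get a key at https://tavily.com")
```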

When to Use

Use this skill when the user wants to:

  • Crawl and extract content from a website

  • Download API documentation, framework docs, or knowledge bases

  • Save web content locally for offline access or analysis

Usage

Execute the crawl script with a URL and optional instruction:

python scripts/crawl_url.py <URL> [--instruction "guidance text"]

Required Parameters

  • URL : The starting URL to crawl

Optional Parameters

  • --instruction, -i : Natural language guidance for the crawler (e.g., "Focus on API endpoints only")

  • --output, -o : Output directory (default: <repo_root>/crawled_context/<domain>)

  • --depth, -d : Max crawl depth (default: 2, range: 1-5)

  • --breadth, -b : Max links per level (default: 50)

  • --limit, -l : Max total pages to crawl (default: 50)
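The flags above map naturally onto an argparse interface. A minimal sketch, with defaults taken from this document (the actual scripts/crawl_url.py may differ):

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Mirrors the CLI described above; defaults come from the docs.
    p = argparse.ArgumentParser(description="Crawl a site via the Tavily Crawl API")
    p.add_argument("url", help="Starting URL to crawl")
    p.add_argument("--instruction", "-i",
                   help="Natural language guidance for the crawler")
    p.add_argument("--output", "-o",
                   help="Output directory (default: crawled_context/<domain>)")
    p.add_argument("--depth", "-d", type=int, default=2, choices=range(1, 6),
                   help="Max crawl depth (1-5)")
    p.add_argument("--breadth", "-b", type=int, default=50,
                   help="Max links per level")
    p.add_argument("--limit", "-l", type=int, default=50,
                   help="Max total pages to crawl")
    return p
```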

Output

The script creates a flat directory structure at <repo_root>/crawled_context/<domain>/ with one markdown file per crawled page. Filenames are derived from URLs (e.g., docs_stripe_com_api_authentication.md).
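The URL-to-filename conversion likely looks something like the following sketch, which joins host and path and replaces unsafe characters with underscores (the real script's rules may differ):

```python
import re
from urllib.parse import urlparse


def url_to_filename(url: str) -> str:
    # Join host and path, then collapse anything non-alphanumeric into
    # underscores, matching the docs_stripe_com_api_authentication.md
    # style shown above.
    parsed = urlparse(url)
    raw = parsed.netloc + parsed.path
    safe = re.sub(r"[^A-Za-z0-9]+", "_", raw).strip("_")
    return f"{safe}.md"
```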

Each markdown file includes:

  • Frontmatter with source URL and crawl timestamp

  • The extracted content in markdown format

Examples

Basic Crawl

python scripts/crawl_url.py https://docs.anthropic.com

Crawls the Anthropic docs with default settings and saves to <repo_root>/crawled_context/docs_anthropic_com/.

With Instruction

python scripts/crawl_url.py https://react.dev --instruction "Focus on API reference pages and hooks documentation"

Uses natural language instruction to guide the crawler toward specific content.

Custom Output Directory

python scripts/crawl_url.py https://docs.stripe.com/api -o ./stripe-api-docs

Saves results to a custom directory.

Adjust Crawl Parameters

python scripts/crawl_url.py https://nextjs.org/docs --depth 3 --breadth 100 --limit 200

Increases crawl depth, breadth, and page limit for more comprehensive coverage.

Important Notes

  • API Key Required: Set TAVILY_API_KEY environment variable (loads from .env if available)

  • Crawl Time: Deeper crawls take longer (depth 3+ may take many minutes)

  • Filename Safety: URLs are converted to safe filenames automatically

  • Flat Structure: All files saved in <repo_root>/crawled_context/<domain>/ directory regardless of original URL hierarchy

  • Duplicate Prevention: Files are overwritten if URLs generate identical filenames
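Under the hood, the crawl request presumably carries these limits in its API payload. A sketch using assumed field names (max_depth, max_breadth, limit, instructions); verify them against the current Tavily Crawl API reference before relying on this:

```python
def build_crawl_payload(url, depth=2, breadth=50, limit=50, instruction=None):
    # Field names are assumptions modeled on Tavily's documented crawl
    # parameters; check the API reference for the authoritative schema.
    payload = {
        "url": url,
        "max_depth": depth,
        "max_breadth": breadth,
        "limit": limit,
    }
    if instruction:
        payload["instructions"] = instruction
    return payload
```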
