crawl4ai

Use when crawling web pages, extracting markdown content, or scraping website data with intelligent chunking and skeleton planning. Use when the user provides a URL or link to fetch or crawl.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy the command below and send it to your AI assistant to install the skill:

Install skill "crawl4ai" with this command: npx skills add tao3k/omni-dev-fusion/tao3k-omni-dev-fusion-crawl4ai

crawl4ai

High-performance web crawler with intelligent chunking. Crawls web pages and extracts content as markdown using LLM-based skeleton planning.

Commands

crawl_url (alias: webCrawl)

Crawl a web page with native workflow execution and LLM-based intelligent chunking.

Parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| url | str | - | Target URL to crawl (required) |
| action | str | "smart" | Action mode: "smart", "skeleton", or "crawl" |
| fit_markdown | bool | true | Clean and simplify markdown output |
| max_depth | int | 0 | Maximum crawling depth (0 = single page) |
| return_skeleton | bool | false | Also return the document skeleton (TOC) |
| chunk_indices | list[int] | - | List of section indices to extract |

Action Modes:

| Mode | Description | Use Case |
| --- | --- | --- |
| smart (default) | LLM generates a chunk plan, then extracts the relevant sections | Large docs where you need specific info |
| skeleton | Extract a lightweight TOC without full content | Quick overview; decide what to read |
| crawl | Return the full markdown content | Small pages where the complete content is needed |

Runtime Transport:

  • max_depth = 0: Uses HTTP strategy (no browser cold-start) for lower latency.
  • max_depth > 0: Uses browser deep-crawl strategy (BFS) for multi-page traversal.
  • file://... with max_depth = 0: Uses local fast-path (no crawl4ai runtime bootstrap) for deterministic fixture/local-note benchmarking.
  • Persistent worker mode reuses the HTTP crawler instance across requests to reduce repeated initialization cost.
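The transport rules above can be sketched as a simple dispatch function. `select_transport` and the returned labels are illustrative names, not crawl4ai APIs; only the selection rules come from the bullets:

```python
from urllib.parse import urlparse

def select_transport(url: str, max_depth: int) -> str:
    """Sketch of the runtime transport selection described above."""
    scheme = urlparse(url).scheme
    if scheme == "file" and max_depth == 0:
        return "local-fast-path"   # no crawl4ai runtime bootstrap
    if max_depth == 0:
        return "http"              # no browser cold-start, lower latency
    return "browser-bfs"           # deep-crawl strategy for multi-page traversal

print(select_transport("https://example.com", 0))       # http
print(select_transport("https://example.com", 2))       # browser-bfs
print(select_transport("file:///notes/fixture.md", 0))  # local-fast-path
```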

Examples:

# Smart crawl with LLM chunking (default)
@omni("crawl4ai.CrawlUrl", {"url": "https://example.com"})

# Skeleton only - get TOC quickly
@omni("crawl4ai.CrawlUrl", {"url": "https://example.com", "action": "skeleton"})

# Full content crawl
@omni("crawl4ai.CrawlUrl", {"url": "https://example.com", "action": "crawl"})

# Extract specific sections
@omni("crawl4ai.CrawlUrl", {"url": "https://example.com", "chunk_indices": [0, 1, 2]})

# Deep crawl (follow links up to depth N)
@omni("crawl4ai.CrawlUrl", {"url": "https://example.com", "max_depth": 2})

# Get skeleton with full content
@omni("crawl4ai.CrawlUrl", {"url": "https://example.com", "return_skeleton": true})

Core Concepts

| Topic | Description | Reference |
| --- | --- | --- |
| Skeleton Planning | LLM sees the TOC (~500 tokens) instead of the full content (~10k+ tokens) | smart-chunking.md |
| Chunk Extraction | Token-aware section extraction | chunking.md |
| Deep Crawling | Multi-page crawling with a BFS strategy | deep-crawl.md |
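The skeleton and chunk-extraction ideas can be sketched in a few lines. This is a minimal illustration of the concept, not the crawl4ai implementation: a skeleton is just the heading lines of a markdown document, and chunk extraction returns the heading-delimited sections at the requested indices:

```python
import re

def extract_skeleton(markdown: str) -> list[str]:
    """Return the heading lines (a lightweight TOC) of a markdown document."""
    return [m.group(0).strip()
            for m in re.finditer(r"^#{1,6} .+$", markdown, re.MULTILINE)]

def extract_chunks(markdown: str, indices: list[int]) -> list[str]:
    """Split at headings and return the sections at the given indices."""
    # Zero-width split so each chunk starts at its heading; drop any preamble.
    parts = re.split(r"^(?=#{1,6} )", markdown, flags=re.MULTILINE)
    sections = [p for p in parts if p.lstrip().startswith("#")]
    return [sections[i] for i in indices if 0 <= i < len(sections)]

doc = "# Intro\nhello\n## Setup\nsteps\n## Usage\ncalls\n"
print(extract_skeleton(doc))     # ['# Intro', '## Setup', '## Usage']
print(extract_chunks(doc, [1]))  # ['## Setup\nsteps\n']
```

This is why skeleton-first is cheap: the LLM plans over the heading list, and only the chosen section bodies are ever loaded.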

Best Practices

  • Use skeleton mode first for large documents to understand structure
  • Use chunk_indices to extract specific sections instead of full content
  • Set max_depth > 0 carefully; keep it small to limit the number of pages crawled and prevent runaway crawls
  • Keep fit_markdown=true for cleaner output; set it to false when you need the raw content
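The depth limit that makes max_depth > 0 safe can be sketched as a depth-limited BFS. The graph-based `bfs_crawl` below is a hypothetical stand-in for real page fetching, shown only to illustrate how the bound caps traversal:

```python
from collections import deque

def bfs_crawl(start: str, links: dict[str, list[str]], max_depth: int) -> list[str]:
    """Depth-limited BFS over an in-memory link graph."""
    seen = {start}
    order = []
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if depth >= max_depth:
            continue  # honor max_depth to prevent runaway crawling
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return order

site = {"/": ["/a", "/b"], "/a": ["/c"], "/b": [], "/c": []}
print(bfs_crawl("/", site, 0))  # ['/']
print(bfs_crawl("/", site, 1))  # ['/', '/a', '/b']
print(bfs_crawl("/", site, 2))  # ['/', '/a', '/b', '/c']
```

The `seen` set also prevents revisiting pages reachable by multiple paths, which matters on real sites with cyclic link structures.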

Advanced

  • Batch multiple URLs with separate calls
  • Combine with knowledge tools for RAG pipelines
  • Use skeleton + LLM to auto-generate chunk plans for custom extraction

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
