crawl4ai-skill

Web crawling and scraping tool with LLM-optimized output. Supports DuckDuckGo search, site crawling, and dynamic page scraping. Free, no API key required.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "crawl4ai-skill" with this command: npx skills add lancelin111/crawl4ai-skill/lancelin111-crawl4ai-skill-crawl4ai-skill

Crawl4AI Skill - Web Crawler & Scraper

Web Crawling | Web Scraping | LLM-Optimized Output

A smart web crawling and scraping tool that supports search, full-site crawling, and dynamic page scraping. Free, with LLM-optimized Markdown output.

Core Features

  • 🔍 Web Search - DuckDuckGo search, no API key required
  • 🕷️ Web Crawling - Site crawler (spider) with sitemap detection
  • 📝 Web Scraping - Smart scraper, data extraction
  • 📄 LLM-Optimized Output - Fit Markdown, saves 80% of tokens
  • Dynamic Page Scraping - Scrapes JavaScript-rendered pages

Quick Start

Installation

pip install crawl4ai-skill

Web Search

# Search the web with DuckDuckGo
crawl4ai-skill search "python web scraping"

Web Scraping (Single Page)

# Scrape a single web page
crawl4ai-skill crawl https://example.com

Web Crawling (Full Site)

# Crawl an entire website (spider)
crawl4ai-skill crawl-site https://docs.python.org --max-pages 50

Use Cases

Use Case 1: Web Crawler for Documentation Sites

# Crawl a documentation site
crawl4ai-skill crawl-site https://docs.fastapi.com --max-pages 100

Crawler Output:

  • ❌ Removed: navigation bars, sidebars, ads
  • ✅ Kept: headings, body text, code blocks
  • 📊 Tokens: 50,000 → 10,000 (-80%)
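
The token figures above can be checked with a line of arithmetic. The following is a minimal sketch; the function name `token_reduction` is hypothetical and not part of crawl4ai-skill, and the counts are the example figures quoted above, not a new measurement.

```python
def token_reduction(before: int, after: int) -> float:
    """Return the percentage of tokens removed between two counts."""
    if before <= 0:
        raise ValueError("before must be a positive token count")
    # Multiply first so the example figures divide exactly.
    return (before - after) * 100 / before


# Example figures from the documentation-site use case above.
reduction = token_reduction(50_000, 10_000)
print(f"{reduction:.0f}% fewer tokens")
```

Actual savings depend on how much of a page is navigation and other non-content markup, so the same calculation on your own pages may give a different figure.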

Use Case 2: Search + Scrape

# Search and scrape top results
crawl4ai-skill search-and-crawl "Vue 3 best practices" --crawl-top 3

Use Case 3: Dynamic Page Scraping

Scraping JavaScript-rendered pages (Xueqiu, Zhihu, etc.):

# Scrape JavaScript-heavy pages
crawl4ai-skill crawl https://xueqiu.com/S/BIDU --wait-until networkidle --delay 2

Commands

Command                    Description
search <query>             Web search
crawl <url>                Scrape a single page
crawl-site <url>           Crawl a full site
search-and-crawl <query>   Search, then scrape the results

Common Options

# Web Search
--num-results 10          # Number of results

# Web Scraping
--format fit_markdown     # Output format
--output result.md        # Output file
--wait-until networkidle  # Wait strategy for dynamic pages
--delay 2                 # Additional wait time (seconds)
--wait-for ".selector"    # Wait for specific element

# Web Crawling
--max-pages 100           # Max pages to crawl
--max-depth 3             # Max crawl depth
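
When calling the CLI from a script, it can help to assemble the options programmatically. The sketch below is one possible approach, not part of crawl4ai-skill: the helper `build_crawl_command` is hypothetical, it uses only the `crawl` flags listed above, and it assumes those flags can be combined freely. It only builds and prints the command, so it runs without the tool installed.

```python
import shlex
from typing import List, Optional


def build_crawl_command(
    url: str,
    output_format: str = "fit_markdown",
    output: Optional[str] = None,
    wait_until: Optional[str] = None,
    delay: Optional[float] = None,
    wait_for: Optional[str] = None,
) -> List[str]:
    """Assemble an argument list for `crawl4ai-skill crawl` (hypothetical helper)."""
    cmd = ["crawl4ai-skill", "crawl", url, "--format", output_format]
    if output is not None:
        cmd += ["--output", output]          # write the result to a file
    if wait_until is not None:
        cmd += ["--wait-until", wait_until]  # wait strategy for dynamic pages
    if delay is not None:
        cmd += ["--delay", str(delay)]       # extra wait time in seconds
    if wait_for is not None:
        cmd += ["--wait-for", wait_for]      # CSS selector to wait for
    return cmd


cmd = build_crawl_command(
    "https://example.com",
    output="result.md",
    wait_until="networkidle",
    delay=2,
)
# Print a shell-safe rendering of the command for inspection.
print(shlex.join(cmd))
```

Once the CLI is installed, the returned list can be passed to `subprocess.run(cmd, check=True)`; passing a list instead of a single string avoids shell quoting problems with selectors such as `".selector"`.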

Output Formats

fit_markdown (Recommended)

Smart extraction that saves 80% of tokens.

crawl4ai-skill crawl https://example.com --format fit_markdown

raw_markdown

Preserves the full page structure.

crawl4ai-skill crawl https://example.com --format raw_markdown
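
To compare the two formats on the same page, one option is to save each to its own file. The sketch below only builds the commands; the helper `commands_for_formats` and the `<format>.md` file naming are illustrative assumptions, and it assumes `--format` and `--output` (both listed under Common Options) can be used together.

```python
from typing import Dict, List, Sequence


def commands_for_formats(
    url: str,
    formats: Sequence[str] = ("fit_markdown", "raw_markdown"),
) -> Dict[str, List[str]]:
    """Map each output format to a crawl command that writes <format>.md."""
    return {
        fmt: [
            "crawl4ai-skill", "crawl", url,
            "--format", fmt,
            "--output", f"{fmt}.md",  # hypothetical naming scheme
        ]
        for fmt in formats
    }


for fmt, cmd in commands_for_formats("https://example.com").items():
    print(fmt, "->", " ".join(cmd))
```

Running both commands and comparing the sizes of the two files gives a rough sense of how much content `fit_markdown` filters out for a given page.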

Why This Crawler?

Free Crawler - No API key required, works out of the box
Smart Scraper - Automatically removes noise and extracts the core content
Site Crawler - Supports sitemaps and recursive crawling
Dynamic Scraping - Supports JavaScript-rendered pages
Search Integration - Built-in DuckDuckGo search


Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
