XCrawl Scraper

XCrawl - AI-Powered Web Scraping API with structured data extraction

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "XCrawl Scraper" with this command: npx skills add zhangss110/xcrawl-scraper

<p align="center"> <img src="https://img.shields.io/badge/Python-3.8%2B-blue?style=for-the-badge&logo=python" alt="Python"> <img src="https://img.shields.io/badge/License-MIT-yellow?style=for-the-badge" alt="License"> </p> <h1 align="center">🕷️ XCrawl Scraper</h1> <p align="center"> <strong>AI-Powered Web Scraping API for structured data extraction</strong><br> <em>Supports multiple output formats, including Markdown, HTML, JSON, and Screenshot</em> </p>

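✨ Features

| Feature | Description |
| --- | --- |
| 🏷️ Page scraping | Outputs Markdown, HTML, JSON, or Screenshot |
| 🔍 Search | Scrapes search engine results |
| 🗺️ Site map | Automatically discovers all pages on a site |
| 🕷️ Site crawl | Crawls an entire site in bulk |
| 📊 Structured data | Extracts structured data automatically using a JSON Schema |
| 🌐 Proxy support | Optional global proxies |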

📦 Installation

Option 1: Run the install script

scripts\install.bat

Option 2: Manual installation

# 1. Install dependencies
pip install xcrawl

# 2. Configure the API key
python scripts\xcrawl_scraper.py set-key YOUR_API_KEY

⚙️ Get an API Key

  1. Sign up for an account at https://xcrawl.com
  2. Obtain an API key
  3. Configure it:
python scripts\xcrawl_scraper.py set-key YOUR_API_KEY

📖 Usage

1. Scrape a page (basic)

python scripts\xcrawl_scraper.py scrape https://example.com markdown

2. Scrape in multiple formats

python scripts\xcrawl_scraper.py scrape https://example.com markdown html links

3. Structured data extraction (JSON)

python scripts\xcrawl_scraper.py scrape https://example.com json "Extract the product name and price"

4. Search

python scripts\xcrawl_scraper.py search "web scraping"

5. Site map

python scripts\xcrawl_scraper.py map https://example.com

6. Site crawl

python scripts\xcrawl_scraper.py crawl https://example.com
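
The commands above can also be driven from Python. The following is a minimal sketch, assuming the script lives at scripts/xcrawl_scraper.py relative to the working directory and prints its result to standard output (this README confirms neither). `build_command` and `run_cli` are hypothetical helper names for illustration, not part of the xcrawl SDK.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Sequence

# Assumed location of the CLI script; adjust to match your checkout.
SCRIPT = Path("scripts") / "xcrawl_scraper.py"


def build_command(subcommand: str, target: str, extra: Sequence[str] = ()) -> List[str]:
    """Return the argv list for one CLI call, e.g. scrape <URL> markdown html."""
    if not target:
        raise ValueError("target (URL or query) must not be empty")
    # sys.executable keeps the call inside the current Python environment.
    return [sys.executable, str(SCRIPT), subcommand, target, *extra]


def run_cli(subcommand: str, target: str, extra: Sequence[str] = (), timeout: int = 60) -> str:
    """Run the CLI and return its stdout (assumed to carry the result).

    Raises subprocess.CalledProcessError on a non-zero exit code and
    subprocess.TimeoutExpired if the call exceeds `timeout` seconds.
    """
    result = subprocess.run(
        build_command(subcommand, target, extra),
        capture_output=True,
        text=True,
        check=True,
        timeout=timeout,
    )
    return result.stdout
```

For example, `run_cli("scrape", "https://example.com", ["markdown", "html", "links"])` corresponds to the multi-format command in step 2. Passing the arguments as a list avoids shell quoting problems with URLs and search queries.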

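📋 Command list

| Command | Description |
| --- | --- |
| `scrape <URL> [formats...]` | Scrape a page |
| `search <query>` | Search |
| `map <URL>` | Site map |
| `crawl <URL>` | Site crawl |
| `set-key <API_KEY>` | Set the API key |
| `config` | Show the configuration |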
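
To make the shape of this command set concrete, here is a minimal `argparse` sketch that mirrors the commands listed above. It is an illustration only, not the actual implementation of xcrawl_scraper.py. How the real script tells an extraction prompt apart from a format name is not documented here, so the sketch simply collects every trailing `scrape` argument into one list and assumes `markdown` as the fallback format.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one subcommand per documented command."""
    parser = argparse.ArgumentParser(prog="xcrawl_scraper.py")
    sub = parser.add_subparsers(dest="command", required=True)

    scrape = sub.add_parser("scrape", help="Scrape a page")
    scrape.add_argument("url")
    # Zero or more trailing values; falls back to ["markdown"] (an assumption).
    scrape.add_argument("formats", nargs="*", default=["markdown"])

    search = sub.add_parser("search", help="Search")
    search.add_argument("query")

    # map and crawl both take a single URL.
    for name, help_text in (("map", "Site map"), ("crawl", "Site crawl")):
        url_command = sub.add_parser(name, help=help_text)
        url_command.add_argument("url")

    set_key = sub.add_parser("set-key", help="Set the API key")
    set_key.add_argument("api_key")

    sub.add_parser("config", help="Show the configuration")
    return parser
```

Calling `build_parser().parse_args(["scrape", "https://example.com", "markdown", "html"])` yields a namespace with `command`, `url`, and `formats` attributes.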

🔧 Configuration

Config file: scripts/config.json

{
  "apiKey": "YOUR_API_KEY",
  "apiUrl": "https://run.xcrawl.com",
  "timeout": 60,
  "defaultFormats": ["markdown"],
  "defaultProxy": ""
}
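
A config file of this shape can be read and sanity-checked with the standard library. The sketch below assumes that missing keys should fall back to the sample values shown above and that `YOUR_API_KEY` is only a placeholder. `merge_config`, `load_config`, and `has_api_key` are hypothetical helpers, and the script's own loading logic may differ.

```python
import copy
import json
from pathlib import Path
from typing import Any, Dict

# Fallback values copied from the sample config; apiKey is left empty.
DEFAULTS: Dict[str, Any] = {
    "apiKey": "",
    "apiUrl": "https://run.xcrawl.com",
    "timeout": 60,
    "defaultFormats": ["markdown"],
    "defaultProxy": "",
}


def merge_config(overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Overlay `overrides` on DEFAULTS and validate a few fields."""
    config = copy.deepcopy(DEFAULTS)
    config.update(overrides)
    timeout = config["timeout"]
    if isinstance(timeout, bool) or not isinstance(timeout, (int, float)) or timeout <= 0:
        raise ValueError("timeout must be a positive number")
    formats = config["defaultFormats"]
    if not isinstance(formats, list) or not formats:
        raise ValueError("defaultFormats must be a non-empty list")
    return config


def load_config(path: Path = Path("scripts/config.json")) -> Dict[str, Any]:
    """Read the JSON file if it exists; otherwise return the defaults."""
    overrides = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    return merge_config(overrides)


def has_api_key(config: Dict[str, Any]) -> bool:
    """True when apiKey is set to something other than the placeholder."""
    return config["apiKey"] not in ("", "YOUR_API_KEY")
```

Checking `has_api_key(load_config())` before issuing requests gives an early, readable error when `set-key` has not been run yet.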

📝 Example output

Markdown output

# Example Domain

This domain is for use in illustrative examples in documents.

JSON output

{
  "product_name": "iPhone 15 Pro",
  "price": 999,
  "currency": "USD"
}
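
Extracted JSON is best validated before it is used downstream. As a minimal sketch, assuming the output has exactly the three fields of the sample above (real field names depend on the extraction prompt), the result can be parsed into a typed record. `Product` and `parse_product` are hypothetical names.

```python
import json
from dataclasses import dataclass

REQUIRED_FIELDS = {"product_name", "price", "currency"}


@dataclass(frozen=True)
class Product:
    product_name: str
    price: float
    currency: str


def parse_product(raw: str) -> Product:
    """Parse a JSON string shaped like the sample output into a Product."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = REQUIRED_FIELDS - set(data)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    price = data["price"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(price, bool) or not isinstance(price, (int, float)):
        raise ValueError("price must be a number")
    return Product(str(data["product_name"]), float(price), str(data["currency"]))


# The sample output from above, as a JSON string.
SAMPLE = '{"product_name": "iPhone 15 Pro", "price": 999, "currency": "USD"}'
```

`parse_product(SAMPLE)` returns a `Product` whose `price` is the float `999.0`. A malformed or incomplete result raises `ValueError` instead of propagating silently.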

📦 Dependencies

  • Python >= 3.8
  • xcrawl SDK

🤝 Contributing

Issues and pull requests are welcome!


📄 License

MIT License


<p align="center">🕷️ Powered by XCrawl</p>

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
