XCrawl Scraper

<p align="center"> <img src="https://img.shields.io/badge/Python-3.8%2B-blue?style=for-the-badge&logo=python" alt="Python"> <img src="https://img.shields.io/badge/License-MIT-yellow?style=for-the-badge" alt="License"> </p> <h1 align="center">🕷️ XCrawl Scraper</h1> <p align="center"> <strong>AI-Powered Web Scraping API: Structured Data Extraction Made Easy</strong><br> <em>Supports multiple output formats, including Markdown, HTML, JSON, and screenshots</em> </p>

✨ Features

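Feature	Description
🏷️ Page scraping	Markdown, HTML, JSON, and screenshot output
🔍 Search	Scrape search-engine results
🗺️ Site map	Automatically discover every page on a site
🕷️ Site crawl	Crawl an entire site in batch
📊 Structured data	Extract structured data automatically via JSON Schema
🌐 Proxy support	Optional proxies in locations worldwide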

📦 Installation

Option 1: Run the install script

scripts\install.bat

Option 2: Install manually

# 1. Install dependencies
pip install xcrawl

# 2. Configure the API key
python scripts\xcrawl_scraper.py set-key YOUR_API_KEY

⚙️ Get an API Key

1. Sign up for an account at https://xcrawl.com.
2. Obtain your API key.
3. Configure it:

python scripts\xcrawl_scraper.py set-key YOUR_API_KEY
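The `set-key` command stores the key in `scripts/config.json`. The script's internals are not shown in this README; as a minimal sketch, assuming `set-key` simply updates the `apiKey` field of the JSON file described under Configuration while leaving the other fields untouched, the logic could look like this (`set_api_key` is a hypothetical name, not part of the project):

```python
import json
from pathlib import Path

def set_api_key(config_path, api_key):
    """Hypothetical helper: store api_key in the JSON config, keeping all other settings."""
    if not api_key or not api_key.strip():
        raise ValueError("API key must not be empty")
    path = Path(config_path)
    # Reuse the existing settings if the file is already there; otherwise start empty.
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    config["apiKey"] = api_key.strip()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config

# Usage (path as documented in the Configuration section):
# set_api_key("scripts/config.json", "YOUR_API_KEY")
```

Reading the file before writing it is what keeps `timeout`, `apiUrl`, and the other settings intact when only the key changes.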

📖 Usage

1. Scrape a page (basic)

python scripts\xcrawl_scraper.py scrape https://example.com markdown

2. Scrape in multiple formats

python scripts\xcrawl_scraper.py scrape https://example.com markdown html links
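To drive these commands from another Python program, one option is to call the script with `subprocess`. The sketch below assumes it runs from the repository root, that the script prints its result to stdout, and that it exits with a non-zero status on failure; none of these behaviors is documented here. `build_scrape_argv` and `run_scrape` are hypothetical helpers:

```python
import subprocess
import sys

SCRIPT = "scripts/xcrawl_scraper.py"  # relative to the repository root (assumption)

def build_scrape_argv(url, formats=("markdown",)):
    """Build the argument list for: python scripts/xcrawl_scraper.py scrape <URL> [formats...]"""
    return [sys.executable, SCRIPT, "scrape", url, *formats]

def run_scrape(url, formats=("markdown",), timeout=120):
    """Run the CLI and return its stdout.

    check=True raises CalledProcessError on a non-zero exit code (assumed to signal failure).
    """
    result = subprocess.run(
        build_scrape_argv(url, formats),
        capture_output=True,
        text=True,
        check=True,
        timeout=timeout,
    )
    return result.stdout

# Usage: run_scrape("https://example.com", ["markdown", "html", "links"])
```

Passing an argument list instead of a shell string avoids quoting problems on both Windows and POSIX, which matters once an argument contains spaces, such as the extraction prompt in the JSON example.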

3. Structured data extraction (JSON)

python scripts\xcrawl_scraper.py scrape https://example.com json "Extract the product name and price"
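In this example the last argument is a natural-language extraction prompt, not a format name. How the script tells the two apart is not documented. One plausible rule, sketched here purely as an assumption, treats tokens after the URL as formats until the first unrecognized token and joins everything from that point into the prompt. The format set lists only the names that appear in this README, and `split_formats_and_prompt` is hypothetical:

```python
# Format names mentioned in this README; the real script may accept others.
KNOWN_FORMATS = {"markdown", "html", "links", "json", "screenshot"}

def split_formats_and_prompt(args):
    """Split CLI tokens after the URL into (formats, prompt).

    Tokens count as formats until the first unrecognized one; the remaining
    tokens are joined into the extraction prompt, or None if there are none.
    """
    formats, rest = [], []
    for i, token in enumerate(args):
        if token.lower() not in KNOWN_FORMATS:
            rest = args[i:]
            break
        formats.append(token.lower())
    prompt = " ".join(rest) if rest else None
    # Assumption: a prompt only makes sense together with the json format.
    if prompt is not None and "json" not in formats:
        raise ValueError("an extraction prompt requires the json format")
    return formats, prompt
```

Under this rule, `markdown html links` yields three formats and no prompt, while `json "Extract the product name and price"` yields one format plus the prompt.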

4. Search

python scripts\xcrawl_scraper.py search "web scraping"

5. Site map

python scripts\xcrawl_scraper.py map https://example.com
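If the pages discovered by `map` are later fed into further scraping, it usually helps to normalize and deduplicate them and to keep only URLs on the starting host. The following stdlib-only sketch assumes the map results are available as a list of absolute URLs; the command's actual output format is not documented here. `same_site_urls` is a hypothetical helper:

```python
from urllib.parse import urlsplit, urlunsplit

def same_site_urls(start_url, urls):
    """Return unique URLs on the same host as start_url, with fragments removed, in input order."""
    host = urlsplit(start_url).netloc.lower()
    seen = set()
    result = []
    for url in urls:
        parts = urlsplit(url.strip())
        if parts.netloc.lower() != host:
            continue  # skip off-site links
        # Lowercase scheme/host, default an empty path to "/", and drop the #fragment
        # so trivially different spellings of the same page collapse into one entry.
        normalized = urlunsplit(
            (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query, "")
        )
        if normalized not in seen:
            seen.add(normalized)
            result.append(normalized)
    return result
```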

6. Site crawl

python scripts\xcrawl_scraper.py crawl https://example.com

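📋 Commands

Command	Description
`scrape <URL> [formats...]`	Scrape a page in one or more formats
`search <query>`	Search
`map <URL>`	Map a site's pages
`crawl <URL>`	Crawl an entire site
`set-key <API_KEY>`	Set the API key
`config`	Show the current configuration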
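The command table maps naturally onto `argparse` subcommands. This is a hypothetical layout based only on the table, not the script's actual parser; `build_parser` is an illustrative name:

```python
import argparse

def build_parser():
    """Hypothetical argparse layout mirroring the command table."""
    parser = argparse.ArgumentParser(prog="xcrawl_scraper.py")
    sub = parser.add_subparsers(dest="command", required=True)

    scrape = sub.add_parser("scrape", help="Scrape a page in one or more formats")
    scrape.add_argument("url")
    # Zero or more format names; in the json example a prompt also lands here.
    scrape.add_argument("formats", nargs="*")

    search = sub.add_parser("search", help="Search")
    search.add_argument("query")

    # map and crawl both take a single starting URL.
    for name, help_text in (("map", "Map a site's pages"), ("crawl", "Crawl an entire site")):
        sub.add_parser(name, help=help_text).add_argument("url")

    set_key = sub.add_parser("set-key", help="Set the API key")
    set_key.add_argument("api_key")

    sub.add_parser("config", help="Show the current configuration")
    return parser
```

Because `formats` uses `nargs="*"`, a trailing extraction prompt is collected together with the format names, so the script would still have to separate the two after parsing.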

🔧 Configuration

Configuration file: scripts/config.json

{
  "apiKey": "YOUR_API_KEY",
  "apiUrl": "https://run.xcrawl.com",
  "timeout": 60,
  "defaultFormats": ["markdown"],
  "defaultProxy": ""
}
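One way to read this file is sketched below. It assumes that missing keys fall back to the values shown above, that the `YOUR_API_KEY` placeholder means no key has been set, and that `timeout` is in seconds; none of these rules is documented by the project. `load_config` is a hypothetical helper:

```python
import copy
import json
from pathlib import Path

# Defaults copied from the sample config; an empty apiKey means "not set".
DEFAULTS = {
    "apiKey": "",
    "apiUrl": "https://run.xcrawl.com",
    "timeout": 60,  # assumed to be seconds
    "defaultFormats": ["markdown"],
    "defaultProxy": "",
}

def load_config(path="scripts/config.json"):
    """Load the config file, fill missing keys from DEFAULTS, and sanity-check values."""
    config = copy.deepcopy(DEFAULTS)  # deep copy so callers cannot mutate the shared list
    p = Path(path)
    if p.exists():
        config.update(json.loads(p.read_text(encoding="utf-8")))
    if not config["apiKey"] or config["apiKey"] == "YOUR_API_KEY":
        raise ValueError("API key not configured; run: python scripts/xcrawl_scraper.py set-key YOUR_API_KEY")
    if not isinstance(config["timeout"], (int, float)) or config["timeout"] <= 0:
        raise ValueError("timeout must be a positive number")
    return config
```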

📝 Sample Output

Markdown output

# Example Domain

This domain is for use in illustrative examples in documents.

JSON output

{
  "product_name": "iPhone 15 Pro",
  "price": 999,
  "currency": "USD"
}
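When other code consumes this output, it is worth checking that the requested fields actually came back with the expected types, since AI-driven extraction may omit or reshape fields. The sketch below validates against the sample output above. The field names come from that example. Treating `currency` as a three-letter ISO 4217 code is an assumption, and `Product` and `parse_product` are hypothetical:

```python
import json
from dataclasses import dataclass

@dataclass
class Product:
    product_name: str
    price: float
    currency: str

def parse_product(raw):
    """Parse extraction output and validate the three fields from the sample."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    name, price, currency = data.get("product_name"), data.get("price"), data.get("currency")
    if not isinstance(name, str) or not name:
        raise ValueError("missing or empty product_name")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(price, bool) or not isinstance(price, (int, float)):
        raise ValueError("price must be a number")
    if not isinstance(currency, str) or len(currency) != 3:
        raise ValueError("currency should be a 3-letter code such as USD")
    return Product(name, float(price), currency.upper())
```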

📦 Requirements

Python >= 3.8
xcrawl SDK

🤝 Contributing

Issues and pull requests are welcome!

📄 License

MIT License