opencrawl

Crawl any JavaScript-rendered webpage through distributed real Chrome browsers. No local browser needed — perfect for headless VPS.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Copy the following and send it to your AI assistant to install:

Install skill "opencrawl" with this command: npx skills add hlyylly/chromeopencrawl

OpenCrawl Skill

Use this skill to crawl any JavaScript-rendered webpage using real Chrome browsers from a distributed worker pool. Unlike headless browser solutions (Puppeteer/Playwright), OpenCrawl requires zero local browser installation — ideal for VPS and cloud environments.

Quick Start (use our public server)

  1. Visit http://39.105.206.76:9877 and click "Register" to get a free API Key (100 credits included)
  2. Set environment variables:
    OPENCRAWL_API_KEY=ak_your_key_here
    OPENCRAWL_API_URL=http://39.105.206.76:9877
    
  3. Start crawling!
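Once the two variables are set, a client-side helper can pick them up as sketched below. This is a minimal illustration, not the skill's actual code: crawl.py presumably reads these variables the same way, and the fallback to the public server URL is an assumption.

```python
import os

def load_opencrawl_config():
    """Read OpenCrawl connection settings from the environment.

    OPENCRAWL_API_KEY has no default and must be provided.
    Falling back to the public server when OPENCRAWL_API_URL is
    unset is an assumption for this sketch.
    """
    api_key = os.environ.get("OPENCRAWL_API_KEY")
    if not api_key:
        raise RuntimeError("OPENCRAWL_API_KEY is not set")
    api_url = os.environ.get("OPENCRAWL_API_URL", "http://39.105.206.76:9877")
    return {"api_key": api_key, "api_url": api_url}
```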

Self-hosted (deploy your own server)

If you prefer to run your own OpenCrawl server, see the full deployment guide: https://github.com/hlyylly/OpenCrawl

Then set OPENCRAWL_API_URL to your own server address.


How it works: Your request → OpenCrawl server → dispatched to a real Chrome browser worker → page rendered with full JavaScript → content extracted → uploaded to Cloudflare R2 → download URL returned to you.

Errors: On failure the script writes a JSON error to stderr and exits with code 1.


Tools

1. Crawl Page

Use this to get the full rendered text content of any webpage, including JavaScript-rendered content that simple HTTP requests cannot retrieve.

Command:

python3 {baseDir}/tools/crawl.py --url "https://example.com"

Examples:

# Crawl a full page
python3 {baseDir}/tools/crawl.py --url "https://www.smzdm.com/p/170177008/"

# Crawl with CSS selector to extract specific content
python3 {baseDir}/tools/crawl.py --url "https://example.com" --selector ".article-content"

# Output raw JSON response (includes downloadUrl)
python3 {baseDir}/tools/crawl.py --url "https://example.com" --raw

Optional flags:

  • --selector ".css-selector" — Extract only matching elements
  • --mode lite — Lite mode: no images/CSS, faster, 0.1 credit (default: full)
  • --raw — Output full JSON response instead of just the text content
  • --timeout 60 — Custom timeout in seconds (default: 60)
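With --raw, the response is a JSON document that includes downloadUrl. A small sketch for pulling it out (only downloadUrl is documented above; no other response fields are assumed):

```python
import json

def extract_download_url(raw_output):
    """Pull the downloadUrl out of a --raw crawl response.

    Only the downloadUrl field is documented; anything else in the
    response is left untouched.
    """
    data = json.loads(raw_output)
    url = data.get("downloadUrl")
    if not url:
        raise ValueError("response contains no downloadUrl")
    return url
```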

2. Search (Brave Search API Compatible)

Use this to search the web using multiple search engines (DuckDuckGo + Google + Bing + Baidu) through real Chrome browsers. Returns structured results compatible with Brave Search API format.

Command:

python3 {baseDir}/tools/crawl.py --search "your search query"

Examples:

# Lite search — DuckDuckGo only (0.1 credit)
python3 {baseDir}/tools/crawl.py --search "python web scraping"

# Full search — 4 engines in parallel (3 credits, 20-30 deduplicated results)
python3 {baseDir}/tools/crawl.py --search "python web scraping" --mode full
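Since results follow the Brave Search API shape with a web.results[] array, they can be flattened like this (a sketch assuming Brave-style title and url fields on each result):

```python
import json

def list_search_hits(search_json):
    """Flatten Brave-compatible search output into (title, url) pairs.

    Assumes the documented web.results[] layout; the per-result
    'title' and 'url' field names follow the Brave Search API shape.
    """
    data = json.loads(search_json)
    results = data.get("web", {}).get("results", [])
    return [(r.get("title", ""), r.get("url", "")) for r in results]
```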

3. Check Balance

Use this to check how many credits remain on the API key.

Command:

python3 {baseDir}/tools/crawl.py --balance

4. Check Status

Use this to check the OpenCrawl platform status — how many workers are online, tasks completed, etc.

Command:

python3 {baseDir}/tools/crawl.py --status

Summary

Action        | Argument             | Example
--------------|----------------------|--------
Crawl (full)  | --url                | python3 {baseDir}/tools/crawl.py --url "https://example.com"
Crawl (lite)  | --url --mode lite    | python3 {baseDir}/tools/crawl.py --url "https://example.com" --mode lite
Search (lite) | --search             | python3 {baseDir}/tools/crawl.py --search "python tutorial"
Search (full) | --search --mode full | python3 {baseDir}/tools/crawl.py --search "python tutorial" --mode full
Check balance | --balance            | python3 {baseDir}/tools/crawl.py --balance
Check status  | --status             | python3 {baseDir}/tools/crawl.py --status

Output: Crawl → rendered page text (or JSON with --raw). Search → JSON with web.results[] (Brave compatible). Balance → JSON. Status → JSON.

Requirements: Python 3.8+, requests library. No browser installation needed.

