xcrawl-scrape

Use this skill for XCrawl scrape tasks, including single-URL fetch, format selection, sync or async execution, and JSON extraction with prompt or json_schema.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.


Install skill "xcrawl-scrape" with this command: npx skills add wykings/xcrawl-scrape

XCrawl Scrape

Overview

This skill handles single-page extraction with XCrawl Scrape APIs. Default behavior is raw passthrough: return upstream API response bodies as-is.

Required Local Config

Before using this skill, the user must create a local config file and write XCRAWL_API_KEY into it.

Path: ~/.xcrawl/config.json

{
  "XCRAWL_API_KEY": "<your_api_key>"
}

Read API key from local config file only. Do not require global environment variables.
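The rule above can be sketched as a small Node helper. This is only an illustration: `loadApiKey` and `DEFAULT_CONFIG_PATH` are hypothetical names, not part of XCrawl, and the error messages are assumptions. The helper reads the key from the config file and never falls back to an environment variable.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Default location named in this skill: ~/.xcrawl/config.json
const DEFAULT_CONFIG_PATH = path.join(os.homedir(), ".xcrawl", "config.json");

// loadApiKey is a hypothetical helper name, not part of XCrawl.
// It takes the key from the config file only; there is no env-var fallback.
function loadApiKey(configPath = DEFAULT_CONFIG_PATH) {
  let raw;
  try {
    raw = fs.readFileSync(configPath, "utf8");
  } catch (err) {
    throw new Error(`Cannot read ${configPath}: ${err.message}`);
  }

  let config;
  try {
    config = JSON.parse(raw);
  } catch (err) {
    throw new Error(`${configPath} is not valid JSON: ${err.message}`);
  }

  const key = config.XCRAWL_API_KEY;
  if (typeof key !== "string" || key.trim() === "") {
    throw new Error(`XCRAWL_API_KEY is missing or empty in ${configPath}`);
  }
  return key.trim();
}
```

Failing early with a clear message makes it easier to tell the user which setup step is missing before any credits are spent.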

Credits and Account Setup

Using XCrawl APIs consumes credits. If the user does not have an account or available credits, guide them to register at https://dash.xcrawl.com/. After registering, they can activate the free 1,000-credit plan before running requests.

Tool Permission Policy

Request runtime permissions for curl and node only. Do not request Python, shell helper scripts, or other runtime permissions.

API Surface

  • Start scrape: POST /v1/scrape
  • Read async result: GET /v1/scrape/{scrape_id}
  • Base URL: https://run.xcrawl.com
  • Required header: Authorization: Bearer <XCRAWL_API_KEY>

Usage Examples

cURL (sync)

API_KEY="$(node -e "const fs=require('fs');const p=process.env.HOME+'/.xcrawl/config.json';const k=JSON.parse(fs.readFileSync(p,'utf8')).XCRAWL_API_KEY||'';process.stdout.write(k)")"

curl -sS -X POST "https://run.xcrawl.com/v1/scrape" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${API_KEY}" \
  -d '{"url":"https://example.com","mode":"sync","output":{"formats":["markdown","links"]}}'

cURL (async create + result)

API_KEY="$(node -e "const fs=require('fs');const p=process.env.HOME+'/.xcrawl/config.json';const k=JSON.parse(fs.readFileSync(p,'utf8')).XCRAWL_API_KEY||'';process.stdout.write(k)")"

CREATE_RESP="$(curl -sS -X POST "https://run.xcrawl.com/v1/scrape" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${API_KEY}" \
  -d '{"url":"https://example.com/product/1","mode":"async","output":{"formats":["json"]},"json":{"prompt":"Extract title and price."}}')"

echo "$CREATE_RESP"

SCRAPE_ID="$(node -e 'const s=process.argv[1];const j=JSON.parse(s);process.stdout.write(j.scrape_id||"")' "$CREATE_RESP")"

curl -sS -X GET "https://run.xcrawl.com/v1/scrape/${SCRAPE_ID}" \
  -H "Authorization: Bearer ${API_KEY}"
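The async example above reads the result once, right after creating the task, so the response may still be pending. A minimal polling sketch in Node follows, assuming Node 18+ for the built-in `fetch`. `pollScrape`, `isTerminalStatus`, and the `intervalMs` / `maxAttempts` defaults are hypothetical choices, not values documented by XCrawl.

```javascript
// Statuses after which polling should stop, per the async result table.
const TERMINAL_STATUSES = new Set(["completed", "failed"]);

function isTerminalStatus(status) {
  return TERMINAL_STATUSES.has(status);
}

// pollScrape is a hypothetical helper. fetchImpl defaults to the global fetch
// (Node 18+); intervalMs and maxAttempts are illustrative, not XCrawl guidance.
async function pollScrape(scrapeId, apiKey, options = {}) {
  const {
    fetchImpl = fetch,
    intervalMs = 2000,
    maxAttempts = 30,
    baseUrl = "https://run.xcrawl.com",
  } = options;
  const url = `${baseUrl}/v1/scrape/${encodeURIComponent(scrapeId)}`;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const res = await fetchImpl(url, {
      headers: { Authorization: `Bearer ${apiKey}` },
    });
    const text = await res.text(); // keep the raw body for passthrough

    let status;
    try {
      status = JSON.parse(text).status;
    } catch {
      status = undefined; // non-JSON body: treat as not finished
    }

    // Stop on HTTP errors or once the task reaches a terminal status.
    if (!res.ok || isTerminalStatus(status)) {
      return { httpStatus: res.status, rawBody: text };
    }
    if (attempt < maxAttempts) {
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  throw new Error(`Scrape ${scrapeId} not finished after ${maxAttempts} attempts`);
}
```

The helper returns the last response body unparsed, which keeps it in line with the raw passthrough default of this skill.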

Node

node -e '
// Requires Node 18+ for the built-in fetch.
const fs=require("fs");
// Read the API key from the local config file only.
const apiKey=JSON.parse(fs.readFileSync(process.env.HOME+"/.xcrawl/config.json","utf8")).XCRAWL_API_KEY;
const body={url:"https://example.com",mode:"sync",output:{formats:["markdown","json"]},json:{prompt:"Extract title and publish date."}};
fetch("https://run.xcrawl.com/v1/scrape",{
  method:"POST",
  headers:{"Content-Type":"application/json",Authorization:`Bearer ${apiKey}`},
  body:JSON.stringify(body)
})
  // Print the raw response body as-is.
  .then(async r=>{console.log(await r.text());})
  // Report network errors and exit non-zero.
  .catch(e=>{console.error(e);process.exit(1);});
'

Request Parameters

Request endpoint and headers

  • Endpoint: POST https://run.xcrawl.com/v1/scrape
  • Headers:
    • Content-Type: application/json
    • Authorization: Bearer <api_key>

Request body: top-level fields

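| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| url | string | Yes | - | Target URL |
| mode | string | No | sync | sync or async |
| proxy | object | No | - | Proxy config |
| request | object | No | - | Request config |
| js_render | object | No | - | JS rendering config |
| output | object | No | - | Output config |
| webhook | object | No | - | Async webhook config (mode=async) |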

proxy

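| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| location | string | No | US | ISO-3166-1 alpha-2 country code, e.g. US / JP / SG |
| sticky_session | string | No | Auto-generated | Sticky session ID; requests with the same ID attempt to reuse the same proxy exit |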

request

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| locale | string | No | en-US,en;q=0.9 | Affects Accept-Language |
| device | string | No | desktop | desktop / mobile; affects UA and viewport |
| cookies | object map | No | - | Cookie key/value pairs |
| headers | object map | No | - | Header key/value pairs |
| only_main_content | boolean | No | true | Return main content only |
| block_ads | boolean | No | true | Attempt to block ad resources |
| skip_tls_verification | boolean | No | true | Skip TLS verification |

js_render

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| enabled | boolean | No | true | Enable browser rendering |
| wait_until | string | No | load | load / domcontentloaded / networkidle |
| viewport.width | integer | No | - | Viewport width (desktop 1920, mobile 402) |
| viewport.height | integer | No | - | Viewport height (desktop 1080, mobile 874) |

output

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| formats | string[] | No | ["markdown"] | Output formats |
| screenshot | string | No | viewport | full_page / viewport (only if formats includes screenshot) |
| json.prompt | string | No | - | Extraction prompt |
| json.json_schema | object | No | - | JSON Schema |

output.formats enum:

  • html
  • raw_html
  • markdown
  • links
  • summary
  • screenshot
  • json
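Because the guardrails forbid inventing unsupported output fields, it can help to check a requested format list against this enum before sending it. The sketch below is illustrative only; `unsupportedFormats` and `ALLOWED_FORMATS` are hypothetical names, and the API itself may validate differently.

```javascript
// Format names copied from the output.formats enum in this section.
const ALLOWED_FORMATS = [
  "html",
  "raw_html",
  "markdown",
  "links",
  "summary",
  "screenshot",
  "json",
];

// unsupportedFormats is a hypothetical helper: it returns the entries that are
// not in the enum, so an empty array means the list is safe to send.
function unsupportedFormats(formats) {
  if (!Array.isArray(formats)) {
    throw new TypeError("output.formats must be an array of strings");
  }
  return formats.filter((f) => !ALLOWED_FORMATS.includes(f));
}
```

For example, `unsupportedFormats(["markdown", "pdf"])` returns `["pdf"]`, which can be reported back to the user instead of being sent upstream.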

webhook

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| url | string | No | - | Callback URL |
| headers | object map | No | - | Custom callback headers |
| events | string[] | No | ["started","completed","failed"] | Events: started / completed / failed |

Response Parameters

Sync create response (mode=sync)

| Field | Type | Description |
| --- | --- | --- |
| scrape_id | string | Task ID |
| endpoint | string | Always scrape |
| version | string | Version |
| status | string | completed / failed |
| url | string | Target URL |
| data | object | Result data |
| started_at | string | Start time (ISO 8601) |
| ended_at | string | End time (ISO 8601) |
| total_credits_used | integer | Total credits used |

data fields (based on output.formats):

  • html, raw_html, markdown, links, summary, screenshot, json
  • metadata (page metadata)
  • traffic_bytes
  • credits_used
  • credits_detail

credits_detail fields:

| Field | Type | Description |
| --- | --- | --- |
| base_cost | integer | Base scrape cost |
| traffic_cost | integer | Traffic cost |
| json_extract_cost | integer | JSON extraction cost |

Async create response (mode=async)

| Field | Type | Description |
| --- | --- | --- |
| scrape_id | string | Task ID |
| endpoint | string | Always scrape |
| version | string | Version |
| status | string | Always pending |

Async result response (GET /v1/scrape/{scrape_id})

| Field | Type | Description |
| --- | --- | --- |
| scrape_id | string | Task ID |
| endpoint | string | Always scrape |
| version | string | Version |
| status | string | pending / crawling / completed / failed |
| url | string | Target URL |
| data | object | Same shape as sync data |
| started_at | string | Start time (ISO 8601) |
| ended_at | string | End time (ISO 8601) |

Workflow

  1. Restate the user goal as an extraction contract.
     • URL scope, required fields, accepted nulls, and precision expectations.
  2. Build the scrape request body.
     • Keep only necessary options.
     • Prefer explicit output.formats.
  3. Execute scrape and capture task metadata.
     • Track scrape_id, status, and timestamps.
     • If async, poll until completed or failed.
  4. Return raw API responses directly.
     • Do not synthesize or compress fields by default.
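The "build the scrape request body" step can be sketched as a helper that keeps only the options the caller actually set. `buildScrapeBody` is a hypothetical name. The field list follows the top-level request table, plus `json`, which appears at the top level in the usage examples even though that table does not list it.

```javascript
// Top-level option objects from the request table, plus "json", which the
// usage examples send at the top level (an assumption based on those examples).
const OPTION_OBJECTS = ["proxy", "request", "js_render", "output", "json", "webhook"];

// buildScrapeBody is a hypothetical helper. It drops unset or empty option
// objects so the request carries only what is needed.
function buildScrapeBody(url, options = {}) {
  if (typeof url !== "string" || url === "") {
    throw new TypeError("url is required and must be a non-empty string");
  }
  const body = { url };

  if (options.mode !== undefined) {
    if (options.mode !== "sync" && options.mode !== "async") {
      throw new RangeError('mode must be "sync" or "async"');
    }
    body.mode = options.mode;
  }

  for (const name of OPTION_OBJECTS) {
    const value = options[name];
    const isPlainObject =
      value !== null && typeof value === "object" && !Array.isArray(value);
    if (isPlainObject && Object.keys(value).length > 0) {
      body[name] = value;
    }
  }
  return body;
}
```

For example, `buildScrapeBody("https://example.com", { output: { formats: ["markdown"] }, proxy: {} })` returns `{ url: "https://example.com", output: { formats: ["markdown"] } }`: the empty `proxy` object is omitted and `output.formats` stays explicit.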

Output Contract

Return:

  • Endpoint(s) used and mode (sync or async)
  • request_payload used for the request
  • Raw response body from each API call
  • Error details when request fails

Do not generate summaries unless the user explicitly requests a summary.
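One possible shape for the returned result is sketched below. Only `request_payload` is named by the contract above; `buildReport` and the other property names (`endpoints`, `mode`, `raw_responses`, `error`) are hypothetical and chosen for illustration.

```javascript
// buildReport is a hypothetical helper. Property names other than
// request_payload are illustrative, since the contract names only that one.
function buildReport({ endpoints, mode, requestPayload, rawResponses, error = null }) {
  if (mode !== "sync" && mode !== "async") {
    throw new RangeError('mode must be "sync" or "async"');
  }
  return {
    endpoints, // e.g. ["POST /v1/scrape", "GET /v1/scrape/{scrape_id}"]
    mode,
    request_payload: requestPayload,
    raw_responses: rawResponses, // unparsed bodies, one per API call
    error, // error details when a request fails, otherwise null
  };
}
```

Keeping the response bodies as unparsed strings avoids any accidental summarizing or reshaping of upstream fields.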

Guardrails

  • Do not invent unsupported output fields.
  • Do not hardcode provider-specific tool schemas in core logic.
  • Call out uncertainty when page structure is unstable.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
