apify

Run Apify Actors (web scrapers, crawlers, automation tools) and retrieve their results using the Apify REST API with curl. Use when the user wants to scrape a website, extract data from the web, run an Apify Actor, crawl pages, or get results from Apify datasets.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "apify" with this command: npx skills add bmestanov/apify

Apify

Run any of the 17,000+ Actors on Apify Store and retrieve structured results via the REST API.

Full OpenAPI spec: openapi.json

Authentication

Requests authenticate with the APIFY_TOKEN env var, sent as a Bearer token (a few public endpoints, such as the per-build OpenAPI spec below, work without it):

-H "Authorization: Bearer $APIFY_TOKEN"
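A minimal sketch of a wrapper that fails fast when the token is missing instead of sending an empty Bearer header. The function name apify_api is hypothetical. GET /v2/users/me (the token owner's account) is used only as a cheap way to confirm the token works.

```shell
# Hypothetical wrapper around curl: refuses to call the API without a token.
# Usage: apify_api PATH [extra curl args...]
apify_api() {
  if [ -z "${APIFY_TOKEN:-}" ]; then
    echo "APIFY_TOKEN is not set" >&2
    return 1
  fi
  local path="$1"
  shift
  curl -s "https://api.apify.com$path" \
    -H "Authorization: Bearer $APIFY_TOKEN" "$@"
}

# Check the token by fetching the account it belongs to:
# apify_api /v2/users/me | jq -r '.data.username'
```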

Base URL: https://api.apify.com

Core workflow

1. Find the right Actor

Search the Apify Store by keyword:

curl -s "https://api.apify.com/v2/store?search=web+scraper&limit=5" \
  -H "Authorization: Bearer $APIFY_TOKEN" | jq '.data.items[] | {name: (.username + "/" + .name), title, description}'

Actors are identified by username~name (tilde) in API paths, e.g. apify~web-scraper.
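apify.com shows Actors in the slash form (apify/web-scraper), while API paths need the tilde form. A small sketch, with hypothetical function names, that URL-encodes the search keyword using jq's @uri filter and converts a slash-form name to the tilde form:

```shell
# Hypothetical helper: search the Store with a URL-encoded keyword and print
# one line per Actor: the API-ready identifier (username~name), a tab, the title.
# Usage: apify_store_search KEYWORD [LIMIT]
apify_store_search() {
  local query
  query=$(jq -rn --arg q "$1" '$q | @uri')
  curl -s "https://api.apify.com/v2/store?search=$query&limit=${2:-5}" \
    -H "Authorization: Bearer $APIFY_TOKEN" |
    jq -r '.data.items[] | "\(.username)~\(.name)\t\(.title)"'
}

# Convert "username/name" (as shown on apify.com) to "username~name".
actor_api_id() {
  printf '%s\n' "$1" | tr '/' '~'
}

# Usage:
# apify_store_search "instagram scraper" 3
# actor_api_id apify/web-scraper   # prints apify~web-scraper
```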

2. Get Actor README and input schema

Before running an Actor, fetch its default build to get the README (usage docs) and input schema (expected JSON fields):

curl -s "https://api.apify.com/v2/acts/apify~web-scraper/builds/default" \
  -H "Authorization: Bearer $APIFY_TOKEN" | jq '.data | {readme, inputSchema}'

inputSchema is a JSON-stringified object — parse it to see required/optional fields, types, defaults, and descriptions. Use this to construct valid input for the run.
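jq's fromjson filter can decode that string in the same pipeline. A sketch (the function name is hypothetical) that reads the build response on stdin and prints the required fields plus every field name, falling back to an empty schema when the build has none:

```shell
# Hypothetical helper: summarize the input schema from a
# GET /v2/acts/{actor}/builds/default response read on stdin.
summarize_input_schema() {
  jq -c '.data.inputSchema // "{}" | fromjson
         | {required: (.required // []), fields: (.properties // {} | keys)}'
}

# Usage:
# curl -s "https://api.apify.com/v2/acts/apify~web-scraper/builds/default" \
#   -H "Authorization: Bearer $APIFY_TOKEN" | summarize_input_schema
```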

You can also get the Actor's per-build OpenAPI spec (no auth required):

curl -s "https://api.apify.com/v2/acts/apify~web-scraper/builds/default/openapi.json"

3. Run an Actor (async — recommended for most cases)

Start the Actor and get the run object back immediately:

curl -s -X POST "https://api.apify.com/v2/acts/apify~web-scraper/runs" \
  -H "Authorization: Bearer $APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"startUrls":[{"url":"https://example.com"}],"maxPagesPerCrawl":10}'

The response includes data.id (run ID), data.defaultDatasetId, data.defaultKeyValueStoreId, and data.status.

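Optional query params, e.g. ?timeout=300&memory=4096&maxItems=100&waitForFinish=60:

  • timeout: run timeout in seconds.
  • memory: memory limit in MB (a power of 2).
  • maxItems: cap on the number of results, useful for pay-per-result Actors.
  • waitForFinish (0-60): seconds the API waits for the run to finish before returning. Useful to avoid polling for short runs.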
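A sketch of starting a run and keeping the IDs that later steps need. start_run and parse_run are hypothetical names; the query parameters are the optional ones listed above, and the values are printed space-separated so read can split them. If the API returns an error object, parse_run prints "null null null".

```shell
# Print "RUN_ID DATASET_ID STATUS" from a run object read on stdin.
parse_run() {
  jq -r '.data | "\(.id) \(.defaultDatasetId) \(.status)"'
}

# Hypothetical helper: start an Actor run asynchronously.
# Usage: start_run ACTOR_ID INPUT_JSON
start_run() {
  curl -s -X POST "https://api.apify.com/v2/acts/$1/runs?timeout=300&memory=4096&waitForFinish=60" \
    -H "Authorization: Bearer $APIFY_TOKEN" \
    -H "Content-Type: application/json" \
    -d "$2" | parse_run
}

# Usage:
# read -r RUN_ID DATASET_ID STATUS < <(start_run apify~web-scraper \
#   '{"startUrls":[{"url":"https://example.com"}],"maxPagesPerCrawl":10}')
```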

4. Poll run status

curl -s "https://api.apify.com/v2/actor-runs/RUN_ID?waitForFinish=60" \
  -H "Authorization: Bearer $APIFY_TOKEN" | jq '.data | {status, defaultDatasetId}'

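Terminal statuses: SUCCEEDED, FAILED, ABORTED, TIMED-OUT. Any other status (READY, RUNNING, TIMING-OUT, ABORTING) means the run is still in progress, so keep polling.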
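The terminal-status check can live in a small helper (hypothetical name) so polling loops stay readable:

```shell
# Return 0 if the run status is terminal (the run will not change further).
is_terminal_status() {
  case "$1" in
    SUCCEEDED|FAILED|ABORTED|TIMED-OUT) return 0 ;;
    *) return 1 ;;   # e.g. READY or RUNNING: keep polling
  esac
}

# Usage: is_terminal_status "$STATUS" && echo "Run finished: $STATUS"
```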

5. Get results

Dataset items (most common — structured scraped data):

curl -s "https://api.apify.com/v2/datasets/DATASET_ID/items?clean=true&limit=100" \
  -H "Authorization: Bearer $APIFY_TOKEN"

Or directly from the run (shortcut — same parameters):

curl -s "https://api.apify.com/v2/actor-runs/RUN_ID/dataset/items?clean=true&limit=100" \
  -H "Authorization: Bearer $APIFY_TOKEN"

Params: format (json|jsonl|csv|html|xlsx|xml|rss), fields and omit (comma-separated field names), limit, offset, clean, desc.
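limit and offset can be combined to page through a large dataset. A sketch with hypothetical function names: it requests pages of limit items, stops at the first short or empty page, and prints each item as one compact JSON line (the same shape format=jsonl gives).

```shell
# Hypothetical helper: fetch one page of dataset items as a JSON array.
# Usage: dataset_page DATASET_ID LIMIT OFFSET
dataset_page() {
  curl -s "https://api.apify.com/v2/datasets/$1/items?clean=true&format=json&limit=$2&offset=$3" \
    -H "Authorization: Bearer $APIFY_TOKEN"
}

# Page through a dataset, printing every item as one compact JSON line.
# Usage: fetch_all_items DATASET_ID [PAGE_SIZE]
fetch_all_items() {
  local dataset_id="$1" limit="${2:-1000}" offset=0 page count
  while true; do
    page=$(dataset_page "$dataset_id" "$limit" "$offset")
    # Treat anything that is not a JSON array (e.g. an error object) as empty.
    count=$(printf '%s' "$page" | jq 'if type == "array" then length else 0 end' 2>/dev/null)
    count=${count:-0}
    [ "$count" -gt 0 ] || break
    printf '%s' "$page" | jq -c '.[]'
    offset=$((offset + count))
    # A short page means the end of the dataset has been reached.
    [ "$count" -eq "$limit" ] || break
  done
}

# Usage: fetch_all_items DATASET_ID 500 > items.jsonl
```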

Key-value store record (e.g. OUTPUT, screenshots, saved HTML). The run's store ID is data.defaultKeyValueStoreId in the run object:

curl -s "https://api.apify.com/v2/key-value-stores/STORE_ID/records/OUTPUT" \
  -H "Authorization: Bearer $APIFY_TOKEN"
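Binary records such as screenshots are best written to a file. A sketch with hypothetical function names; curl's -f flag makes it exit non-zero on HTTP errors instead of saving the error body. The commented usage lists the store's keys with GET /v2/key-value-stores/STORE_ID/keys before downloading one.

```shell
# Build the record URL for a store ID and record key.
kvs_record_url() {
  printf 'https://api.apify.com/v2/key-value-stores/%s/records/%s\n' "$1" "$2"
}

# Hypothetical helper: download a record to a local file.
# Usage: save_kvs_record STORE_ID KEY OUT_FILE
save_kvs_record() {
  curl -sf -o "$3" "$(kvs_record_url "$1" "$2")" \
    -H "Authorization: Bearer $APIFY_TOKEN"
}

# Usage: look up the run's default store, list its keys, then save OUTPUT
# STORE_ID=$(curl -s "https://api.apify.com/v2/actor-runs/$RUN_ID" \
#   -H "Authorization: Bearer $APIFY_TOKEN" | jq -r '.data.defaultKeyValueStoreId')
# curl -s "https://api.apify.com/v2/key-value-stores/$STORE_ID/keys" \
#   -H "Authorization: Bearer $APIFY_TOKEN" | jq -r '.data.items[].key'
# save_kvs_record "$STORE_ID" OUTPUT output.json
```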

Run log:

curl -s "https://api.apify.com/v2/logs/RUN_ID" \
  -H "Authorization: Bearer $APIFY_TOKEN"
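Logs can be long; the error that ended a FAILED run is often near the end. A sketch (hypothetical function name) that prints only the last N lines:

```shell
# Hypothetical helper: print the last N lines of a run's log (default 50).
# Usage: run_log_tail RUN_ID [N]
run_log_tail() {
  local run_id="$1" n="${2:-50}"
  curl -s "https://api.apify.com/v2/logs/$run_id" \
    -H "Authorization: Bearer $APIFY_TOKEN" | tail -n "$n"
}
```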

6. Run Actor synchronously (short-running Actors only)

For Actors that finish within 300 seconds, get dataset items in one call:

curl -s -X POST "https://api.apify.com/v2/acts/apify~web-scraper/run-sync-get-dataset-items?timeout=120" \
  -H "Authorization: Bearer $APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"startUrls":[{"url":"https://example.com"}],"maxPagesPerCrawl":5}'

Returns the dataset items array directly (not wrapped in data). Returns 408 if the run exceeds 300s.

Alternative: /run-sync returns the OUTPUT record from the run's default key-value store instead of dataset items.
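Because a sync call can end in a 408, it helps to check the HTTP status rather than parse the body blindly. A sketch with a hypothetical run_sync helper: curl's -w '%{http_code}' prints the status while -o writes the items to a file, and the exit code tells the caller whether to fall back to the async workflow.

```shell
# Hypothetical helper: run an Actor synchronously, save the dataset items to
# OUT_FILE, and map the HTTP status to an exit code:
#   0 = success (2xx), 2 = 408 (over the 300 s sync limit), 1 = other error.
# Usage: run_sync ACTOR_ID INPUT_JSON OUT_FILE
run_sync() {
  local actor="$1" input="$2" out="$3" code
  code=$(curl -s -o "$out" -w '%{http_code}' -X POST \
    "https://api.apify.com/v2/acts/$actor/run-sync-get-dataset-items?timeout=120" \
    -H "Authorization: Bearer $APIFY_TOKEN" \
    -H "Content-Type: application/json" \
    -d "$input")
  case "$code" in
    2??) return 0 ;;
    408) echo "Run exceeded the sync limit; switch to the async workflow" >&2; return 2 ;;
    *) echo "HTTP $code: $(cat "$out" 2>/dev/null)" >&2; return 1 ;;
  esac
}
```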

Quick recipes

Scrape a website

curl -s -X POST "https://api.apify.com/v2/acts/apify~web-scraper/run-sync-get-dataset-items?timeout=120" \
  -H "Authorization: Bearer $APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"startUrls":[{"url":"https://example.com"}],"maxPagesPerCrawl":20}'

Google search

curl -s -X POST "https://api.apify.com/v2/acts/apify~google-search-scraper/run-sync-get-dataset-items?timeout=120" \
  -H "Authorization: Bearer $APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"queries":"site:example.com openai","maxPagesPerQuery":1}'
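Hand-written JSON breaks as soon as a query contains quotes. A sketch (hypothetical function name) that builds the input with jq -n --arg so escaping is handled for you; to run several searches at once, pass one query per line, assuming the Actor's input schema treats queries that way.

```shell
# Hypothetical helper: build the google-search-scraper input safely.
google_search_input() {
  jq -n --arg q "$1" '{queries: $q, maxPagesPerQuery: 1}'
}

# Usage:
# curl -s -X POST "https://api.apify.com/v2/acts/apify~google-search-scraper/run-sync-get-dataset-items?timeout=120" \
#   -H "Authorization: Bearer $APIFY_TOKEN" \
#   -H "Content-Type: application/json" \
#   -d "$(google_search_input 'site:example.com "openai"')"
```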

Long-running Actor (async with polling)

# 1. Start (the API waits up to 60 s before returning)
RUN=$(curl -s -X POST "https://api.apify.com/v2/acts/apify~web-scraper/runs?waitForFinish=60" \
  -H "Authorization: Bearer $APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"startUrls":[{"url":"https://example.com"}],"maxPagesPerCrawl":500}')
RUN_ID=$(echo "$RUN" | jq -r '.data.id // empty')
[ -n "$RUN_ID" ] || echo "Failed to start run: $RUN" >&2

# 2. Poll until done (each request blocks for up to 60 s, so no sleep is needed)
while [ -n "$RUN_ID" ]; do
  STATUS=$(curl -s "https://api.apify.com/v2/actor-runs/$RUN_ID?waitForFinish=60" \
    -H "Authorization: Bearer $APIFY_TOKEN" | jq -r '.data.status // empty')
  echo "Status: $STATUS"
  case "$STATUS" in
    SUCCEEDED|FAILED|ABORTED|TIMED-OUT) break;;
    "") echo "Could not read run status" >&2; break;;
  esac
done

# 3. Fetch results (skipped if the run never started)
[ -n "$RUN_ID" ] && curl -s "https://api.apify.com/v2/actor-runs/$RUN_ID/dataset/items?clean=true" \
  -H "Authorization: Bearer $APIFY_TOKEN"

Abort a run

curl -s -X POST "https://api.apify.com/v2/actor-runs/RUN_ID/abort" \
  -H "Authorization: Bearer $APIFY_TOKEN"

Paid / rental Actors

Some Actors require a monthly subscription before they can be run. If the API returns a permissions or payment error for an Actor, ask the user to manually subscribe via the Apify Console:

https://console.apify.com/actors/ACTOR_ID

Replace ACTOR_ID with the Actor's ID (e.g. AhEsMsQyLfHyMLaxz). The user needs to click Start on that page to activate the subscription. Most rental Actors offer a free trial period set by the developer.

You can get the Actor ID from the store search response (data.items[].id) or from GET /v2/acts/username~name (data.id).
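A sketch that turns a username~name identifier into the Console link to send to the user. Both function names are hypothetical; the lookup uses GET /v2/acts/username~name and reads data.id as described above.

```shell
# Print the Console page where the user can start a rental Actor.
console_url_for_actor() {
  printf 'https://console.apify.com/actors/%s\n' "$1"
}

# Hypothetical helper: resolve username~name to the Actor ID, then print the URL.
# Usage: rental_console_url apify~web-scraper
rental_console_url() {
  local id
  id=$(curl -s "https://api.apify.com/v2/acts/$1" \
    -H "Authorization: Bearer $APIFY_TOKEN" | jq -r '.data.id // empty')
  [ -n "$id" ] || { echo "Actor not found: $1" >&2; return 1; }
  console_url_for_actor "$id"
}
```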

Error handling

  • 401: APIFY_TOKEN missing or invalid.
  • 404 Actor not found: check username~name format (tilde, not slash). Browse https://apify.com/store.
  • 400 run-failed: check GET /v2/logs/RUN_ID for details.
  • 402/403 (payment required / forbidden): the Actor likely requires a subscription. See "Paid / rental Actors" above.
  • 408 run-timeout-exceeded: sync endpoints have a 300s limit. Use async workflow instead.
  • 429 rate-limit-exceeded: retry with exponential backoff (start at 500ms, double each time).
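The 429 rule above can be sketched as a retry wrapper. The function names are hypothetical, the cap of 5 attempts is an arbitrary choice, and fractional sleep values assume GNU or BSD sleep (POSIX only guarantees whole seconds). Any non-429 response is returned immediately, with a non-zero exit code unless it is 2xx.

```shell
# Delay before retry N (0-based): 500 ms, doubling each time.
backoff_delay_ms() {
  echo $(( 500 * (1 << $1) ))
}

# Hypothetical helper: call curl with the given args, retrying on HTTP 429.
# Prints the response body; returns non-zero for non-2xx or after 5 tries.
apify_request_with_retry() {
  local max_attempts=5 attempt=0 body code ms
  while [ "$attempt" -lt "$max_attempts" ]; do
    body=$(curl -s -w '\n%{http_code}' "$@")
    code=${body##*$'\n'}   # last line is the status code
    body=${body%$'\n'*}    # everything before it is the body
    if [ "$code" != "429" ]; then
      printf '%s\n' "$body"
      [ "${code:0:1}" = "2" ]   # exit status: success only for 2xx
      return
    fi
    ms=$(backoff_delay_ms "$attempt")
    sleep "$(printf '%d.%03d' $((ms / 1000)) $((ms % 1000)))"
    attempt=$((attempt + 1))
  done
  echo "Still rate-limited after $max_attempts attempts" >&2
  return 1
}

# Usage:
# apify_request_with_retry "https://api.apify.com/v2/store?search=web+scraper&limit=5" \
#   -H "Authorization: Bearer $APIFY_TOKEN"
```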
