Firecrawl Scraper Skill
Trigger Conditions & Endpoint Selection
Choose Firecrawl endpoint based on user intent:
- scrape: Need to extract content from a single web page (markdown, html, json, screenshot, pdf)
- crawl: Need to crawl an entire website with depth control and path filtering
- map: Need to quickly get a list of all URLs on a website
- batch-scrape: Need to scrape multiple URLs in parallel
- crawl-status: Given a crawl job ID, check crawl progress/results (optional --wait)
Recommended Architecture (Main Skill + Sub-skill)
This skill uses a two-phase architecture:
- Main skill (current context): Understand the user's question → choose the endpoint → assemble the JSON payload
- Sub-skill (forked context): Responsible only for executing the HTTP call, avoiding token waste in the conversation history
Execution Method
Use the Task tool to invoke the firecrawl-fetcher sub-skill, passing the command and the JSON payload via stdin:
Task parameters:
- subagent_type: Bash
- description: "Call Firecrawl API"
- prompt:
  cat <<'JSON' | node .claude/skills/firecrawl-scraper/firecrawl-api.cjs <scrape|crawl|map|batch-scrape|crawl-status> [--wait]
  { ...payload... }
  JSON
Payload Examples
1) Scrape Single Page

cat <<'JSON' | node .claude/skills/firecrawl-scraper/firecrawl-api.cjs scrape
{
  "url": "https://example.com",
  "formats": ["markdown", "links"],
  "onlyMainContent": true,
  "includeTags": [],
  "excludeTags": ["nav", "footer"],
  "waitFor": 0,
  "timeout": 30000
}
JSON
Available formats:
- "markdown", "html", "rawHtml", "links", "images", "summary"
- {"type": "json", "prompt": "Extract product info", "schema": {...}}
- {"type": "screenshot", "fullPage": true, "quality": 85} (see the sketch after this list)
2) Scrape with Actions (Page Interaction)

cat <<'JSON' | node .claude/skills/firecrawl-scraper/firecrawl-api.cjs scrape
{
  "url": "https://example.com",
  "formats": ["markdown"],
  "actions": [
    {"type": "wait", "milliseconds": 2000},
    {"type": "click", "selector": "#load-more"},
    {"type": "wait", "milliseconds": 1000},
    {"type": "scroll", "direction": "down", "amount": 500}
  ]
}
JSON
Available actions:
- wait, click, write, press, scroll, screenshot, scrape, executeJavascript (a write/press/executeJavascript sketch follows)
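A minimal sketch of the typing-related actions, assuming the payload fields are "text" for write, "key" for press, and "script" for executeJavascript (these field names are not shown in the examples above):

# Illustrative only: fill a search box and submit; the "text", "key", and
# "script" field names are assumptions, not taken from this document.
cat <<'JSON' | node .claude/skills/firecrawl-scraper/firecrawl-api.cjs scrape
{
  "url": "https://example.com",
  "formats": ["markdown"],
  "actions": [
    {"type": "click", "selector": "#search-input"},
    {"type": "write", "text": "pricing"},
    {"type": "press", "key": "Enter"},
    {"type": "wait", "milliseconds": 1500},
    {"type": "executeJavascript", "script": "window.scrollTo(0, document.body.scrollHeight)"}
  ]
}
JSON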
3) Parse PDF

cat <<'JSON' | node .claude/skills/firecrawl-scraper/firecrawl-api.cjs scrape
{
  "url": "https://example.com/document.pdf",
  "formats": ["markdown"],
  "parsers": ["pdf"]
}
JSON
4) Extract Structured JSON

cat <<'JSON' | node .claude/skills/firecrawl-scraper/firecrawl-api.cjs scrape
{
  "url": "https://example.com/product",
  "formats": [
    {
      "type": "json",
      "prompt": "Extract product information",
      "schema": {
        "type": "object",
        "properties": {
          "name": {"type": "string"},
          "price": {"type": "number"},
          "description": {"type": "string"}
        },
        "required": ["name", "price"]
      }
    }
  ]
}
JSON
5) Crawl Entire Website

cat <<'JSON' | node .claude/skills/firecrawl-scraper/firecrawl-api.cjs crawl
{
  "url": "https://docs.example.com",
  "formats": ["markdown"],
  "includePaths": ["^/docs/."],
  "excludePaths": ["^/blog/."],
  "maxDiscoveryDepth": 3,
  "limit": 100,
  "allowExternalLinks": false,
  "allowSubdomains": false
}
JSON
5.1) Crawl + Wait for Completion
cat <<'JSON' | node .claude/skills/firecrawl-scraper/firecrawl-api.cjs crawl --wait
{
  "url": "https://docs.example.com",
  "formats": ["markdown"],
  "limit": 100
}
JSON
6) Map Website URLs

cat <<'JSON' | node .claude/skills/firecrawl-scraper/firecrawl-api.cjs map
{
  "url": "https://example.com",
  "search": "documentation",
  "limit": 5000
}
JSON
7) Batch Scrape Multiple URLs

cat <<'JSON' | node .claude/skills/firecrawl-scraper/firecrawl-api.cjs batch-scrape
{
  "urls": [
    "https://example.com/page1",
    "https://example.com/page2",
    "https://example.com/page3"
  ],
  "formats": ["markdown"]
}
JSON
8) Check Crawl Status
node .claude/skills/firecrawl-scraper/firecrawl-api.cjs crawl-status <crawl-id>
Wait for completion:
node .claude/skills/firecrawl-scraper/firecrawl-api.cjs crawl-status <crawl-id> --wait
Key Features
Formats
- markdown: Clean markdown content
- html: Parsed HTML
- rawHtml: Original HTML
- links: All links on page
- images: All images on page
- summary: AI-generated summary
- json: Structured data extraction with schema
- screenshot: Page screenshot (PNG)
Content Control
- onlyMainContent: Extract only main content (default: true)
- includeTags: CSS selectors to include
- excludeTags: CSS selectors to exclude
- waitFor: Wait time before scraping (ms)
- maxAge: Cache duration (default: 48 hours); see the sketch after this list
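A minimal sketch of a cache override on a scrape call, assuming maxAge is expressed in milliseconds (the unit is not stated above):

# Illustrative only: accept cached content up to 1 hour old; the millisecond
# unit for maxAge is an assumption.
cat <<'JSON' | node .claude/skills/firecrawl-scraper/firecrawl-api.cjs scrape
{
  "url": "https://example.com",
  "formats": ["markdown"],
  "maxAge": 3600000
}
JSON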
Actions (Browser Automation)
- wait: Wait for specified time
- click: Click element by selector
- write: Input text into field
- press: Press keyboard key
- scroll: Scroll page
- executeJavascript: Run custom JS
Crawl Options
- includePaths: Regex patterns to include
- excludePaths: Regex patterns to exclude
- maxDiscoveryDepth: Maximum crawl depth
- limit: Maximum pages to crawl
- allowExternalLinks: Follow external links
- allowSubdomains: Follow subdomains
Environment Variables & API Key
Two ways to configure the API key (priority: environment variable > .env):
- Environment variable: FIRECRAWL_API_KEY
- .env file: Place in .claude/skills/firecrawl-scraper/.env (can be copied from .env.example); see the example below
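For example (the key value is a placeholder; the .env file is assumed to use plain KEY=value lines):

# Option 1: environment variable (placeholder value shown)
export FIRECRAWL_API_KEY="fc-your-api-key"

# Option 2: .env file next to the script, copied from the provided template
cp .claude/skills/firecrawl-scraper/.env.example .claude/skills/firecrawl-scraper/.env
# then edit the file so it contains a line such as: FIRECRAWL_API_KEY=fc-your-api-key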
Response Format
All endpoints return JSON with:
- success: Boolean indicating success
- data: Extracted content (format depends on endpoint)
- For crawl: Returns a job ID; use crawl-status (or GET /v2/crawl/{id}) to check status (see the sketches below)
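Illustrative response sketches (only success, data, and the crawl job ID come from the list above; other fields and values are placeholders):

Scrape (sketch):
{
  "success": true,
  "data": {
    "markdown": "# Example Domain ..."
  }
}

Crawl submission (sketch):
{
  "success": true,
  "id": "<crawl-id>"
}

Pass the returned id to crawl-status as shown in example 8.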