
Firecrawl & Jina Web Scraping

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install the skill with:

npx skills add tdimino/claude-code-minoan/tdimino-claude-code-minoan-firecrawl


Firecrawl vs WebFetch

Prefer firecrawl scrape URL --only-main-content over the WebFetch tool—it produces cleaner markdown, handles JavaScript-heavy pages, and avoids content truncation (>80% benchmark coverage). WebFetch is acceptable as a fallback when Firecrawl is unavailable.

Preferred approach:

firecrawl scrape https://docs.example.com/api --only-main-content

Token-Efficient Scraping

Inspired by Anthropic's dynamic filtering—always filter before reasoning. This reduced input tokens by ~24% and improved accuracy by ~11% in their benchmarks.

The Principle: Search → Filter → Scrape → Filter → Reason

DO:

Search (titles/URLs only) → Evaluate relevance → Scrape top hits → Filter by section → Reason

DON'T:

Search → Scrape everything → Reason over all of it
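The workflow above can be sketched as a small Python loop. The `search` and `scrape` callables below are hypothetical stand-ins for the Firecrawl CLI calls, shown only to illustrate the filter-before-reason shape: rank cheap metadata first, then scrape only the top hits.

```python
# Minimal sketch of Search -> Filter -> Scrape -> Filter -> Reason.
# `search` and `scrape` are injected stand-ins, not real Firecrawl calls.

def rank_by_relevance(results, query_terms):
    """Score search hits by how many query terms appear in title/URL (cheap)."""
    def score(hit):
        text = (hit["title"] + " " + hit["url"]).lower()
        return sum(term in text for term in query_terms)
    return sorted(results, key=score, reverse=True)

def research(query, search, scrape, top_n=3):
    hits = search(query)                          # titles/URLs only (cheap)
    best = rank_by_relevance(hits, query.lower().split())[:top_n]
    return [scrape(hit["url"]) for hit in best]   # scrape only the winners
```

The expensive step (scraping full pages) only ever runs on the few hits that survive the cheap relevance pass.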

Step-by-Step Efficient Workflow

Step 1: Search — get titles/URLs only (cheap)

firecrawl search "query" --limit 20

Step 2: Evaluate results, pick 3-5 best URLs

Step 3: Scrape only those, filter to relevant sections

firecrawl scrape URL1 --only-main-content | \
  python3 ~/.claude/skills/firecrawl/scripts/filter_web_results.py \
  --sections "API,Authentication" --max-chars 5000

Post-Processing with filter_web_results.py

Pipe any Firecrawl or Exa output through this script to reduce context before reasoning:

Extract only matching sections from scraped page

firecrawl scrape URL --only-main-content |
python3 ~/.claude/skills/firecrawl/scripts/filter_web_results.py --sections "Pricing,Plans"

Keep only paragraphs with keywords

firecrawl search "query" --scrape --pretty |
python3 ~/.claude/skills/firecrawl/scripts/filter_web_results.py --keywords "pricing,cost" --max-chars 5000

Extract specific JSON fields from API output

python3 ~/.claude/skills/exa-search/scripts/exa_search.py "query" --json |
python3 ~/.claude/skills/firecrawl/scripts/filter_web_results.py --fields "title,url,text" --max-chars 3000

Combine filters with stats

firecrawl scrape URL --only-main-content |
python3 ~/.claude/skills/firecrawl/scripts/filter_web_results.py --sections "API" --keywords "endpoint" --compact --stats

Full path: python3 ~/.claude/skills/firecrawl/scripts/filter_web_results.py

Flags: --sections, --keywords, --max-chars, --max-lines, --fields (JSON), --strip-links, --strip-images, --compact, --stats
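As a rough mental model of what `--sections` does, heading-based filtering might look like the sketch below. This is an assumption inferred from the flag names; the actual `filter_web_results.py` may behave differently.

```python
# Hypothetical re-implementation of a --sections style filter: keep only
# markdown sections whose heading matches a requested name, cap total length.
import re

def filter_sections(markdown, sections, max_chars=None):
    wanted = [s.lower() for s in sections]
    out, keep = [], False
    for line in markdown.splitlines():
        m = re.match(r"#+\s*(.+)", line)
        if m:  # a heading starts a new section; decide whether to keep it
            keep = any(w in m.group(1).lower() for w in wanted)
        if keep:
            out.append(line)
    result = "\n".join(out)
    return result[:max_chars] if max_chars else result
```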

Other Token-Saving Patterns

  • Use --only-main-content to strip navigation and footer boilerplate, reducing token consumption. Omit only when nav/footer content is specifically needed.

  • Use firecrawl map URL --search "topic" first to find relevant subpages before scraping

  • Use --format links first to get URL list, evaluate, then scrape selectively

  • Use --max-chars with exa_contents.py to cap extraction length

  • Use --formats summary (Python API script) over full text when you need the gist, not raw content

Claude API Native Tools (for API Agent Builders)

Anthropic's API now offers built-in dynamic filtering tools:

Tools: web_search_20260209 / web_fetch_20260209
Header: anthropic-beta: code-execution-web-tools-2026-02-09

These have built-in dynamic filtering via code execution. Use them when building Claude API agents directly. Use Firecrawl/Exa when you need: autonomous agents, batch scraping, structured extraction, domain-specific crawling, or when not on the Claude API.

Available Tools

  1. Official Firecrawl CLI (firecrawl) — Primary

Setup: npm install -g firecrawl-cli && firecrawl login --api-key $FIRECRAWL_API_KEY

| Command | Purpose | Quick Example |
| --- | --- | --- |
| scrape | Single page → markdown | firecrawl scrape URL --only-main-content |
| crawl | Entire site with progress | firecrawl crawl URL --wait --progress --limit 50 |
| map | Discover all URLs on a site | firecrawl map URL --search "API" |
| search | Web search (+ optional scrape) | firecrawl search "query" --limit 10 |

Full CLI reference: references/cli-reference.md

  2. Auto-Save Alias (fc-save) — Shell Alias

Requires shell alias setup (not bundled with this skill).

fc-save URL

→ Saves to ~/Desktop/Screencaps & Chats/Web-Scrapes/docs-example-com-api.md
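Since the alias is not bundled, here is a minimal sketch of how it could be defined in bash. The slug rule is an assumption inferred from the example path above, not the skill's actual implementation.

```shell
# Hypothetical fc-save setup. fc_slug turns a URL into the filename style
# shown above; fc_save scrapes the page and writes it to the Web-Scrapes folder.
fc_slug() {
  echo "$1" | sed -E 's#^https?://##; s#[/.]+#-#g; s#-+$##'
}

fc_save() {
  local dir="$HOME/Desktop/Screencaps & Chats/Web-Scrapes"
  mkdir -p "$dir"
  firecrawl scrape "$1" --only-main-content -o "$dir/$(fc_slug "$1").md"
}
```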

  3. Python API Script (firecrawl_api.py) — Advanced Features

Command: python3 ~/.claude/skills/firecrawl/scripts/firecrawl_api.py <command>

Requires: FIRECRAWL_API_KEY env var, pip install firecrawl-py requests

| Command | Purpose | Quick Example |
| --- | --- | --- |
| search | Web search with scraping | firecrawl_api.py search "query" -n 10 |
| scrape | Single URL with page actions | firecrawl_api.py scrape URL --formats markdown summary |
| batch-scrape | Multiple URLs concurrently | firecrawl_api.py batch-scrape URL1 URL2 URL3 |
| crawl | Website crawling | firecrawl_api.py crawl URL --limit 20 |
| map | URL discovery | firecrawl_api.py map URL --search "query" |
| extract | LLM-powered structured extraction | firecrawl_api.py extract URL --prompt "Find pricing" |
| agent | Autonomous extraction (no URLs needed) | firecrawl_api.py agent "Find YC W24 AI startups" |
| parallel-agent | Bulk agent queries (v2.8.0+) | firecrawl_api.py parallel-agent "Q1" "Q2" "Q3" |

Agent models: spark-1-fast (10 credits, simple), spark-1-mini (default), spark-1-pro (thorough)

Full Python API reference: references/python-api-reference.md

  4. DeepWiki — GitHub Repo Documentation

~/.claude/skills/firecrawl/scripts/deepwiki.sh <owner/repo> [section] [options]

AI-generated wiki for any public GitHub repo. No API key required.

Overview

~/.claude/skills/firecrawl/scripts/deepwiki.sh karpathy/nanochat

Browse sections

~/.claude/skills/firecrawl/scripts/deepwiki.sh langchain-ai/langchain --toc

Specific section

~/.claude/skills/firecrawl/scripts/deepwiki.sh karpathy/nanochat 4.1-gpt-transformer-implementation

Full dump for RAG

~/.claude/skills/firecrawl/scripts/deepwiki.sh openai/openai-python --all --save

  5. Jina Reader (jina) — Fallback

Use when Firecrawl fails or for Twitter/X URLs (Firecrawl blocks Twitter, Jina works).

jina https://x.com/username/status/123456

Firecrawl vs Exa vs Native Claude Tools

| Need | Best Tool | Why |
| --- | --- | --- |
| Single page → markdown | firecrawl scrape --only-main-content | Cleanest output |
| Search + scrape in one shot | firecrawl search --scrape | Combined operation |
| Crawl entire site | firecrawl crawl --wait --progress | Link following + progress |
| Autonomous data finding | firecrawl_api.py agent | No URLs needed |
| Semantic/neural search | Exa exa_search.py | AI-powered relevance |
| Find research papers | Exa --category "research paper" | Academic index |
| Quick research answer | Exa exa_research.py | Citations + synthesis |
| Find similar pages | Exa exa_similar.py | Competitive analysis |
| Claude API agent building | Native web_search_20260209 | Built-in dynamic filtering |
| Twitter/X content | jina URL | Only tool that works |
| GitHub repo docs | deepwiki.sh owner/repo | AI-generated wiki |
| Anti-bot / Cloudflare bypass | scrapling stealth fetch | Local Turnstile solver |
| Element-level extraction | scrapling + CSS selectors | Precision targeting, adaptive tracking |
| No API key scraping | scrapling HTTP fetch | 100% local, no credentials |
| Site redesign resilience | scrapling adaptive mode | SQLite similarity matching |

Common Workflows

Single Page Scraping

firecrawl scrape https://example.com/page --only-main-content

Or auto-save: fc-save URL

Or to file: firecrawl scrape URL --only-main-content -o page.md

Documentation Crawling

Map first, then crawl relevant paths

firecrawl map https://docs.example.com --search "API"
firecrawl crawl https://docs.example.com --include-paths /api,/guides --wait --progress

Research Workflow

firecrawl search "machine learning best practices 2026" --scrape --scrape-formats markdown

Agent-Powered Research (No URLs Needed)

python3 ~/.claude/skills/firecrawl/scripts/firecrawl_api.py agent \
  "Compare pricing tiers for Firecrawl, Apify, and ScrapingBee"

Troubleshooting

Check status and credits

firecrawl --status && firecrawl credit-usage

Re-authenticate

firecrawl logout && firecrawl login --api-key $FIRECRAWL_API_KEY

Check API key

echo $FIRECRAWL_API_KEY

  • Scrape fails: Try jina URL, or add --wait-for 3000 for JS-heavy sites

  • Async job stuck: Check with crawl-status / batch-status; cancel with crawl-cancel / batch-cancel

  • Disable telemetry: export FIRECRAWL_NO_TELEMETRY=1
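The scrape-fails advice above can be wrapped in a small shell function. This is a sketch, assuming both CLIs are installed and authenticated:

```shell
# Try Firecrawl with a JS wait first; fall back to Jina on a non-zero exit.
scrape_with_fallback() {
  firecrawl scrape "$1" --only-main-content --wait-for 3000 2>/dev/null \
    || jina "$1"
}
```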

Reference Documentation

| File | Contents |
| --- | --- |
| references/cli-reference.md | Full CLI parameter reference (scrape, crawl, map, search, fc-save, jina, deepwiki) |
| references/python-api-reference.md | Full Python API script reference (all commands, SDK examples) |
| references/firecrawl-api.md | Firecrawl Search API reference |
| references/firecrawl-agent-api.md | Agent API (spark models, parallel agents, webhooks) |
| references/actions-reference.md | Page actions for dynamic content (click, write, wait, scroll) |
| references/branding-format.md | Brand identity extraction (colors, fonts, UI) |

Test Suite

python3 ~/.claude/skills/firecrawl/scripts/test_firecrawl.py --quick         # Quick validation
python3 ~/.claude/skills/firecrawl/scripts/test_firecrawl.py                 # Full suite
python3 ~/.claude/skills/firecrawl/scripts/test_firecrawl.py --test scrape   # Specific test

