Firecrawl Reliability Patterns

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy the command below and send it to your AI assistant to install this skill:

Install skill "firecrawl-reliability-patterns" with this command: npx skills add jeremylongshore/claude-code-plugins-plus-skills/jeremylongshore-claude-code-plugins-plus-skills-firecrawl-reliability-patterns

Overview

Production reliability patterns for Firecrawl web scraping pipelines. Firecrawl's async crawl model, JavaScript rendering, and credit-based pricing create specific reliability challenges around job completion, content quality, and cost control.

Prerequisites

  • Firecrawl API key configured

  • Understanding of async job polling

  • Queue infrastructure for retry handling

Instructions

Step 1: Robust Crawl Job Polling

Crawl jobs can take minutes. Implement proper polling with timeout and failure detection.

import FirecrawlApp from '@mendable/firecrawl-js';

async function reliableCrawl(url: string, options: any, timeoutMs = 600000) { // default timeout: 10 minutes
  const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });
  const crawl = await firecrawl.asyncCrawlUrl(url, options);
  const deadline = Date.now() + timeoutMs;
  let pollInterval = 2000; // start by polling every 2 seconds

  while (Date.now() < deadline) {
    const status = await firecrawl.checkCrawlStatus(crawl.id);
    if (status.status === 'completed') return status;
    if (status.status === 'failed') throw new Error(`Crawl failed: ${status.error}`);

    await new Promise(r => setTimeout(r, pollInterval));
    pollInterval = Math.min(pollInterval * 1.5, 30000); // back off, capped at 30 seconds
  }
  throw new Error(`Crawl timed out after ${timeoutMs}ms`);
}
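A minimal usage sketch; the URL, page cap, and five-minute timeout are illustrative values, assuming the v1 crawl parameters limit and scrapeOptions:

async function crawlDocs() {
  const status = await reliableCrawl('https://example.com/docs', {
    limit: 50,                                // cap pages to bound cost and runtime
    scrapeOptions: { formats: ['markdown'] }, // request markdown output per page
  }, 300000);
  console.log(`Crawl finished with status: ${status.status}`);
}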

Step 2: Content Quality Validation

Scraped pages may return empty or boilerplate content. Validate before processing.

interface ScrapedPage {
  url: string;
  markdown: string;
  metadata: { title?: string; statusCode?: number };
}

function validateContent(page: ScrapedPage): boolean {
  if (!page.markdown || page.markdown.length < 100) return false; // too short to be real content
  if (page.metadata.statusCode && page.metadata.statusCode >= 400) return false; // HTTP error response

  // Detect common error pages that return boilerplate bodies
  const errorPatterns = ['access denied', '403 forbidden', 'page not found', 'captcha'];
  const lower = page.markdown.toLowerCase();
  return !errorPatterns.some(p => lower.includes(p));
}
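For example, quality gating can sit between a crawl and downstream processing. This sketch assumes the completed status object exposes its pages on a data array, as in the v1 JS SDK:

async function crawlAndFilter(url: string): Promise<ScrapedPage[]> {
  const status = await reliableCrawl(url, { limit: 50 });
  const pages = (status.data ?? []) as ScrapedPage[];
  const usable = pages.filter(validateContent);
  console.log(`Kept ${usable.length} of ${pages.length} pages`);
  return usable;
}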

Step 3: Credit-Aware Processing

Track credit usage per crawl to prevent budget overruns.

class CreditTracker {
  private dailyUsage: Map<string, number> = new Map();
  private dailyLimit: number;

  constructor(dailyLimit = 5000) { // default budget: 5000 credits per day
    this.dailyLimit = dailyLimit;
  }

  canAfford(estimatedPages: number): boolean {
    const today = new Date().toISOString().split('T')[0]; // YYYY-MM-DD key
    const used = this.dailyUsage.get(today) || 0;
    return (used + estimatedPages) <= this.dailyLimit;
  }

  record(pages: number) {
    const today = new Date().toISOString().split('T')[0];
    this.dailyUsage.set(today, (this.dailyUsage.get(today) || 0) + pages);
  }
}
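A sketch of wiring the tracker in front of a crawl; the up-front page estimate, and falling back to it when the status object lacks a page count, are illustrative choices:

const tracker = new CreditTracker(5000);

async function budgetedCrawl(url: string, estimatedPages: number) {
  if (!tracker.canAfford(estimatedPages)) {
    throw new Error('Daily Firecrawl credit budget exhausted');
  }
  const status = await reliableCrawl(url, { limit: estimatedPages });
  tracker.record(status.data?.length ?? estimatedPages); // prefer the actual page count when available
  return status;
}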

Step 4: Fallback from Crawl to Individual Scrape

If a full crawl fails, fall back to scraping critical pages individually.

async function resilientScrape(urls: string[]) {
  const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });
  try {
    const status = await reliableCrawl(urls[0], { limit: urls.length });
    return status.data ?? []; // normalize to an array of pages, matching the fallback path
  } catch (crawlError) {
    console.warn('Crawl failed, falling back to individual scrapes');
    const results = [];
    for (const url of urls) {
      try {
        const result = await firecrawl.scrapeUrl(url, { formats: ['markdown'], onlyMainContent: true });
        results.push(result);
      } catch (e) {
        console.error(`Failed: ${url}`);
      }
      await new Promise(r => setTimeout(r, 1000)); // wait 1 second between scrapes to avoid rate limits
    }
    return results;
  }
}
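Usage sketch; the URLs are placeholders:

async function fetchCriticalPages() {
  const pages = await resilientScrape([
    'https://example.com/docs/intro',
    'https://example.com/docs/api',
  ]);
  console.log(`Retrieved ${pages.length} pages`);
}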

Error Handling

Issue | Cause | Solution
Crawl times out | Large site, slow JS rendering | Set page limits and a polling timeout
Empty markdown | Anti-bot or JS-rendered content | Increase waitFor, try individual scrape
Credit overrun | No budget tracking | Implement a credit-aware circuit breaker
Partial crawl results | Site structure changes | Validate content, retry failed pages
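For the empty-markdown case, Firecrawl's waitFor scrape option delays extraction until client-side rendering has had time to finish. The 5000 ms value below is an illustrative starting point, not a Firecrawl recommendation:

import FirecrawlApp from '@mendable/firecrawl-js';

async function scrapeJsHeavyPage(url: string) {
  const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });
  return firecrawl.scrapeUrl(url, {
    formats: ['markdown'],
    onlyMainContent: true,
    waitFor: 5000, // wait 5 seconds for client-side rendering before extracting
  });
}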

Examples

Basic usage: wrap each crawl in reliableCrawl with the default polling and timeout settings, and gate results with validateContent before downstream processing.

Advanced scenario: add CreditTracker budgets and the resilientScrape fallback for production environments with strict cost limits and team-specific requirements.
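A minimal end-to-end sketch combining the steps above, reusing the tracker instance from the Step 3 sketch; the cast to ScrapedPage[] is a simplification:

const pipeline = async (urls: string[]) => {
  if (!tracker.canAfford(urls.length)) {                    // budget gate (Step 3)
    throw new Error('Credit budget exhausted');
  }
  const pages = await resilientScrape(urls);                // crawl with fallback (Steps 1 and 4)
  tracker.record(pages.length);                             // record actual usage
  return (pages as ScrapedPage[]).filter(validateContent);  // quality gate (Step 2)
};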

Resources

  • Firecrawl API Docs: https://docs.firecrawl.dev

Output

  • Configuration files or code changes applied to the project

  • Validation report confirming correct implementation

  • Summary of changes made and their rationale

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals. Each entry is a repository-sourced listing labeled Needs Review.

  • backtesting-trading-strategies (Coding): no summary provided by upstream source.

  • svg-icon-generator (Coding): no summary provided by upstream source.

  • performance-lighthouse-runner (Coding): no summary provided by upstream source.