website-crawler

High-performance web crawler with TypeScript/Bun frontend and Go backend for discovering and mapping website structure.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "website-crawler" with this command: npx skills add leobrival/serum-plugins-official/leobrival-serum-plugins-official-website-crawler

Website Crawler

When to Use

Use this skill when users ask to:

  • Crawl a website or "spider a site"

  • Map site structure or "discover all pages"

  • Find all URLs on a website

  • Generate sitemap or site report

  • Analyze link relationships between pages

  • Audit website coverage or completeness

  • Extract page metadata (titles, status codes)

Keywords: crawl, spider, map, discover pages, site structure, sitemap, all URLs, website audit

Quick Start

Run the crawler from the scripts directory:

cd ~/.claude/scripts/crawler
bun src/index.ts <URL> [options]

CLI Options

| Option | Short | Default | Description |
| --- | --- | --- | --- |
| --depth | -D | 2 | Maximum crawl depth |
| --workers | -w | 20 | Concurrent workers |
| --rate | -r | 2 | Rate limit (requests/second) |
| --profile | -p | (none) | Use preset profile (fast/deep/gentle) |
| --output | -o | auto | Output directory |
| --sitemap | -s | true | Use sitemap.xml for discovery |
| --domain | -d | auto | Allowed domain (extracted from URL) |
| --debug | (none) | false | Enable debug logging |
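To make the --depth and --domain options concrete, here is a minimal sketch of a depth-limited, same-domain breadth-first traversal. It is not the crawler's actual engine: `crawlOrder`, `LinkGraph`, and the in-memory graph are hypothetical names used for illustration, and it assumes that the start URL counts as depth 0 and that the "auto" domain is the hostname of the start URL.

```typescript
// Hypothetical in-memory link graph standing in for real HTTP fetches.
type LinkGraph = Record<string, string[]>;

interface Visit {
  url: string;
  depth: number;
}

// Breadth-first traversal that stops following links at maxDepth and
// skips URLs whose hostname differs from the start URL's hostname.
function crawlOrder(start: string, graph: LinkGraph, maxDepth: number): Visit[] {
  // Assumption: the "auto" domain is the hostname of the start URL.
  const allowedHost = new URL(start).hostname;
  const seen = new Set<string>([start]);
  const queue: Visit[] = [{ url: start, depth: 0 }];
  const visited: Visit[] = [];

  while (queue.length > 0) {
    const current = queue.shift()!;
    visited.push(current);
    // Pages at the depth limit are recorded, but their links are not followed.
    if (current.depth >= maxDepth) continue;

    for (const link of graph[current.url] ?? []) {
      if (seen.has(link)) continue;
      if (new URL(link).hostname !== allowedHost) continue; // external link
      seen.add(link);
      queue.push({ url: link, depth: current.depth + 1 });
    }
  }
  return visited;
}

const graph: LinkGraph = {
  "https://example.com/": ["https://example.com/a", "https://other.test/x"],
  "https://example.com/a": ["https://example.com/b"],
  "https://example.com/b": ["https://example.com/c"],
};

const visits = crawlOrder("https://example.com/", graph, 2);
console.log(visits.map((v) => `${v.depth} ${v.url}`).join("\n"));
```

With a depth of 2, the sketch visits the start page, /a, and /b, but never reaches /c or the external host.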

Profiles

Three preset profiles for common use cases:

| Profile | Workers | Depth | Rate | Use Case |
| --- | --- | --- | --- | --- |
| fast | 50 | 3 | 10 | Quick site mapping |
| deep | 20 | 10 | 3 | Thorough crawling |
| gentle | 5 | 5 | 1 | Respect server limits |
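The presets can be pictured as plain option objects layered over the CLI defaults. The sketch below uses the values from the two tables, but `resolveOptions`, `PROFILES`, and `DEFAULTS` are hypothetical names, and the precedence (explicit flags over profile over defaults) is an assumption; the source does not say how the real CLI combines a profile with other flags.

```typescript
interface CrawlOptions {
  workers: number;
  depth: number;
  rate: number; // requests per second
}

// Values copied from the profile table.
const PROFILES: Record<string, CrawlOptions> = {
  fast: { workers: 50, depth: 3, rate: 10 },
  deep: { workers: 20, depth: 10, rate: 3 },
  gentle: { workers: 5, depth: 5, rate: 1 },
};

// Defaults copied from the CLI options table.
const DEFAULTS: CrawlOptions = { workers: 20, depth: 2, rate: 2 };

// Assumed precedence: explicit flags > profile > defaults.
function resolveOptions(
  profile?: string,
  flags: Partial<CrawlOptions> = {},
): CrawlOptions {
  let preset: Partial<CrawlOptions> = {};
  if (profile !== undefined) {
    if (!Object.prototype.hasOwnProperty.call(PROFILES, profile)) {
      throw new Error(`Unknown profile: ${profile}`);
    }
    preset = PROFILES[profile];
  }
  return { ...DEFAULTS, ...preset, ...flags };
}

console.log(resolveOptions("gentle"));
console.log(resolveOptions("fast", { depth: 1 }));
```

Under this assumed precedence, `--profile fast --depth 1` would keep the fast profile's workers and rate while limiting depth to 1.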

Usage Examples

Basic crawl

bun src/index.ts https://example.com

Deep crawl with high concurrency

bun src/index.ts https://example.com --depth 5 --workers 30 --rate 5

Using a profile

bun src/index.ts https://example.com --profile fast

Gentle crawl (avoid rate limiting)

bun src/index.ts https://example.com --profile gentle

Output

The crawler generates two files in the output directory:

  • results.json - Structured crawl data with all discovered pages

  • index.html - Dark-themed HTML report with statistics

Results JSON Structure

{
  "stats": {
    "pages_found": 150,
    "pages_crawled": 147,
    "external_links": 23,
    "errors": 3,
    "duration": 45.2
  },
  "results": [
    {
      "url": "https://example.com/page",
      "title": "Page Title",
      "status_code": 200,
      "depth": 1,
      "links": ["..."],
      "content_type": "text/html"
    }
  ]
}
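For consumers of results.json, the structure above maps naturally onto a few TypeScript interfaces. This is a sketch inferred from the sample only: the interface names and `urlsByStatus` are hypothetical, the real file may carry additional fields, and the unit of `duration` is not stated in the source.

```typescript
interface CrawlStats {
  pages_found: number;
  pages_crawled: number;
  external_links: number;
  errors: number;
  duration: number; // unit not stated in the source; presumably seconds
}

interface PageResult {
  url: string;
  title: string;
  status_code: number;
  depth: number;
  links: string[];
  content_type: string;
}

interface CrawlReport {
  stats: CrawlStats;
  results: PageResult[];
}

// Group crawled URLs by HTTP status code, e.g. to list broken pages.
function urlsByStatus(report: CrawlReport): Map<number, string[]> {
  const groups = new Map<number, string[]>();
  for (const page of report.results) {
    const urls = groups.get(page.status_code) ?? [];
    urls.push(page.url);
    groups.set(page.status_code, urls);
  }
  return groups;
}

// Inline sample data; with Bun the file could instead be read with
// JSON.parse(await Bun.file("results.json").text()).
const sample: CrawlReport = JSON.parse(`{
  "stats": { "pages_found": 2, "pages_crawled": 2, "external_links": 0, "errors": 1, "duration": 0.4 },
  "results": [
    { "url": "https://example.com/", "title": "Home", "status_code": 200, "depth": 0, "links": ["https://example.com/missing"], "content_type": "text/html" },
    { "url": "https://example.com/missing", "title": "", "status_code": 404, "depth": 1, "links": [], "content_type": "text/html" }
  ]
}`);

const broken = urlsByStatus(sample).get(404) ?? [];
console.log(broken);
```

Grouping by status code is one simple way to turn the raw results into an audit list of pages that did not return 200.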

Features

  • Sitemap Discovery: Automatically finds and parses sitemap.xml

  • Checkpoint/Resume: Auto-saves progress every 30 seconds

  • Rate Limiting: Token bucket algorithm prevents server overload

  • Concurrent Crawling: Go worker pool for high performance

  • HTML Reports: Dark-themed, mobile-responsive reports
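The rate limiting feature names the token bucket algorithm. The following is a generic sketch of that technique in TypeScript, not the Go engine's implementation: the `TokenBucket` class, its `capacity` parameter, and the explicit `now` argument (used here to keep the example deterministic) are all assumptions for illustration.

```typescript
class TokenBucket {
  private readonly ratePerSecond: number;
  private readonly capacity: number;
  private tokens: number;
  private lastRefill: number; // timestamp in milliseconds

  constructor(ratePerSecond: number, capacity: number, now: number = Date.now()) {
    this.ratePerSecond = ratePerSecond;
    this.capacity = capacity;
    this.tokens = capacity; // start with a full bucket
    this.lastRefill = now;
  }

  // Refill based on elapsed time, then consume one token if available.
  // Returns true when a request may be sent at time `now` (milliseconds).
  tryAcquire(now: number = Date.now()): boolean {
    const elapsedSeconds = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSeconds * this.ratePerSecond,
    );
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// 2 requests/second with a burst capacity of 2, driven by a simulated clock.
const bucket = new TokenBucket(2, 2, 0);
const decisions = [
  bucket.tryAcquire(0),   // bucket starts full
  bucket.tryAcquire(0),   // second token of the initial burst
  bucket.tryAcquire(0),   // bucket is empty
  bucket.tryAcquire(500), // half a second refills one token at 2/s
  bucket.tryAcquire(500), // that token was just consumed
];
console.log(decisions);
```

The bucket allows short bursts up to its capacity while holding the long-run request rate to the configured value, which is why a lower --rate reduces pressure on the target server.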

Troubleshooting

Rate limiting errors

Reduce the rate limit or use the gentle profile:

bun src/index.ts <url> --rate 1

or

bun src/index.ts <url> --profile gentle

Go binary not found

The TypeScript frontend auto-compiles the Go binary. If compilation fails:

cd ~/.claude/scripts/crawler/engine
go build -o crawler main.go

Timeout on large sites

Reduce depth or increase workers:

bun src/index.ts <url> --depth 1 --workers 50

Architecture

For detailed architecture, Go engine specifications, and code conventions, see reference.md.

Related Files

  • Command: plugins/crawler/commands/crawler.md

  • Reference: plugins/crawler/skills/website-crawler/reference.md

  • Scripts: plugins/crawler/skills/website-crawler/scripts/

  • Profiles: plugins/crawler/skills/website-crawler/scripts/config/profiles/

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
