# Crawler
Reference for web crawling and scraping: the robots.txt protocol, the Scrapy framework, anti-bot detection, headless browsers, and legal considerations. No API keys or credentials are required; every command outputs reference documentation only.
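As a minimal illustration of the scraping side (pulling links out of HTML), the standard-library `html.parser` module can extract anchors without any third-party dependency; the HTML snippet below is a made-up example, not output from this tool:

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect every href attribute from <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

parser = LinkExtractor()
parser.feed('<a href="/a">A</a><p><a href="/b">B</a></p>')
print(parser.links)  # ['/a', '/b']
```

Real-world scrapers usually reach for CSS or XPath selectors (covered under `cheatsheet`), but the streaming-parser pattern above is the zero-dependency baseline.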
## Commands
| Command | Description |
|---|---|
| `intro` | Crawling vs scraping, robots.txt, sitemaps |
| `standards` | HTTP caching, structured data, meta tags |
| `troubleshooting` | Anti-bot detection, JS rendering, encoding |
| `performance` | Concurrency, dedup, incremental, distributed |
| `security` | Legal landscape, ethical guidelines, proxies |
| `migration` | BeautifulSoup to Scrapy, requests to Playwright |
| `cheatsheet` | Scrapy commands, CSS/XPath, curl, user-agents |
| `faq` | Legality, JS pages, blocking, storage |
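As a quick sketch of the robots.txt protocol covered under `intro`, the standard-library `urllib.robotparser` module evaluates Disallow and Crawl-delay rules; the rules and URLs below are hypothetical (a real crawler would fetch `https://<host>/robots.txt` first):

```python
import urllib.robotparser

# Hypothetical robots.txt body for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Paths outside /private/ are allowed; those under it are not.
print(rp.can_fetch("MyBot", "https://example.com/public/page"))
print(rp.can_fetch("MyBot", "https://example.com/private/page"))
# Crawl-delay applies to all agents via the "*" group.
print(rp.crawl_delay("MyBot"))
```

Honoring `crawl_delay` (sleeping between requests) is both polite and one of the cheapest ways to avoid the anti-bot measures discussed under `troubleshooting`.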
## Output Format
All commands output plain-text reference documentation via heredoc. No external API calls, no credentials needed, no network access.
Powered by BytesAgain | bytesagain.com | hello@bytesagain.com