crawler

Web crawling and scraping reference — robots.txt protocol, Scrapy framework, anti-bot detection, headless browsers, and legal considerations

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "crawler" with this command: npx skills add bytesagain3/crawler

Crawler

Web crawling and scraping reference — robots.txt protocol, Scrapy framework, anti-bot detection, headless browsers, and legal considerations. No API keys or credentials required — outputs reference documentation only.

Commands

Command          Description
intro            Crawling vs scraping, robots.txt, sitemap
standards        HTTP caching, structured data, meta tags
troubleshooting  Anti-bot detection, JS rendering, encoding
performance      Concurrency, dedup, incremental, distributed
security         Legal landscape, ethical guidelines, proxies
migration        BeautifulSoup to Scrapy, requests to Playwright
cheatsheet       Scrapy commands, CSS/XPath, curl, user-agents
faq              Legality, JS pages, blocking, storage
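
The intro topic lists robots.txt, so a short illustration of what a robots.txt check can look like may help. This is a minimal sketch using Python's standard-library urllib.robotparser; it is not taken from the skill's own output. The rules in ROBOTS_TXT, the example-bot user-agent, and the example.com URLs are hypothetical, and the rules are parsed from an in-memory string so the sketch makes no network requests.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, parsed in memory (no network access).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

# Hypothetical user-agent name, used only for this illustration.
AGENT = "example-bot"


def build_parser(robots_txt: str) -> RobotFileParser:
    """Return a parser loaded with the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Paths outside /private/ are allowed for every agent under these rules.
print(parser.can_fetch(AGENT, "https://example.com/index.html"))         # True
# Paths under /private/ are disallowed.
print(parser.can_fetch(AGENT, "https://example.com/private/data.html"))  # False
# The requested delay between requests, in seconds.
print(parser.crawl_delay(AGENT))                                         # 5
```

A real crawler would typically fetch the site's own robots.txt (for example with set_url and read on the same class) and wait at least the reported crawl delay between requests.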

Output Format

All commands output plain-text reference documentation via heredoc. No external API calls, no credentials needed, no network access.


Powered by BytesAgain | bytesagain.com | hello@bytesagain.com

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
