robots-txt-gen

Generate, validate, and analyze robots.txt files for websites. Use when creating robots.txt from scratch, validating existing robots.txt syntax, checking if a URL is allowed/blocked by robots.txt rules, or generating robots.txt for common platforms (WordPress, Next.js, Django, Rails). Also use when auditing crawl directives or debugging search engine indexing issues.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.


Install skill "robots-txt-gen" with this command: npx skills add johnnywang2001/robots-txt-gen


Generate, validate, and test robots.txt files from the command line.

Quick Start

# Generate a robots.txt for a platform
python3 scripts/robots_txt_gen.py generate --preset nextjs --sitemap https://example.com/sitemap.xml

# Validate an existing robots.txt
python3 scripts/robots_txt_gen.py validate --file robots.txt

# Validate a remote robots.txt
python3 scripts/robots_txt_gen.py validate --url https://example.com/robots.txt

# Test if a URL is allowed for a user-agent
python3 scripts/robots_txt_gen.py test --file robots.txt --url /admin/dashboard --agent Googlebot

# Generate with custom rules
python3 scripts/robots_txt_gen.py generate --allow "/" --disallow "/admin" --disallow "/api" --disallow "/private" --sitemap https://example.com/sitemap.xml --agent "*"
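
As a sketch of what the custom-rules invocation above might produce (exact ordering and formatting depend on the script), the output would look something like:

```txt
User-agent: *
Allow: /
Disallow: /admin
Disallow: /api
Disallow: /private

Sitemap: https://example.com/sitemap.xml
```

Note that Sitemap is a file-wide directive, conventionally placed outside any User-agent group.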

Commands

generate

Create a robots.txt file with custom rules or platform presets.

Options:

  • --preset <name> — Use a platform preset: wordpress, nextjs, django, rails, laravel, static, spa, ecommerce.
  • --agent <name> — User-agent (default: *). Repeat for multiple agents.
  • --allow <path> — Allow a path. Repeatable.
  • --disallow <path> — Disallow a path. Repeatable.
  • --sitemap <url> — Sitemap URL. Repeatable.
  • --crawl-delay <seconds> — Crawl-delay directive.
  • --block-ai — Add rules blocking common AI crawlers (GPTBot, ChatGPT-User, CCBot, Google-Extended, anthropic-ai, etc.).
  • --output <file> — Write to a file instead of stdout.

validate

Check a robots.txt file for syntax errors and best-practice warnings.

Options:

  • --file <path> — Local file to validate.
  • --url <url> — Remote robots.txt URL to fetch and validate.
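
To illustrate the kind of checks a validator performs, here is a minimal sketch; the helper `validate_lines` is hypothetical, not the script's actual API. It flags missing `:` separators, unknown directive names, and empty values:

```python
# Directives commonly accepted in robots.txt files.
KNOWN_DIRECTIVES = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay", "host"}

def validate_lines(text):
    """Return a list of (line_number, message) issues found in robots.txt text."""
    issues = []
    for n, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # strip comments and surrounding whitespace
        if not line:
            continue  # blank and comment-only lines are fine
        if ":" not in line:
            issues.append((n, "missing ':' separator"))
            continue
        field, value = line.split(":", 1)
        if field.strip().lower() not in KNOWN_DIRECTIVES:
            issues.append((n, f"unknown directive {field.strip()!r}"))
        elif not value.strip():
            issues.append((n, "empty value"))
    return issues

sample = "User-agent: *\nDisalow: /admin\nCrawl-delay:\n"
for line_no, msg in validate_lines(sample):
    print(line_no, msg)
# 2 unknown directive 'Disalow'
# 3 empty value
```

A real validator would also warn on best-practice issues (e.g. rules appearing before any User-agent line), which this sketch omits.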

test

Test whether a specific URL path is allowed or disallowed for a given user-agent.

Options:

  • --file <path> — robots.txt file to test against.
  • --url <path> — URL path to test (e.g., /admin/login).
  • --agent <name> — User-agent to test as (default: Googlebot).
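
The test subcommand's behavior can be approximated with Python's standard-library urllib.robotparser. Note that robotparser applies rules in file order (first match wins) rather than Google's longest-match precedence, so results can differ from Googlebot's for overlapping Allow/Disallow rules; this is a rough sketch, not necessarily the script's implementation:

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Allow: /admin/public
Disallow: /admin
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)  # parse() expects an iterable of lines

print(rp.can_fetch("Googlebot", "/admin/dashboard"))  # False: matches Disallow: /admin
print(rp.can_fetch("Googlebot", "/admin/public"))     # True: Allow rule listed first
print(rp.can_fetch("Googlebot", "/blog/post"))        # True: no matching rule
```

Googlebot matches the `*` group here because no group names it specifically; a more specific `User-agent: Googlebot` group would take precedence.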

Platform Presets

| Preset | What it blocks | Notes |
| --- | --- | --- |
| wordpress | /wp-admin/, /wp-includes/, query params | Allows /wp-admin/admin-ajax.php |
| nextjs | /_next/static/, /api/, /.next/ | Standard Next.js paths |
| django | /admin/, /static/admin/, /media/private/ | Django admin and private media |
| rails | /admin/, /assets/, /tmp/ | Rails conventions |
| laravel | /admin/, /storage/, /vendor/ | Laravel conventions |
| static | Nothing blocked | Simple allow-all with sitemap |
| spa | /api/, /assets/ | Single-page app pattern |
| ecommerce | /cart/, /checkout/, /account/, /search? | Prevents crawling user sessions |
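
For example, the wordpress preset would be expected to emit something like the following (the exact query-param pattern shown is illustrative; check the script's actual output):

```txt
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /*?*
Allow: /wp-admin/admin-ajax.php
```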

AI Crawler Blocking

The --block-ai flag adds disallow rules for known AI training crawlers:

  • GPTBot, ChatGPT-User (OpenAI)
  • Google-Extended (Google AI)
  • CCBot (Common Crawl)
  • anthropic-ai (Anthropic)
  • Bytespider (ByteDance)
  • ClaudeBot (Anthropic)
  • FacebookBot (Meta)
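
Each blocked crawler typically gets its own group with a blanket disallow, so the generated rules would look roughly like this (agents beyond the first three follow the same pattern):

```txt
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: CCBot
Disallow: /
```

Keep in mind robots.txt is advisory: well-behaved crawlers honor it, but it is not an access control.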

