sitemap-generator

Generate XML sitemaps by crawling a website or scanning local files. Auto-discovers pages by following extracted links. Scans local HTML/MD files and takes lastmod dates from file modification times. Generates a robots.txt that references the sitemap. Use when asked to create a sitemap, generate sitemap.xml, crawl a site for pages, create robots.txt, or prepare a site for SEO. Triggers on "sitemap", "sitemap.xml", "crawl site", "site map", "robots.txt", "SEO sitemap".

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.


Install the "sitemap-generator" skill with this command: npx skills add charlie-morrison/cm-sitemap-generator

Sitemap Generator

Generate XML sitemaps by crawling a live website or by scanning local HTML, Markdown, or PHP files.

Crawl a Website

python3 scripts/sitemap_gen.py https://example.com
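
Crawl mode discovers pages breadth-first, follows only same-domain links, and skips duplicates. The sketch below illustrates that idea with the standard library only; it is not the script's code. crawl() and its fetch parameter are hypothetical: fetch is any callable that returns HTML text or None, injected so the example runs without network access (a real crawler would fetch pages with something like urllib.request.urlopen(url, timeout=...)).

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=500):
    """Breadth-first crawl restricted to start_url's host.

    fetch is a hypothetical hook: fetch(url) returns HTML text, or None
    if the page could not be retrieved.
    """
    domain = urlparse(start_url).netloc
    seen = {start_url}          # every URL ever queued (deduplication)
    queue = deque([start_url])  # FIFO queue gives breadth-first order
    pages = []
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages.append(url)
        extractor = LinkExtractor()
        extractor.feed(html)
        for href in extractor.links:
            # Resolve relative links, then drop any #fragment.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc != domain:
                continue  # off-site links (and mailto:, etc.) are skipped
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages


if __name__ == "__main__":
    # Tiny in-memory "site" standing in for real HTTP requests.
    site = {
        "https://example.com/": '<a href="/about">About</a> <a href="https://other.org/">Elsewhere</a>',
        "https://example.com/about": '<a href="/">Home</a> <a href="/about#team">Team</a>',
    }
    print(crawl("https://example.com/", site.get))
```

Run as a script, this prints the two same-domain pages: the off-site link is ignored and /about#team collapses into the already-seen /about.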

Scan Local Files

python3 scripts/sitemap_gen.py --local ./public --base-url https://example.com
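
In local mode the tool scans HTML/HTM/MD/PHP files and derives lastmod from each file's modification time. A minimal sketch of that scan follows. scan_local() is a hypothetical helper, and the URL mapping is an assumption: each file's path relative to the directory is appended to the base URL unchanged (so index.html stays index.html), since the script's exact mapping is not documented here.

```python
import os
from datetime import datetime, timezone

# Extensions treated as pages (taken from the Features list).
PAGE_EXTENSIONS = {".html", ".htm", ".md", ".php"}


def scan_local(root, base_url):
    """Return sorted (url, lastmod) pairs for page files under root.

    lastmod is the file's mtime as a W3C date (YYYY-MM-DD, in UTC).
    """
    base_url = base_url.rstrip("/")
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower()
            if ext not in PAGE_EXTENSIONS:
                continue  # skip CSS, images, and other non-page files
            path = os.path.join(dirpath, name)
            # Relative path with forward slashes, whatever the OS.
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            mtime = os.path.getmtime(path)
            lastmod = datetime.fromtimestamp(mtime, tz=timezone.utc).strftime("%Y-%m-%d")
            entries.append((base_url + "/" + rel, lastmod))
    return sorted(entries)
```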

Save to File

# Save sitemap.xml
python3 scripts/sitemap_gen.py https://example.com --output sitemap.xml

# Save sitemap.xml + robots.txt
python3 scripts/sitemap_gen.py https://example.com --output sitemap.xml --robots
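
With --robots, the generated robots.txt contains User-agent, Allow, and Sitemap lines. A minimal sketch of that content, assuming a permissive allow-everything policy and a sitemap named sitemap.xml at the site root; make_robots_txt() is a hypothetical helper. The Sitemap directive takes an absolute URL.

```python
def make_robots_txt(base_url, sitemap_name="sitemap.xml"):
    """Build a permissive robots.txt that points crawlers at the sitemap."""
    sitemap_url = base_url.rstrip("/") + "/" + sitemap_name
    lines = [
        "User-agent: *",              # rules apply to every crawler
        "Allow: /",                   # nothing is blocked
        "Sitemap: " + sitemap_url,    # absolute URL of the sitemap
    ]
    return "\n".join(lines) + "\n"
```

Python's urllib.robotparser can read the result back, which is a quick way to sanity-check the file.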

Output Formats

# XML (default: a valid sitemap.xml)
python3 scripts/sitemap_gen.py https://example.com

# Text (human-readable summary + XML)
python3 scripts/sitemap_gen.py https://example.com --format text

# JSON (pages list + XML string)
python3 scripts/sitemap_gen.py https://example.com --format json
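
The XML output follows the sitemaps.org protocol: a urlset root in the sitemap namespace, one url element per page with a loc and an optional lastmod, and entity-escaped values. A minimal stdlib sketch of rendering such a file; build_sitemap() and its (url, lastmod) input shape are hypothetical, not the script's interface.

```python
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# escape() handles &, <, >; the protocol also asks for ' and " to be escaped.
EXTRA_ENTITIES = {"'": "&apos;", '"': "&quot;"}


def build_sitemap(entries):
    """Render (url, lastmod-or-None) pairs as a sitemap.xml string."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="' + SITEMAP_NS + '">',
    ]
    for url, lastmod in entries:
        lines.append("  <url>")
        lines.append("    <loc>" + escape(url, EXTRA_ENTITIES) + "</loc>")
        if lastmod:
            lines.append("    <lastmod>" + escape(lastmod, EXTRA_ENTITIES) + "</lastmod>")
        lines.append("  </url>")
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"
```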

Options

  • --max-pages (default: 500): Maximum number of pages to crawl
  • --timeout (default: 10): Request timeout in seconds
  • --output / -o (default: stdout): Save sitemap.xml to a file
  • --robots (default: off): Also generate robots.txt
  • --local (default: off): Scan a local directory instead of crawling
  • --base-url (no default): Base URL for local mode (required with --local)
  • --verbose / -v (default: off): Show crawl progress
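
These flags map naturally onto argparse. The sketch below is a hypothetical reconstruction for illustration, not the script's actual parser. The optional positional url argument, the --format choices (taken from Output Formats), the float type for --timeout, and the checks that --local comes with --base-url and that crawl mode gets a URL are all assumptions.

```python
import argparse


def build_parser():
    """Hypothetical argparse setup mirroring the documented flags."""
    p = argparse.ArgumentParser(prog="sitemap_gen.py")
    p.add_argument("url", nargs="?", help="Site to crawl (omit with --local)")
    p.add_argument("--max-pages", type=int, default=500, help="Maximum pages to crawl")
    p.add_argument("--timeout", type=float, default=10, help="Request timeout in seconds")
    p.add_argument("--output", "-o", help="Save sitemap.xml to this file (default: stdout)")
    p.add_argument("--robots", action="store_true", help="Also generate robots.txt")
    p.add_argument("--local", metavar="DIR", help="Scan a local directory instead of crawling")
    p.add_argument("--base-url", help="Base URL for local mode")
    p.add_argument("--format", choices=["xml", "text", "json"], default="xml")
    p.add_argument("--verbose", "-v", action="store_true", help="Show crawl progress")
    return p


def parse_args(argv=None):
    """Parse argv and enforce the mode-specific requirements."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.local and not args.base_url:
        parser.error("--base-url is required with --local")  # exits with status 2
    if not args.local and not args.url:
        parser.error("a URL is required unless --local is given")
    return args
```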

Features

  • Crawl mode: BFS link discovery, same-domain only, deduplication
  • Local mode: Scan HTML/HTM/MD/PHP files, auto-detect lastmod from file mtime
  • Smart filtering: Skips images, CSS, JS, PDFs, archives, media files
  • URL normalization: Removes fragments, normalizes trailing slashes
  • robots.txt generation: User-agent + Allow + Sitemap reference
  • Valid XML: Proper XML escaping, sitemaps.org schema
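
The filtering and normalization features can be sketched as two small helpers. Both are hypothetical, and two details are assumptions: the skipped-extension set below is illustrative rather than the script's actual list, and "normalizing trailing slashes" is taken to mean stripping the slash from every path except the root.

```python
import posixpath
from urllib.parse import urldefrag, urlparse, urlunparse

# Illustrative asset extensions to skip (images, CSS, JS, PDFs, archives, media).
SKIP_EXTENSIONS = {
    ".jpg", ".jpeg", ".png", ".gif", ".svg", ".webp", ".ico",
    ".css", ".js", ".pdf",
    ".zip", ".tar", ".gz", ".rar", ".7z",
    ".mp3", ".mp4", ".avi", ".mov", ".webm",
}


def normalize_url(url):
    """Drop the #fragment and strip trailing slashes from non-root paths."""
    url, _fragment = urldefrag(url)
    parts = urlparse(url)
    path = parts.path or "/"  # an empty path means the site root
    if path != "/":
        path = path.rstrip("/") or "/"
    return urlunparse(parts._replace(path=path))


def is_page_url(url):
    """Return False when the URL's extension marks it as a non-page asset."""
    ext = posixpath.splitext(urlparse(url).path)[1].lower()
    return ext not in SKIP_EXTENSIONS
```

Normalizing before the "already seen" check is what lets /blog/ and /blog#top count as one page.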

Requirements

  • Python 3.6+
  • No external dependencies (stdlib only)

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
