llms-txt-crawler

llms.txt Crawler Skill

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "llms-txt-crawler" with this command: npx skills add agykit/agykit/agykit-agykit-llms-txt-crawler

This skill fetches a website's llms.txt file and crawls every page listed in it. llms.txt is a proposed convention for websites to publish an LLM-friendly listing of their content.

Overview

An llms.txt file typically follows this format:

# Site Name

## Section Name

- [Page Title](url): Optional description

This skill parses these files and downloads all linked content.
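As a rough illustration of the parsing step, the sketch below pulls links out of llms.txt text. It is not the skill's crawl.js; the function name `extractLinks` is hypothetical, and it assumes entries are Markdown links grouped under `##` section headings, as the llms.txt proposal describes.

```javascript
// Sketch only: not the skill's crawl.js. Assumes llms.txt entries are
// Markdown links such as "- [Title](/page.md): note" under "## Section" headings.
function extractLinks(llmsTxt, baseUrl) {
  const linkPattern = /\[([^\]]+)\]\(([^)\s]+)\)/g;
  const seen = new Set();
  const links = [];
  let section = null;

  for (const line of llmsTxt.split(/\r?\n/)) {
    // Remember the most recent "## Section Name" heading.
    const heading = line.match(/^##\s+(.+)$/);
    if (heading) {
      section = heading[1].trim();
      continue;
    }
    for (const match of line.matchAll(linkPattern)) {
      let url;
      try {
        // Resolve relative links against the site's base URL.
        url = new URL(match[2], baseUrl).href;
      } catch {
        continue; // skip anything that cannot be parsed as a URL
      }
      if (seen.has(url)) continue; // keep the first occurrence only
      seen.add(url);
      links.push({ title: match[1], url, section });
    }
  }
  return links;
}
```

Each returned entry carries the link title, the absolute URL, and the section it appeared under, which is enough to drive a download loop.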

Usage

Basic Usage

Run the crawl script with a target URL:

cd /path/to/skills/llms-txt-crawler/scripts
npm install  # First time only
node crawl.js --url https://example.com

Command Line Options

| Option | Short | Description | Default |
|---|---|---|---|
| --url | -u | Base URL of the site with llms.txt | Required |
| --output | -o | Output directory for crawled files | ./output |
| --format | -f | Output format: md, json, or txt | md |
| --delay | -d | Delay between requests in milliseconds | 500 |
| --concurrent | -c | Maximum concurrent requests | 3 |
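To make the option semantics concrete, here is a minimal sketch of how the options and defaults listed above could be parsed. The real crawl.js may handle arguments differently; `parseOptions` and its error messages are hypothetical, and the sketch assumes every option takes exactly one value.

```javascript
// Sketch only: the real crawl.js may parse arguments differently.
// Names, short aliases, and defaults mirror the documented options.
const OPTION_ALIASES = { u: "url", o: "output", f: "format", d: "delay", c: "concurrent" };
const KNOWN_OPTIONS = new Set(["url", "output", "format", "delay", "concurrent"]);
const DEFAULTS = { output: "./output", format: "md", delay: 500, concurrent: 3 };
const FORMATS = ["md", "json", "txt"];

function parseOptions(argv) {
  const options = { ...DEFAULTS };
  // Assumes flags and values alternate: --flag value --flag value ...
  for (let i = 0; i < argv.length; i += 2) {
    const flag = argv[i];
    const value = argv[i + 1];
    let name;
    if (flag.startsWith("--")) name = flag.slice(2);
    else if (flag.startsWith("-")) name = OPTION_ALIASES[flag.slice(1)];
    if (!KNOWN_OPTIONS.has(name)) throw new Error(`Unknown option: ${flag}`);
    if (value === undefined) throw new Error(`Missing value for ${flag}`);
    options[name] = value;
  }
  if (!options.url) throw new Error("--url is required");
  if (!FORMATS.includes(options.format)) {
    throw new Error(`--format must be one of: ${FORMATS.join(", ")}`);
  }
  // Numeric options arrive as strings on the command line.
  options.delay = Number(options.delay);
  options.concurrent = Number(options.concurrent);
  if (!Number.isInteger(options.delay) || options.delay < 0) {
    throw new Error("--delay must be a non-negative integer");
  }
  if (!Number.isInteger(options.concurrent) || options.concurrent < 1) {
    throw new Error("--concurrent must be a positive integer");
  }
  return options;
}
```

In a script this would typically be called as `parseOptions(process.argv.slice(2))`.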

Examples

Crawl agentskills.io documentation:

node crawl.js --url https://agentskills.io --output ./agentskills-docs

Crawl with custom rate limiting:

node crawl.js --url https://example.com --delay 1000 --concurrent 2

Output as JSON:

node crawl.js --url https://example.com --format json
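The `--delay` and `--concurrent` options together describe a rate-limited worker pool. The sketch below shows one way such a pool could work; it is not taken from crawl.js. `crawlAll` and `fetchPage` are hypothetical names, and the sketch assumes the delay applies per worker between that worker's own requests.

```javascript
// Sketch only: one way to honor --concurrent and --delay; not taken from crawl.js.
// fetchPage is any async function that takes a URL and resolves to the page body.
async function crawlAll(urls, fetchPage, { concurrent = 3, delayMs = 500 } = {}) {
  const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
  const results = new Array(urls.length);
  let next = 0; // index of the next URL to claim

  async function worker() {
    while (next < urls.length) {
      const index = next++; // claiming is synchronous, so no two workers share a URL
      try {
        results[index] = { url: urls[index], ok: true, body: await fetchPage(urls[index]) };
      } catch (error) {
        // Record the failure and keep going with the remaining pages.
        results[index] = { url: urls[index], ok: false, error: String(error) };
      }
      if (next < urls.length) await sleep(delayMs);
    }
  }

  const workerCount = Math.min(concurrent, urls.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Results come back in the same order as the input URLs, with failed pages marked `ok: false` rather than aborting the run.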

Output Structure

The script creates the following output structure:

output/
├── llms.txt        # Original llms.txt file
├── index.json      # Metadata about all crawled pages
└── pages/
    ├── page-1.md
    ├── page-2.md
    └── ...
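The layout above can be expressed as a small planning function that maps crawled links to file paths. This is a sketch under stated assumptions: `planOutput` is a hypothetical name, and the entry fields (`url`, `title`, `file`) and the rule that the page extension follows `--format` are guesses, not the skill's documented index.json schema.

```javascript
// Sketch only: file naming and the entry fields (url, title, file) are
// assumptions for illustration, not the skill's actual index.json schema.
function planOutput(links, { outputDir = "./output", format = "md" } = {}) {
  const root = outputDir.replace(/\/+$/, ""); // drop trailing slashes
  const pages = links.map((link, i) => ({
    url: link.url,
    title: link.title ?? null,
    file: `pages/page-${i + 1}.${format}`, // page-1.md, page-2.md, ...
  }));
  return {
    llmsTxtPath: `${root}/llms.txt`,
    indexPath: `${root}/index.json`,
    pages,
  };
}
```

Keeping the plan separate from the download step makes it easy to write index.json once at the end, whatever fields the real script records.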

Error Handling

  • Network errors: Retries up to 3 times with exponential backoff

  • Rate limiting: Respects delay settings between requests

  • Missing pages: Logs warnings but continues crawling other pages

  • Invalid URLs: Skips and logs invalid URLs
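The retry and URL-validation behavior listed above might look roughly like the following. This is a sketch, not the skill's code: the helper names are hypothetical, and the 500 ms base delay and doubling factor are assumptions, since the source only states "up to 3 times with exponential backoff".

```javascript
// Sketch only: base delay and doubling factor are assumptions; the skill
// states just "retries up to 3 times with exponential backoff".
function backoffDelay(attempt, baseMs = 500) {
  return baseMs * 2 ** attempt; // with baseMs 500: attempt 0 -> 500, 1 -> 1000, 2 -> 2000
}

// Runs task once, then retries up to `retries` more times on failure.
async function withRetry(task, { retries = 3, baseMs = 500, sleep } = {}) {
  const wait = sleep ?? ((ms) => new Promise((resolve) => setTimeout(resolve, ms)));
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await task(attempt);
    } catch (error) {
      lastError = error;
      if (attempt < retries) await wait(backoffDelay(attempt, baseMs));
    }
  }
  throw lastError; // every attempt failed
}

// Returns true only for strings that parse as http or https URLs.
function isValidHttpUrl(value) {
  try {
    const url = new URL(value);
    return url.protocol === "http:" || url.protocol === "https:";
  } catch {
    return false;
  }
}
```

A crawler could filter its link list with `isValidHttpUrl`, log what it skipped, and wrap each page fetch in `withRetry` so a transient network error does not stop the run.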

Integration Tips

When using this skill in an agent workflow:

  • First run the crawler to download content

  • The index.json file contains metadata about all pages

  • Use the downloaded markdown files for context or analysis
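For the last step, an agent usually needs to fit the downloaded pages into a limited context. The sketch below assembles a context string from index entries under a character budget. It assumes each index.json entry has `title` and `file` fields, which is a guess about the schema; `buildContext` and the `readFile` callback are hypothetical names.

```javascript
// Sketch only: assumes each index.json entry has "title" and "file" fields,
// which is a guess; check the real index.json before relying on them.
// readFile is a caller-supplied function that returns a page's text for a path.
function buildContext(indexEntries, readFile, maxChars = 8000) {
  const parts = [];
  let used = 0;
  for (const entry of indexEntries) {
    const text = readFile(entry.file);
    const block = `## ${entry.title ?? entry.file}\n\n${text}\n`;
    if (used + block.length > maxChars) break; // stop before exceeding the budget
    parts.push(block);
    used += block.length;
  }
  return parts.join("\n");
}
```

In Node.js, `readFile` could be something like `(file) => fs.readFileSync(path.join("output", file), "utf8")`, with the entries loaded from `output/index.json`.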

See Also

  • llms.txt Specification

  • scripts/crawl.js - The main crawler script

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
