llms.txt Support Skill

Purpose

Single responsibility: Detect, fetch, and use llms.txt files, which provide LLM-optimized documentation and can make documentation ingestion up to 10x faster than full-site scraping. (BP-4)

Background

The llms.txt standard (https://llmstxt.org/) provides a convention for websites to expose LLM-friendly documentation. Instead of scraping entire sites, check for llms.txt first.

File hierarchy (check in order):

llms-full.txt: Complete documentation (largest)
llms.txt: Standard documentation
llms-small.txt: Condensed documentation (smallest)

Grounding Checkpoint (Archetype 1 Mitigation)

Before executing, VERIFY:

Base URL is accessible
Check all three llms.txt variants in order
Validate file content is actual documentation (not error page)
Confirm file size is reasonable for the documentation scope

DO NOT assume llms.txt exists. Always probe first.

Uncertainty Escalation (Archetype 2 Mitigation)

ASK USER instead of guessing when:

Multiple llms.txt variants found - which size to use?
llms.txt content appears partial or outdated
File is returned, but the content looks like an error page
Site has llms.txt but content doesn't match expected documentation

NEVER assume llms.txt quality without verification.

Context Scope (Archetype 3 Mitigation)

| Context Type | Included | Excluded |
|--------------|----------|----------|
| RELEVANT | Target base URL, llms.txt content | Full site scraping |
| PERIPHERAL | llms.txt spec reference | Other sites' llms.txt |
| DISTRACTOR | Previous scraping attempts | Unrelated documentation |

Workflow Steps

Step 1: Detect llms.txt (Grounding)

Check for llms.txt variants (in order of preference)

curl -I https://example.com/llms-full.txt
curl -I https://example.com/llms.txt
curl -I https://example.com/llms-small.txt

Check common alternate locations

curl -I https://example.com/.well-known/llms.txt
curl -I https://docs.example.com/llms.txt
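The probes above can be wrapped in a small loop. This is a minimal sketch, not part of skill-seekers: the function names `candidate_urls` and `probe_variants` are hypothetical, and the 10-second timeout is an assumption. Some servers reject HEAD requests, so a non-200 status here is a hint rather than proof that the file is missing.

```shell
# Print candidate llms.txt URLs for a base URL, in the order this skill probes them.
# (Hypothetical helper; the docs.* subdomain variant is left to the caller.)
candidate_urls() {
  local base="${1%/}"   # drop one trailing slash so paths join cleanly
  local name
  for name in llms-full.txt llms.txt llms-small.txt .well-known/llms.txt; do
    printf '%s/%s\n' "$base" "$name"
  done
}

# Probe each candidate with a HEAD request and print "<status> <url>".
# Needs network access; curl reports 000 when it cannot connect or times out.
probe_variants() {
  local url status
  candidate_urls "$1" | while IFS= read -r url; do
    status=$(curl -s -o /dev/null -I -L --max-time 10 -w '%{http_code}' "$url")
    printf '%s %s\n' "${status:-000}" "$url"
  done
}
```

Calling `probe_variants https://example.com` prints one line per candidate, which can then be recorded as the detection results.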

Step 2: Validate Content

Fetch and inspect first 100 lines

curl -s https://example.com/llms.txt | head -100

Check file size

curl -sI https://example.com/llms.txt | grep -i content-length

Verify it's not an error page

curl -s https://example.com/llms.txt | grep -iE "not found|error|404" && echo "WARNING: May be error page"
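A grep for "error" anywhere in the file will also flag legitimate documentation that discusses error handling. The sketch below narrows the check to the top of an already downloaded file. The function name `looks_like_error_page`, the 20-line window, and the phrase list are all assumptions chosen for illustration.

```shell
# Return 0 (true) when FILE looks like an error page rather than documentation.
# Heuristics (assumptions): empty or missing file, HTML markup near the top,
# or a common error phrase within the first 20 lines.
looks_like_error_page() {
  local file="$1"
  [ -s "$file" ] || return 0                       # empty or missing file
  if head -n 20 "$file" | grep -qiE '<!doctype html|<html'; then
    return 0                                       # HTML, not markdown
  fi
  if head -n 20 "$file" | grep -qiE '404 not found|page not found|access denied'; then
    return 0
  fi
  return 1
}
```

A file that passes this check can still be partial or outdated, so the escalation rules above continue to apply.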

Step 3: Choose Variant

| Variant | Size | Use Case |
|---------|------|----------|
| llms-full.txt | Large (1MB+) | Complete documentation, full API reference |
| llms.txt | Medium | Standard use, balanced coverage |
| llms-small.txt | Small (<100KB) | Quick reference, limited context windows |

Decision tree:

If context window is limited → llms-small.txt
If complete coverage is needed → llms-full.txt
Default → llms.txt
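The decision tree can be expressed as a small function that also accounts for which variants were actually detected. This is a sketch under assumptions: `choose_variant` is a hypothetical name, and the fallback order used when the preferred variant is missing is not specified by this skill. It produces a recommendation only; when several variants exist, the user should still be asked.

```shell
# choose_variant NEED AVAILABLE...
# NEED: "limited" (small context window), "complete" (full coverage), or anything else for default.
# AVAILABLE: file names found during detection.
# Prints the most preferred available variant; the fallback order is an assumption.
choose_variant() {
  local need="$1"; shift
  local order candidate found
  case "$need" in
    limited)  order="llms-small.txt llms.txt llms-full.txt" ;;
    complete) order="llms-full.txt llms.txt llms-small.txt" ;;
    *)        order="llms.txt llms-small.txt llms-full.txt" ;;
  esac
  for candidate in $order; do
    for found in "$@"; do
      if [ "$candidate" = "$found" ]; then
        printf '%s\n' "$candidate"
        return 0
      fi
    done
  done
  return 1   # nothing available: escalate or fall back to scraping
}
```

For example, `choose_variant limited llms-full.txt llms.txt` prints `llms.txt`, because the small variant was not detected and the standard file is the next smallest.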

Step 4: Fetch and Process

Download llms.txt

curl -fsSL --create-dirs -o docs/llms.txt https://example.com/llms.txt

Convert to skill format (if using skill-seekers)

skill-seekers scrape --llms-txt docs/llms.txt --name myskill

Or process manually

llms.txt is already LLM-optimized markdown

mkdir -p output/myskill/references
cp docs/llms.txt output/myskill/references/complete.md

Step 5: Validate Output

Check content structure

head -50 output/myskill/references/complete.md

Verify sections

grep "^#" output/myskill/references/complete.md | head -20

Check for code examples

grep -c '```' output/myskill/references/complete.md
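The three checks above can be combined into one pass that also catches truncated downloads. This is a rough sketch: `check_markdown` is a hypothetical name, and treating an odd number of code fences as a sign of truncation is an assumption. Lines starting with `#` inside code blocks (shell comments, for instance) are counted as headings, so the heading count is an upper bound.

```shell
# Print heading and code-fence counts for a markdown file.
# Returns non-zero when there are no headings or when the fences are
# unbalanced (odd count), which often indicates a truncated file.
check_markdown() {
  local file="$1" headings fences
  headings=$(grep -c '^#' "$file" || true)        # grep -c exits 1 on zero matches
  fences=$(grep -c '^`\{3\}' "$file" || true)     # lines starting with three backticks
  printf 'headings=%s fences=%s\n' "$headings" "$fences"
  [ "$headings" -gt 0 ] && [ $((fences % 2)) -eq 0 ]
}
```

Running `check_markdown output/myskill/references/complete.md` prints the two counts and sets the exit status, so it can gate the next step of a script.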

Recovery Protocol (Archetype 4 Mitigation)

On error:

PAUSE - Note which variant failed
DIAGNOSE - Check error type:
    404 Not Found → Try next variant or alternate location
    403 Forbidden → May need authentication or user-agent
    Timeout → Retry with longer timeout
    Invalid content → Fall back to traditional scraping
ADAPT - Try alternate approach
RETRY - Next variant (max 3 attempts per variant)
ESCALATE - Inform user llms.txt unavailable, suggest scraping
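The DIAGNOSE and RETRY steps can be sketched as two helpers. Both names (`next_action`, `fetch_with_retry`) and the action labels are hypothetical, and the doubling timeout starting at 10 seconds is an assumption; only the three-attempt cap comes from the protocol above.

```shell
# Map an HTTP status code (or curl's 000 for connection failure or timeout)
# to a recovery step. The labels are illustrative, not a fixed vocabulary.
next_action() {
  case "$1" in
    200)         echo "use" ;;
    404)         echo "try-next-variant" ;;
    403)         echo "retry-with-user-agent-or-auth" ;;
    000|408|504) echo "retry-with-longer-timeout" ;;
    *)           echo "escalate" ;;
  esac
}

# fetch_with_retry URL OUT
# Up to 3 attempts per variant, doubling the timeout after each failure.
fetch_with_retry() {
  local url="$1" out="$2" attempt timeout=10
  for attempt in 1 2 3; do
    if curl -fsSL --max-time "$timeout" -o "$out" "$url"; then
      return 0
    fi
    timeout=$((timeout * 2))
  done
  return 1   # caller moves to the next variant or escalates
}
```

Invalid content is not visible in the status code, so a successful fetch should still go through the content validation in Step 2 before it is used.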

Checkpoint Support

State saved to: .aiwg/working/checkpoints/llms-txt-support/

checkpoints/llms-txt-support/
├── detection_results.json   # Which variants found
├── selected_variant.txt     # Which was chosen
└── content_hash.txt         # For cache validation
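One way to populate `selected_variant.txt` and `content_hash.txt` is sketched below. The skill does not say which hash is used, so SHA-256 is an assumption, as are the helper names `file_hash`, `save_checkpoint`, and `content_changed`.

```shell
# Print the SHA-256 hex digest of a file (hash choice is an assumption).
file_hash() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | cut -d ' ' -f 1
  else
    shasum -a 256 "$1" | cut -d ' ' -f 1   # fallback where sha256sum is absent
  fi
}

# save_checkpoint DIR VARIANT FILE
# Record which variant was chosen and the hash of the downloaded file.
save_checkpoint() {
  local dir="$1" variant="$2" file="$3"
  mkdir -p "$dir" || return 1
  printf '%s\n' "$variant" > "$dir/selected_variant.txt"
  file_hash "$file" > "$dir/content_hash.txt"
}

# content_changed DIR FILE
# Return 0 when FILE differs from the checkpointed hash (or no checkpoint exists).
content_changed() {
  [ "$(file_hash "$2")" != "$(cat "$1/content_hash.txt" 2>/dev/null)" ]
}
```

With the path given above, a call might look like `save_checkpoint .aiwg/working/checkpoints/llms-txt-support llms.txt docs/llms.txt`, and a later run can skip reprocessing when `content_changed` returns non-zero.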

llms.txt Format Reference

Standard llms.txt structure:

# Project Name

> Brief description of the project

## Overview

[High-level explanation]

## Installation

[Setup instructions]

## Quick Start

[Getting started guide]

## API Reference

[Detailed API documentation]

## Examples

[Code examples]

## FAQ

[Common questions]

Detection Results Output

{
  "base_url": "https://example.com",
  "detected": {
    "llms-full.txt": {
      "found": true,
      "url": "https://example.com/llms-full.txt",
      "size": 1523456,
      "last_modified": "2025-01-15T10:30:00Z"
    },
    "llms.txt": {
      "found": true,
      "url": "https://example.com/llms.txt",
      "size": 245678,
      "last_modified": "2025-01-15T10:30:00Z"
    },
    "llms-small.txt": {
      "found": false
    }
  },
  "recommended": "llms.txt",
  "reason": "Standard size, good for most use cases"
}

Known Sites with llms.txt

Sites known to support llms.txt (verify before use):

Anthropic documentation
Many modern API documentation sites
Framework documentation following the standard

Always verify - this list may be outdated.

Troubleshooting

| Issue | Diagnosis | Solution |
|-------|-----------|----------|
| No llms.txt found | Site doesn't support | Fall back to doc-scraper |
| Content seems wrong | Error page or redirect | Check actual content, verify URL |
| File too large | llms-full.txt overwhelming | Use llms.txt or llms-small.txt |
| Outdated content | llms.txt not maintained | Consider scraping + llms.txt merge |

Integration with doc-scraper

If llms.txt is incomplete or outdated, combine approaches:

1. Fetch llms.txt as base

curl -o base.md https://example.com/llms.txt

2. Scrape for additional/updated content

skill-seekers scrape --config config.json --skip-covered-by base.md

3. Merge results

llms.txt provides structure, scraping fills gaps

References

llms.txt Standard: https://llmstxt.org/
Skill Seekers llms.txt Detection: https://github.com/jmagly/Skill_Seekers/blob/main/docs/LLMS_TXT_SUPPORT.md
REF-001: Production-Grade Agentic Workflows (BP-4, BP-9)
REF-002: LLM Failure Modes (Archetype 1-4 mitigations)
