crawler

Explore websites directly with Playwriter to design robust crawling flows. Analyze APIs, cookies, tokens, and headers, then document findings and generate crawler code.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this command and send it to your AI assistant to install the skill:

npx skills add alpoxdev/hypercore/alpoxdev-hypercore-crawler

Crawler Skill

Playwriter exploration -> API/Network analysis -> Documentation -> Code generation

Templates: document-templates.md · code-templates.md
Checklists: pre-crawl-checklist.md · anti-bot-checklist.md
References: playwriter-commands.md · crawling-patterns.md · selector-strategies.md · network-crawling.md


<trigger_conditions>

| Trigger | Action |
| --- | --- |
| Crawling, scraping, crawl, scrape | Run immediately |
| Website data extraction | Run immediately |
| API reverse engineering | Start API interception |
| Anti-bot bypass request | Check Anti-Detect guidance |

</trigger_conditions>


<mandatory_reasoning>

Mandatory Sequential Thinking

  • Always use the sequential-thinking tool before starting crawl design, extraction strategy, or code generation decisions.
  • Run sequential-thinking for each major phase: discovery, method selection, and implementation planning.
  • If sequential-thinking is unavailable, stop and report the blocker instead of continuing without structured reasoning.

</mandatory_reasoning>


<workflow>
| Phase | Task | Command/Method |
| --- | --- | --- |
| 1. Session | Create session + open page | playwriter session new |
| 2. Explore | Understand structure | accessibilitySnapshot, screenshotWithAccessibilityLabels |
| 3. Analyze | Intercept API, extract selectors | page.on('response'), getLocatorStringForElement |
| 4. Document | Save findings under .hypercore/crawler/[site]/ | Write |
| 5. Code | Generate crawler implementation | code-templates.md |
</workflow>

<quick_commands>

# Create session + open page
playwriter session new
playwriter -s 1 -e "state.page = await context.newPage(); await state.page.goto('https://target.com')"

# Understand structure
playwriter -s 1 -e "console.log(await accessibilitySnapshot({ page: state.page }))"

# Intercept API responses
playwriter -s 1 -e $'
state.responses = [];
state.page.on("response", async res => {
  if (res.url().includes("/api/")) {
    try { state.responses.push({ url: res.url(), body: await res.json() }); } catch {}
  }
});
'

# Extract auth material
playwriter -s 1 -e "console.log(JSON.stringify(await context.cookies(), null, 2))"
playwriter -s 1 -e "console.log(await state.page.evaluate(() => localStorage.getItem('token')))"

# Convert selector
playwriter -s 1 -e "console.log(await getLocatorStringForElement(state.page.locator('aria-ref=e14')))"

</quick_commands>


<method_selection>

| Condition | Method | Notes |
| --- | --- | --- |
| API found + simple auth | fetch | Fastest |
| API + cookie/token required | fetch + Cookie | Requires expiry handling |
| Strong bot detection | Nstbrowser | Anti-Detect |
| No API (SSR) | Playwright DOM | Parse directly |
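As a concrete illustration of the fetch + Cookie row, the sketch below replays a discovered endpoint outside the browser. The cookie object shape mirrors what Playwright's context.cookies() returns; the endpoint handling and error strategy are assumptions, not a fixed API:

```typescript
// Sketch of the "fetch + Cookie" method. Only name/value from the
// captured cookies are needed for the Cookie request header.
interface BrowserCookie {
  name: string;
  value: string;
}

// Join captured cookies into a single Cookie request header.
export function buildCookieHeader(cookies: BrowserCookie[]): string {
  return cookies.map((c) => `${c.name}=${c.value}`).join("; ");
}

// Replay a discovered API endpoint outside the browser.
export async function fetchJson(url: string, cookieHeader: string): Promise<unknown> {
  const res = await fetch(url, {
    headers: { Cookie: cookieHeader, Accept: "application/json" },
  });
  if (res.status === 401 || res.status === 403) {
    // The "requires expiry handling" caveat: re-extract the cookie, then retry.
    throw new Error(`auth rejected (${res.status}); refresh the cookie`);
  }
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  return res.json();
}
```

buildCookieHeader can consume the JSON dump produced by the context.cookies() quick command directly, which is why this path stays nearly as fast as unauthenticated fetch.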

</method_selection>


<output_structure>

.hypercore/crawler/[site-name]/
├── ANALYSIS.md      # Site structure
├── SELECTORS.md     # DOM selectors
├── API.md           # API endpoints
├── NETWORK.md       # Auth/network details
└── CRAWLER.ts       # Generated crawler code

Templates: document-templates.md

</output_structure>


<validation>
✅ Playwriter session created
✅ Structure analyzed with accessibilitySnapshot
✅ API interception attempted
✅ Selector extraction validated
✅ Findings documented under .hypercore/crawler/
✅ Crawler code generated
✅ sequential-thinking trace recorded for major phases
</validation>
<forbidden>
| Category | Forbidden |
| --- | --- |
| Analysis | Guess selectors without structure analysis |
| Approach | Use DOM-only flow without checking APIs |
| Documentation | Skip documenting analysis results |
| Network | Ignore rate limiting |
</forbidden>
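The "Ignore rate limiting" prohibition is worth making concrete. A minimal sequential throttle sketch, where the 1000 ms default is an assumption to tune per site (robots.txt crawl-delay, 429/Retry-After responses), not a universal rule:

```typescript
// Minimal politeness throttle for generated crawlers.
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

export async function crawlSequentially<T>(
  urls: string[],
  fetchOne: (url: string) => Promise<T>,
  delayMs = 1000, // assumed default; adjust per target site
): Promise<T[]> {
  const results: T[] = [];
  for (const url of urls) {
    results.push(await fetchOne(url)); // one request at a time, no parallel bursts
    await sleep(delayMs);
  }
  return results;
}
```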
<example>
# User: /crawler crawl products from https://shop.example.com

# 1. Session
playwriter session new  # => 1
playwriter -s 1 -e "state.page = await context.newPage(); await state.page.goto('https://shop.example.com/products')"

# 2. Structure analysis
playwriter -s 1 -e "console.log(await accessibilitySnapshot({ page: state.page }))"
# => list "Products" [ref=e5]: listitem [ref=e6]: link "Product A" [ref=e7]

# 3. API detection (scroll trigger)
playwriter -s 1 -e "await state.page.evaluate(() => window.scrollTo(0, 9999))"
playwriter -s 1 -e "console.log(state.responses.map(r => r.url))"
# => ["/api/products?page=2"]

# 4. Documentation -> .hypercore/crawler/shop-example-com/
# 5. Generate API-based crawler
</example>
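Step 5's generated crawler could then follow the pagination that interception revealed. A sketch only: the { items, hasMore } response shape is an assumption inferred from the intercepted /api/products?page=2 URL and must be confirmed against API.md; the fetcher is injected so the pagination loop stays testable:

```typescript
// Hypothetical CRAWLER.ts for the shop.example.com walkthrough above.
interface ProductsPage {
  items: Array<{ id: string; name: string }>;
  hasMore: boolean; // assumed pagination flag, not a documented contract
}

// Pagination loop, independent of transport so it can be unit-tested.
export async function crawlAllProducts(
  fetchPage: (page: number) => Promise<ProductsPage>,
): Promise<ProductsPage["items"]> {
  const all: ProductsPage["items"] = [];
  for (let page = 1; ; page++) {
    const { items, hasMore } = await fetchPage(page);
    all.push(...items);
    if (!hasMore) break;
  }
  return all;
}

// Production fetcher hitting the discovered endpoint.
export const httpProductsPage = async (page: number): Promise<ProductsPage> => {
  const res = await fetch(`https://shop.example.com/api/products?page=${page}`);
  if (!res.ok) throw new Error(`HTTP ${res.status} on page ${page}`);
  return (await res.json()) as ProductsPage;
};
```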

