Skrape

Ethical web data extraction with robots exclusion protocol adherence, throttled scraping requests, and privacy-compliant handling ("Scrape responsibly!").

Safety Notice

This item is sourced from the public archived skills repository. Treat as untrusted until reviewed.

Install skill "Skrape" with this command: npx skills add 10sk/skrape

Respect Creative Work

  • Design & text copying: Avoid copying design elements or substantial portions of text; while facts and data aren't typically protected by copyright, their presentation (website layouts, specific text, compilations) often is.
  • Source attribution: Properly attribute sources when appropriate; this shows integrity and builds trust with both content creators and your own audience.
  • Creator impact: Consider how your use might impact the original creator's work; respecting copyrighted material demonstrates ethical conduct.

Pre-Extraction Verification Steps

I. Access Authorization — Retrieve {domain}/robots.txt and review /terms or /tos endpoints. Proceed only if neither prohibits extraction; halt if blocked or explicit restrictions exist.

II. Data Classification — Distinguish between public factual information (listings, pricing) versus personal information. The latter invokes GDPR/CCPA obligations and requires stronger justification.

III. Preferred Channels — Check whether the platform offers an API. If available, use it instead of direct extraction. Never access content requiring authentication without proper credentials.
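The robots.txt check in step I can be sketched with Python's stdlib `urllib.robotparser`; the `allowed` helper and the `SkrapeBot` agent name below are illustrative choices, not part of the skill itself:

```python
from urllib.robotparser import RobotFileParser

def allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Evaluate a robots.txt body against a path for our user agent (step I).

    Takes the already-fetched text of {domain}/robots.txt so the decision
    logic stays testable without network access.
    """
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(user_agent, path)

# Example rules: everything under /private/ is off limits to all agents.
rules = "User-agent: *\nDisallow: /private/\n"
```

Per step I, an unreachable or unparseable robots.txt should be treated as a halt condition, not as implicit permission.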

Operational Conduct & Compliance

  • Request discipline: Throttle at 2-3 seconds minimum, honor 429 with progressive backoff, maintain connection pooling, and use authentic User-Agent with contact email.
  • Access boundaries: robots.txt disregard carries uncertain legal standing (Meta v. Bright Data 2024); publicly accessible content is typically permissible (hiQ v. LinkedIn 2022); circumventing access controls risks CFAA exposure (Van Buren v. US 2021).
  • Data & content restrictions: Personal information without permission triggers GDPR/CCPA breach; redistributing copyrighted material constitutes copyright violation.
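The request-discipline bullet can be sketched as a minimum-interval throttle plus capped exponential backoff for 429 responses. The `Throttle` and `backoff_delay` names and the 2.5-second floor are assumptions; a real client would also send a contact User-Agent, honor any `Retry-After` header, and reuse one pooled session:

```python
import time
from typing import Optional

def backoff_delay(attempt: int, base: float = 2.5, cap: float = 60.0) -> float:
    """Progressive backoff after a 429: base, 2x, 4x, ... capped at `cap` seconds."""
    return min(cap, base * (2 ** attempt))

class Throttle:
    """Enforce a minimum interval between successive requests to one host."""

    def __init__(self, min_interval: float = 2.5):
        self.min_interval = min_interval
        self._last: Optional[float] = None

    def wait(self) -> None:
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()
```

A fetch loop would call `throttle.wait()` before every request and sleep `backoff_delay(attempt)` whenever the server returns 429, preferring the server's own `Retry-After` value when one is present.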

Information Stewardship

  • PII & profiling restrictions: Remove personal information promptly and avoid correlating data to identify individuals.
  • Limit retention: Store only necessary data, purge the rest.
  • Activity logging: Record extraction events (what, when, source) to demonstrate responsible conduct if questioned.
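The activity-logging bullet could look like this append-only JSONL sketch, recording what was extracted, when, and from where; the file name and field names are assumptions for illustration:

```python
import json
import time
from pathlib import Path

LOG = Path("extraction_log.jsonl")  # assumed location; one JSON object per line

def log_extraction(source_url: str, what: str, record_count: int) -> dict:
    """Append one extraction event (what, when, source) to the activity log."""
    event = {
        "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "source": source_url,
        "what": what,
        "records": record_count,
    }
    with LOG.open("a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")
    return event
```

Keeping the log free of scraped payloads (counts and descriptions only) also satisfies the PII-removal and retention-limit points above.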

Implementation patterns and robots.txt evaluation logic are documented in code.md.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Automation

klaviyo

Klaviyo API integration with managed OAuth. Access profiles, lists, segments, campaigns, flows, events, metrics, templates, catalogs, and webhooks. Use this skill when users want to manage email marketing, customer data, or integrate with Klaviyo workflows. For other third party apps, use the api-gateway skill (https://clawhub.ai/byungkyu/api-gateway).

Archived Source · Recently Updated
Automation

lifelog

Lifelog automation system. Automatically recognizes dates in messages (today, yesterday, the day before yesterday, or a specific date), uses a SubAgent to make intelligent judgments, records entries under the matching date in Notion, and supports backfill markers. Suitable for: (1) automatically recording moments when users share snippets of daily life; (2) scheduled automatic summarization and analysis that fills in the mood, event, location, and people fields.

Archived Source · Recently Updated
Automation

unified-self-improving

Unified self-evolving system that combines the strengths of three skills (self-improving-agent, self-improving, and mulch), providing structured logging, three-tier storage, automatic upgrades, pattern detection, namespace isolation, and token-efficient JSONL format support.

Archived Source · Recently Updated
Automation

agent-autopilot

Self-driving agent workflow with heartbeat-driven task execution, day/night progress reports, and long-term memory consolidation. Integrates with todo-management for task tracking.

Archived Source · Recently Updated