data-base


Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.


Install skill "data-base" with this command: npx skills add jinfanzheng/kode-sdk-csharp/jinfanzheng-kode-sdk-csharp-data-base

Mental Model

Data acquisition means converting unstructured web content into structured data. Choose the tool based on page complexity: JS-heavy pages → chrome-devtools MCP; static pages → Python requests.
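One way to make the JS-heavy vs. static call is to fetch the raw HTML once and check whether it already carries readable content. The sketch below is a heuristic, not something the skill prescribes. The 200-character threshold, the `root`/`app`/`__next` mount-point ids, and the names `looks_js_rendered` and `choose_tool` are all assumptions. It uses only the standard library's `html.parser`.

```python
import re
from html.parser import HTMLParser


class _PageStats(HTMLParser):
    """Counts visible text characters and <script> tags in raw HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.text_chars = 0
        self.script_tags = 0
        self._hidden_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_tags += 1
        if tag in ("script", "style"):
            self._hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._hidden_depth > 0:
            self._hidden_depth -= 1

    def handle_data(self, data):
        if self._hidden_depth == 0:
            self.text_chars += len(data.strip())


# Assumed SPA mount points left empty in server HTML, e.g. <div id="root"></div>
_EMPTY_MOUNT = re.compile(r"<div[^>]*\bid=[\"'](root|app|__next)[\"'][^>]*>\s*</div>", re.I)


def looks_js_rendered(html: str, min_text_chars: int = 200) -> bool:
    """Heuristic: almost no visible text, plus an empty mount point or scripts."""
    stats = _PageStats()
    stats.feed(html)
    stats.close()
    if stats.text_chars >= min_text_chars:
        return False
    return bool(_EMPTY_MOUNT.search(html)) or stats.script_tags > 0


def choose_tool(html: str) -> str:
    """Map the heuristic onto the two tools named in the skill."""
    return "chrome-devtools MCP" if looks_js_rendered(html) else "Python requests"
```

A page that fails this check can still hide content behind scripts, so treat a `False` result as "try static first", not as proof.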

Tool Selection

Page Type                     Tool                  When to Use
Dynamic (JS-rendered, SPAs)   chrome-devtools MCP   React/Vue apps, infinite scroll, login gates
Static HTML                   Python requests       Blogs, news sites, simple pages
Complex/reusable logic        Python script         Multi-step scraping, rate limiting, proxies
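For the static-HTML row, a single GET plus an HTML parser is usually enough. The skill pairs `requests` with `beautifulsoup4`. To keep this sketch self-contained, it swaps in the standard library's `urllib.request` and `html.parser` instead. It turns every link on a page into a `{"text", "url"}` record. `LinkExtractor`, `extract_links`, `fetch_static`, and the User-Agent string are hypothetical names chosen for illustration.

```python
import urllib.request
from html.parser import HTMLParser
from typing import Dict, List, Optional
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Turns every <a href> in a page into a {"text", "url"} record."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.records: List[Dict[str, str]] = []
        self._href: Optional[str] = None  # set while inside an <a href> tag
        self._chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href") if tag == "a" else None
        if href:
            # Resolve relative links such as "/a" against the page URL.
            self._href = urljoin(self.base_url, href)
            self._chunks = []

    def handle_data(self, data):
        if self._href is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._chunks).split())  # collapse whitespace
            self.records.append({"text": text, "url": self._href})
            self._href = None


def extract_links(html: str, base_url: str) -> List[Dict[str, str]]:
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.records


def fetch_static(url: str, user_agent: str = "data-base-bot/0.1") -> str:
    """One plain GET; enough for server-rendered pages."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Typical use is `extract_links(fetch_static(url), url)`. With BeautifulSoup installed, `soup.find_all("a", href=True)` covers the same ground in fewer lines.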

Anti-Patterns (NEVER)

  • Don't scrape without checking robots.txt

  • Don't overload servers (default rate: at most 1 request per second)

  • Don't scrape personal data without consent

  • Don't use Chinese characters in output filenames (ASCII only)

  • Don't forget to identify your bot in the User-Agent header
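The robots.txt, rate-limit, and User-Agent rules can be enforced in one place. Below is a minimal sketch built on the standard library's `urllib.robotparser`, assuming a single-threaded crawler. It refuses URLs that robots.txt disallows, sends an identifying User-Agent, and spaces requests at least 1 second apart (longer if robots.txt sets a `Crawl-delay`). `PoliteFetcher`, `load_robots`, and the User-Agent value are hypothetical.

```python
import time
import urllib.request
import urllib.robotparser
from urllib.parse import urlsplit

USER_AGENT = "data-base-bot/0.1 (+mailto:you@example.com)"  # hypothetical identifier


class PoliteFetcher:
    """Checks robots.txt, sends a User-Agent, and waits between requests."""

    def __init__(self, robots: urllib.robotparser.RobotFileParser,
                 user_agent: str = USER_AGENT, min_interval: float = 1.0) -> None:
        self.robots = robots
        self.user_agent = user_agent
        # Honour a Crawl-delay directive if it is stricter than our default.
        delay = robots.crawl_delay(user_agent)
        self.min_interval = max(min_interval, float(delay or 0))
        self._last = float("-inf")

    def allowed(self, url: str) -> bool:
        return self.robots.can_fetch(self.user_agent, url)

    def wait_turn(self) -> None:
        """Sleep just long enough to keep min_interval between requests."""
        remaining = self._last + self.min_interval - time.monotonic()
        if remaining > 0:
            time.sleep(remaining)
        self._last = time.monotonic()

    def get(self, url: str) -> bytes:
        if not self.allowed(url):
            raise PermissionError(f"robots.txt disallows {url}")
        self.wait_turn()
        req = urllib.request.Request(url, headers={"User-Agent": self.user_agent})
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.read()


def load_robots(site_url: str) -> urllib.robotparser.RobotFileParser:
    """Download and parse https://host/robots.txt (performs a network call)."""
    parts = urlsplit(site_url)
    rp = urllib.robotparser.RobotFileParser(f"{parts.scheme}://{parts.netloc}/robots.txt")
    rp.read()
    return rp
```

Usage: `fetcher = PoliteFetcher(load_robots("https://example.com"))`, then call `fetcher.get(url)` for each page. Keeping one fetcher per site keeps the delay per host.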

Output Format

  • JSON: Nested/hierarchical data

  • CSV: Tabular data

  • Filename: {source}_{timestamp}.{ext} (ASCII only, e.g., news_20250115.csv)
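Here is a sketch of the naming and format rules, under several assumptions. The timestamp uses `YYYYMMDD` to match the `news_20250115.csv` example. The source name is lowercased and reduced to ASCII letters, digits, hyphens, and underscores. Records count as tabular only when every record is a flat dict with the same keys. `ascii_slug`, `output_filename`, and `save_records` are hypothetical helpers, not part of the skill.

```python
import csv
import json
import re
import unicodedata
from datetime import datetime
from pathlib import Path
from typing import Any, Dict, List, Optional


def ascii_slug(text: str, fallback: str = "data") -> str:
    """Reduce text to ASCII letters, digits, '-' and '_'.

    Accents are stripped via NFKD; characters with no ASCII form (e.g. Chinese) are dropped.
    """
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    slug = re.sub(r"[^A-Za-z0-9-]+", "_", ascii_text).strip("_").lower()
    return slug or fallback


def output_filename(source: str, ext: str, when: Optional[datetime] = None) -> str:
    """Build {source}_{timestamp}.{ext} with a YYYYMMDD timestamp."""
    when = when or datetime.now()
    return f"{ascii_slug(source)}_{when:%Y%m%d}.{ext}"


def save_records(records: List[Dict[str, Any]], source: str, out_dir: Path,
                 when: Optional[datetime] = None) -> Path:
    """Write flat, same-keyed records as CSV; anything nested goes to JSON."""
    tabular = bool(records) and all(
        r.keys() == records[0].keys()
        and not any(isinstance(v, (dict, list)) for v in r.values())
        for r in records
    )
    path = out_dir / output_filename(source, "csv" if tabular else "json", when)
    if tabular:
        with path.open("w", newline="", encoding="utf-8") as fh:
            writer = csv.DictWriter(fh, fieldnames=list(records[0]))
            writer.writeheader()
            writer.writerows(records)
    else:
        path.write_text(json.dumps(records, ensure_ascii=False, indent=2), encoding="utf-8")
    return path
```

Only the filename is forced to ASCII. File contents keep their original characters (`ensure_ascii=False`, UTF-8).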

Workflow

  • Ask: What data? Which sites? How much?

  • Select the tool based on page type

  • Extract and save the structured data

  • Deliver the file path to the user, or pass the file to the data-analysis skill

Python Environment

Auto-initialize the virtual environment if needed, then run your script:

cd skills/data-base

if [ ! -f ".venv/bin/python" ]; then
  echo "Creating Python environment..."
  ./setup.sh
fi

.venv/bin/python your_script.py

The setup script installs requests, beautifulsoup4, pandas, and other web scraping tools.

References (load on demand)

For detailed APIs and templates, load references/REFERENCE.md and references/templates.md.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

  • data-analysis (Research): No summary provided by upstream source. Labels: Repository Source, Needs Review.

  • weather (General): No summary provided by upstream source. Labels: Repository Source, Needs Review.

  • hotel (General): No summary provided by upstream source. Labels: Repository Source, Needs Review.

  • email (General): No summary provided by upstream source. Labels: Repository Source, Needs Review.