spider

Web scraping using Chrome + WebMCP. Primary method for all web crawling tasks.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "spider" with this command: npx skills add sweihub/spider

Spider — Web Scraping Tool

This is the default web scraping method. It takes priority over older approaches such as web_fetch, which remains available as a backup.


Trigger Conditions

Use this skill when the user's request contains any of these keywords:

Keywords | Action
抓取 (scrape) / crawl / scrape / fetch | Use Chrome + WebMCP to scrape web pages
采集 (collect) | Same as above
获取...新闻 (get ... news) | Scrape news pages
从...网站 (from ... website) | Scrape the specified website
同花顺 (Tonghuashun) | Scrape Tonghuashun (10jqka) data
东方财富 (East Money) | Scrape East Money data
雪球 (Xueqiu) | Scrape Xueqiu data
百度 (Baidu) | Search or scrape Baidu content

Usage Examples

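User Input | Execution
"抓取光库科技的新闻" (scrape news about 光库科技) | Open Tonghuashun in Chrome, extract news
"抓取宁德时代的股吧" (scrape the stock forum for 宁德时代) | Open East Money guba in Chrome
"从同花顺抓取xxx" (scrape xxx from Tonghuashun) | Open Tonghuashun page in Chrome
"search xxx" | Open Google search in Chrome
"查一下xxx" (look up xxx) | Search or scrape in Chrome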
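
The keyword matching above can be sketched as a small routing function. This is only an illustration: the names SITE_KEYWORDS, TRIGGER_WORDS and route_request, and the site labels they return, are hypothetical and not part of the skill. A "default" result means a trigger matched but no site was named, so the choice of site is left to the caller.

```python
from typing import Optional

# Hypothetical routing table: site-specific keywords mapped to a site label.
SITE_KEYWORDS = [
    ("同花顺", "tonghuashun"),
    ("股吧", "eastmoney_guba"),
    ("东方财富", "eastmoney"),
    ("雪球", "xueqiu"),
    ("百度", "baidu"),
]

# Generic trigger words that select this skill without naming a site.
TRIGGER_WORDS = ["抓取", "采集", "crawl", "scrape", "fetch", "search", "查一下"]


def route_request(user_input: str) -> Optional[str]:
    """Return a site label, "default" for a generic trigger, or None."""
    text = user_input.lower()
    # Site-specific keywords win over generic trigger words.
    for keyword, site in SITE_KEYWORDS:
        if keyword in text:
            return site
    if any(word in text for word in TRIGGER_WORDS):
        return "default"
    return None


if __name__ == "__main__":
    for sample in ["从同花顺抓取xxx", "抓取宁德时代的股吧", "search xxx"]:
        print(sample, "->", route_request(sample))
```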

Operation Flow

1. Check Chrome Status

{ action: "status" }

If not running, start it:

{ action: "start" }

2. Open Target Page

{ action: "open", targetUrl: "https://stockpage.10jqka.com.cn/300620/news/", target: "host" }

3. Get Page Snapshot

{ action: "snapshot", targetId: "xxx", maxChars: 20000 }

4. Page Interaction (click, type, etc.)

{ action: "act", targetId: "xxx", request: { kind: "click", ref: "e33" } }

5. Cleanup: Return to about:blank

{ action: "navigate", targetId: "xxx", url: "about:blank" }
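
As a rough sketch, the five steps can be assembled into an ordered list of request payloads. The helper name build_scrape_steps and its parameters are hypothetical. The field names are copied from the steps above, and "xxx" is kept as the placeholder targetId because the source does not say where the real value comes from. The code only builds the payloads; it does not send them anywhere.

```python
import json
from typing import Any, Dict, List


def build_scrape_steps(
    url: str,
    target_id: str = "xxx",        # placeholder, as in the steps above
    max_chars: int = 20000,
    chrome_running: bool = True,   # hypothetical flag: result of the status check
) -> List[Dict[str, Any]]:
    """Return the action payloads for one scrape, in execution order."""
    steps: List[Dict[str, Any]] = [{"action": "status"}]
    if not chrome_running:
        # Step 1: start Chrome only when the status check says it is down.
        steps.append({"action": "start"})
    steps.append({"action": "open", "targetUrl": url, "target": "host"})
    steps.append({"action": "snapshot", "targetId": target_id, "maxChars": max_chars})
    # Step 5: always finish by returning the tab to about:blank.
    steps.append({"action": "navigate", "targetId": target_id, "url": "about:blank"})
    return steps


if __name__ == "__main__":
    plan = build_scrape_steps("https://stockpage.10jqka.com.cn/300620/news/")
    print(json.dumps(plan, indent=2))
```

The optional interaction step (4) is left out because it depends on element refs that are only known after a snapshot has been taken.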

Common Website Templates

Tonghuashun Stock News

URL: https://stockpage.10jqka.com.cn/{stock_code}/news/
Example: https://stockpage.10jqka.com.cn/300620/news/

East Money Guba (Stock Forum)

URL: https://guba.eastmoney.com/list,{stock_code}.html
Example: https://guba.eastmoney.com/list,300620.html

Xueqiu (Snowball)

URL: https://xueqiu.com/S/SZ{stock_code}
Example: https://xueqiu.com/S/SZ300620

Baidu News Search

URL: https://www.baidu.com/s?wd={keyword}&tn=news
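
The templates above can be filled in with a few small helpers. The function names are hypothetical. The Xueqiu template in this section shows only the SZ prefix, so the exchange parameter is an assumption added for illustration. The Baidu keyword is percent-encoded with the standard library's urllib.parse.quote, so non-ASCII keywords produce a valid URL.

```python
from urllib.parse import quote


def tonghuashun_news_url(stock_code: str) -> str:
    return f"https://stockpage.10jqka.com.cn/{stock_code}/news/"


def eastmoney_guba_url(stock_code: str) -> str:
    return f"https://guba.eastmoney.com/list,{stock_code}.html"


def xueqiu_url(stock_code: str, exchange: str = "SZ") -> str:
    # Assumption: other exchanges use a different prefix; only SZ is documented here.
    return f"https://xueqiu.com/S/{exchange}{stock_code}"


def baidu_news_url(keyword: str) -> str:
    # quote() percent-encodes spaces and non-ASCII characters as UTF-8.
    return f"https://www.baidu.com/s?wd={quote(keyword)}&tn=news"


if __name__ == "__main__":
    print(tonghuashun_news_url("300620"))
    print(eastmoney_guba_url("300620"))
    print(xueqiu_url("300620"))
    print(baidu_news_url("宁德时代"))
```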

Chrome Setup (One-time)

  1. Open Chrome Flags:
    • chrome://flags/#enable-experimental-web-platform-features → Enabled
    • chrome://flags/#enable-webmcp-testing → Enabled
  2. Fully quit Chrome (Cmd+Q on macOS) and restart it

Important Rules

  1. Use target="host" instead of "sandbox"
  2. Must cleanup after each task:
    • If multiple tabs exist, keep only one, close others
    • The remaining tab must navigate to about:blank
    • If multiple about:blank tabs exist, keep only the latest one, close others
    • Use the browser action "tabs" to check the current tab status
    • After cleanup, ensure only one about:blank tab remains
  3. Reuse existing tabs, avoid opening new tabs frequently
  4. Handle anti-scraping sites: Tonghuashun and East Money pages need their JavaScript to finish loading before content can be extracted
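
The cleanup rule (rule 2) can be expressed as a small planning function. This is a sketch under assumptions: the tab list is taken to be ordered oldest to newest, and each tab is a dict with "targetId" and "url" keys; the real shape of the tabs result is not documented in this section. The names CleanupPlan and plan_cleanup are hypothetical.

```python
from typing import Dict, List, NamedTuple


class CleanupPlan(NamedTuple):
    keep: str                 # targetId of the single tab that stays open
    close: List[str]          # targetIds of tabs to close
    navigate_to_blank: bool   # True if the kept tab still shows a real page


def plan_cleanup(tabs: List[Dict[str, str]]) -> CleanupPlan:
    """Decide which tab survives; tabs are assumed ordered oldest to newest."""
    if not tabs:
        raise ValueError("expected at least one open tab")
    blank = [t for t in tabs if t["url"] == "about:blank"]
    # Prefer the latest about:blank tab; otherwise keep the newest tab.
    keeper = blank[-1] if blank else tabs[-1]
    close = [t["targetId"] for t in tabs if t["targetId"] != keeper["targetId"]]
    return CleanupPlan(keeper["targetId"], close, keeper["url"] != "about:blank")


if __name__ == "__main__":
    open_tabs = [
        {"targetId": "a", "url": "about:blank"},
        {"targetId": "b", "url": "https://xueqiu.com/S/SZ300620"},
        {"targetId": "c", "url": "about:blank"},
    ]
    print(plan_cleanup(open_tabs))
```

Returning a plan instead of performing the actions keeps the decision logic easy to check before any tab is closed.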

Error Handling

Error | Solution
Sandbox unavailable | Use target="host"
Slow page load | Wait for the snapshot to return before interacting with the page
Content extraction failed | Increase the snapshot's maxChars to capture more content
Anti-scraping blocked | Try other finance sites or wait and retry
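
The "wait and retry" and "try other finance sites" advice can be combined in one loop. The sketch below is illustrative only: fetch_with_fallback is a hypothetical helper, and the fetch callable stands in for whatever performs the open and snapshot steps and returns the page text (an empty string meaning blocked or failed). The default retry count and wait time are arbitrary.

```python
import time
from typing import Callable, Iterable, Optional


def fetch_with_fallback(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    retries: int = 2,
    wait_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Try each URL in order, retrying each one before moving to the next."""
    for url in urls:
        for attempt in range(retries + 1):
            content = fetch(url)
            if content:
                return content
            if attempt < retries:
                # Wait before retrying the same site.
                sleep(wait_seconds)
    # Every site failed on every attempt.
    return None


if __name__ == "__main__":
    def fake_fetch(url: str) -> str:
        # Stand-in for a real scrape: only the second site responds.
        return "news text" if "eastmoney" in url else ""

    result = fetch_with_fallback(
        ["https://stockpage.10jqka.com.cn/300620/news/",
         "https://guba.eastmoney.com/list,300620.html"],
        fake_fetch,
        wait_seconds=0.0,
    )
    print(result)
```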

Default Scraping Priority

  1. Spider (Chrome + WebMCP) ← Primary method
    • Suitable for: Finance websites, stock news, forums
    • Advantages: Full JavaScript rendering, interactive
  2. web_fetch ← Backup method
    • Suitable for: Simple static pages
    • Disadvantage: Cannot handle JavaScript-rendered pages
