web-fetcher

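Smart web content fetcher for articles and videos from WeChat, Feishu, Bilibili, Zhihu, Toutiao, YouTube, and other platforms. Triggers: '抓取文章' (fetch article), '下载网页' (download web page), '保存文章' (save article), 'fetch URL', '下载视频' (download video), '抓取飞书文档' (fetch Feishu doc), '抓取微信文章' (fetch WeChat article), '把这个链接内容保存下来' (save the content of this link), '下载B站视频' (download Bilibili video), 'download video', 'scrape article'.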

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "web-fetcher" with this command: npx skills add alexxxiong/web-fetcher

Web Fetcher

Smart web content fetcher for Claude Code. It detects the platform from the URL and picks the matching strategy to fetch articles or download videos.

Quick Start

# Fetch an article
python3 {SKILL_DIR}/fetcher.py "URL" -o ~/docs/

# Download a video
python3 {SKILL_DIR}/fetcher.py "https://b23.tv/xxx" -o ~/videos/

# Batch fetch from file
python3 {SKILL_DIR}/fetcher.py --urls-file urls.txt -o ~/docs/
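
The batch mode reads a file with one URL per line, where # marks a comment. The following is a minimal sketch of such a reader, not the skill's actual code: read_urls is a hypothetical helper, and it assumes that only whole-line comments and blank lines need to be skipped.

```python
from pathlib import Path


def read_urls(path):
    """Return the URLs listed in a text file, one per line.

    Assumption: blank lines and lines starting with '#' are ignored;
    surrounding whitespace is stripped from every URL.
    """
    urls = []
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        urls.append(line)
    return urls
```

A urls.txt containing a comment line, a blank line, and two links would yield a list with just the two links, in file order.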

Install Dependencies

Install only what you need — dependencies are checked at runtime:

| Dependency | Purpose | Install |
| --- | --- | --- |
| scrapling | Article fetching (HTTP + browser) | pip install scrapling |
| yt-dlp | Video download | pip install yt-dlp |
| camoufox | Anti-detection browser (Xiaohongshu, Weibo) | pip install camoufox && python3 -m camoufox fetch |
| html2text | HTML to Markdown conversion | pip install html2text |
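
A runtime dependency check of this kind can be sketched with the standard library. This is an illustration only: install_hint and INSTALL_HINTS are hypothetical names, and the sketch assumes each dependency is detected by its importable module name (yt_dlp for yt-dlp). The skill's own check_dependency may work differently.

```python
import importlib.util

# Install commands copied from the dependency table; keys are import names.
INSTALL_HINTS = {
    "scrapling": "pip install scrapling",
    "yt_dlp": "pip install yt-dlp",
    "camoufox": "pip install camoufox && python3 -m camoufox fetch",
    "html2text": "pip install html2text",
}


def install_hint(module_name):
    """Return None if the module is importable, else a suggested install command."""
    if importlib.util.find_spec(module_name) is not None:
        return None
    # Fall back to a generic pip command for modules not listed above.
    return INSTALL_HINTS.get(module_name, f"pip install {module_name}")
```

Calling install_hint only when a given fetch method is about to run keeps unused dependencies optional.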

Smart Routing

The fetcher automatically detects the platform from the URL:

| Platform | Method | Notes |
| --- | --- | --- |
| mp.weixin.qq.com | scrapling | Extracts data-src images, handles SVG placeholders |
| *.feishu.cn | Virtual scroll | Collects all blocks via scrolling, downloads images with cookies |
| zhuanlan.zhihu.com | scrapling | .Post-RichText selector |
| www.zhihu.com | scrapling | .RichContent selector |
| www.toutiao.com | scrapling | Handles toutiaoimg.com base64 placeholders |
| www.xiaohongshu.com | camoufox | Anti-bot protection requires stealth browser |
| www.weibo.com | camoufox | Anti-bot protection requires stealth browser |
| bilibili.com / b23.tv | yt-dlp | Video download, supports quality selection |
| youtube.com / youtu.be | yt-dlp | Video download |
| douyin.com | yt-dlp | Video download |
| Unknown URLs | scrapling | Generic fetch with fallback tiers |
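
The routing table can be read as a host-suffix lookup. The sketch below illustrates that idea and is not the skill's router: pick_method and ROUTES are hypothetical names, and matching by host suffix (so any subdomain of a listed domain matches) is an assumption. Method names follow the values accepted by --method; the WeChat selector comes from the route() example in Manual Usage.

```python
from urllib.parse import urlparse

# (host suffix, method, CSS selector or None), in priority order.
# More specific hosts must come before their parent domain.
ROUTES = [
    ("mp.weixin.qq.com", "scrapling", "#js_content"),
    ("feishu.cn", "feishu", None),
    ("zhuanlan.zhihu.com", "scrapling", ".Post-RichText"),
    ("zhihu.com", "scrapling", ".RichContent"),
    ("toutiao.com", "scrapling", None),
    ("xiaohongshu.com", "camoufox", None),
    ("weibo.com", "camoufox", None),
    ("bilibili.com", "ytdlp", None),
    ("b23.tv", "ytdlp", None),
    ("youtube.com", "ytdlp", None),
    ("youtu.be", "ytdlp", None),
    ("douyin.com", "ytdlp", None),
]


def pick_method(url):
    """Return (method, selector) for a URL; unknown hosts fall back to scrapling."""
    host = (urlparse(url).hostname or "").lower()
    for suffix, method, selector in ROUTES:
        if host == suffix or host.endswith("." + suffix):
            return method, selector
    return "scrapling", None
```

Listing zhuanlan.zhihu.com before zhihu.com matters here, because the first matching suffix wins.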

CLI Reference

python3 {SKILL_DIR}/fetcher.py [URL] [OPTIONS]

Arguments:
  url                    URL to fetch

Options:
  -o, --output DIR       Output directory (default: current)
  -q, --quality N        Video quality, e.g. 1080, 720 (default: 1080)
  --method METHOD        Force method: scrapling, camoufox, ytdlp, feishu
  --selector CSS         Force CSS selector for content extraction
  --urls-file FILE       File with URLs (one per line, # for comments)
  --audio-only           Extract audio only (video downloads)
  --no-images            Skip image download (articles)
  --cookies-browser NAME Browser for cookies (e.g., chrome, firefox)
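
For readers who want to mirror this interface in their own wrapper, the options above map onto argparse roughly as follows. This is a sketch under assumptions, not the source of fetcher.py: build_parser is a hypothetical name, and the defaults ("." for the output directory, the string "1080" for quality) are inferred from the help text.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented fetcher.py options."""
    parser = argparse.ArgumentParser(prog="fetcher.py")
    # The URL is optional because --urls-file can supply the inputs instead.
    parser.add_argument("url", nargs="?", help="URL to fetch")
    parser.add_argument("-o", "--output", default=".", metavar="DIR",
                        help="Output directory")
    parser.add_argument("-q", "--quality", default="1080", metavar="N",
                        help="Video quality, e.g. 1080, 720")
    parser.add_argument("--method",
                        choices=["scrapling", "camoufox", "ytdlp", "feishu"],
                        help="Force a fetch method")
    parser.add_argument("--selector", metavar="CSS",
                        help="Force a CSS selector for content extraction")
    parser.add_argument("--urls-file", metavar="FILE",
                        help="File with URLs, one per line")
    parser.add_argument("--audio-only", action="store_true",
                        help="Extract audio only (video downloads)")
    parser.add_argument("--no-images", action="store_true",
                        help="Skip image download (articles)")
    parser.add_argument("--cookies-browser", metavar="NAME",
                        help="Browser to read cookies from")
    return parser
```

Parsing ["https://b23.tv/xxx", "-q", "720"] with this parser gives quality "720" and leaves every unspecified flag at its default.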

Platform Notes

WeChat (mp.weixin.qq.com)

  • Images use data-src attribute with mmbiz.qpic.cn URLs
  • Visible <img> tags contain SVG placeholders (lazy loading)
  • Image download requires Referer: https://mp.weixin.qq.com/ header
  • Scrapling GET usually works; no browser needed
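
The two image-related points above can be sketched with the standard library alone: collect the data-src value of each img tag, then attach the Referer header when requesting the image. The names DataSrcCollector, wechat_image_urls, and image_request are hypothetical, and the skill itself may rely on scrapling selectors rather than html.parser.

```python
from html.parser import HTMLParser
from urllib.request import Request


class DataSrcCollector(HTMLParser):
    """Collect the data-src attribute of every <img> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        data_src = dict(attrs).get("data-src")
        if data_src:
            self.urls.append(data_src)


def wechat_image_urls(html):
    """Return real image URLs, ignoring the SVG placeholders stored in src."""
    collector = DataSrcCollector()
    collector.feed(html)
    return collector.urls


def image_request(url):
    """Build a request carrying the Referer header that WeChat image hosts expect."""
    return Request(url, headers={"Referer": "https://mp.weixin.qq.com/"})
```

The request object can then be passed to urllib.request.urlopen to download the image bytes.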

Feishu (*.feishu.cn)

  • Uses virtual scroll — content blocks are rendered on-demand
  • The fetcher scrolls through the entire document, collecting [data-block-id] elements
  • Images require authenticated fetch (cookies), downloaded via browser's fetch API
  • May show "Unable to print" artifacts which are auto-cleaned
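
Because a virtually scrolled page only renders the blocks near the viewport, each scroll step sees an overlapping subset of the document. One way to combine those subsets, sketched here without a browser, is to key blocks by their data-block-id and keep the first occurrence. merge_blocks is a hypothetical helper; the snapshot format (a list of (block_id, text) pairs per scroll position) is an assumption made for illustration, and cleanup of the "Unable to print" artifacts is not covered.

```python
def merge_blocks(snapshots):
    """Merge per-scroll snapshots into one ordered list of block texts.

    Each snapshot is a list of (block_id, text) pairs captured at one scroll
    position. Blocks seen more than once are kept only the first time, and
    first-seen order is preserved (dicts keep insertion order).
    """
    merged = {}
    for snapshot in snapshots:
        for block_id, text in snapshot:
            merged.setdefault(block_id, text)
    return list(merged.values())
```

Scrolling can stop once a pass over the page adds no new block ids to the merged result.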

Bilibili

  • Short links (b23.tv) are auto-resolved
  • For premium/member content, use --cookies-browser chrome
  • Default quality is 1080p, adjustable with -q
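
The quality and cookie settings above correspond to options that yt-dlp accepts when used as a Python library. The sketch below only builds the options dictionary, so it runs without yt-dlp installed. ytdlp_options is a hypothetical helper, and the format string and output template are illustrative choices, not necessarily what the skill uses.

```python
import os


def ytdlp_options(output_dir, quality="1080", audio_only=False, cookies_browser=None):
    """Build a yt-dlp options dict for the documented CLI flags."""
    opts = {"outtmpl": os.path.join(output_dir, "%(title)s.%(ext)s")}
    if audio_only:
        opts["format"] = "bestaudio/best"
    else:
        # Cap the video height at the requested quality, with a single-file fallback.
        opts["format"] = (
            f"bestvideo[height<={quality}]+bestaudio/best[height<={quality}]"
        )
    if cookies_browser:
        # yt-dlp expects a tuple whose first element is the browser name.
        opts["cookiesfrombrowser"] = (cookies_browser,)
    return opts
```

With yt-dlp installed, the dictionary can be passed to yt_dlp.YoutubeDL(opts), whose download method takes a list of URLs.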

Troubleshooting

| Problem | Solution |
| --- | --- |
| scrapling not found | pip install scrapling |
| yt-dlp not found | pip install yt-dlp |
| Article content too short | Try --method camoufox for JS-heavy pages |
| Feishu returns login page | The doc may require authentication |
| Bilibili 403 | Use --cookies-browser chrome |
| Image download fails | Check network; WeChat images need Referer header (auto-handled) |

Manual Usage

When the CLI doesn't fit your needs, use the modules directly:

from lib.router import route, check_dependency
from lib.article import fetch_article
from lib.video import fetch_video
from lib.feishu import fetch_feishu

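# Route a URL (the URLs below are placeholders; substitute real links)
article_url = "https://mp.weixin.qq.com/s/xxx"
r = route(article_url)
# {'type': 'article', 'method': 'scrapling', 'selector': '#js_content', 'post': 'wx_images'}

# Fetch article, reusing the route config computed above
fetch_article(article_url, output_dir="/tmp/out", route_config=r)

# Download video at 720p
fetch_video("https://b23.tv/xxx", output_dir="/tmp/out", quality="720")

# Fetch Feishu doc (FEISHU_DOC_URL stands for a *.feishu.cn document URL)
fetch_feishu("FEISHU_DOC_URL", output_dir="/tmp/out")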

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
