wechat-article-extractor

WeChat Article Extractor

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.


Install skill "wechat-article-extractor" with this command: npx skills add liaosvcaf/openclaw-skill-wechat-article-extractor/liaosvcaf-openclaw-skill-wechat-article-extractor-wechat-article-extractor


Extract WeChat public account articles to clean Markdown. WeChat blocks headless browsers with a 环境异常 ("abnormal environment") CAPTCHA, and web_fetch usually gets empty JS-rendered pages, so the reliable approach is to find a mirror on an aggregator site and extract the content from there.

Scope & Boundaries

This skill handles:

  • Extracting article text, images, and metadata from WeChat article URLs

  • Finding mirror copies when direct access is blocked

  • Converting HTML to clean Markdown

  • Saving output as .md files

This skill does NOT handle:

  • Publishing or syncing to note-taking apps (that's the user's workflow)

  • Batch extraction of multiple articles (handle one at a time)

  • WeChat login, authentication, or account management

  • Translating article content

Inputs

| Input | Required | Description |
|---|---|---|
| WeChat URL | Yes | An mp.weixin.qq.com link |
| Output filename | No | Defaults to kebab-case of article title |
| Save location | No | Defaults to /tmp/ |
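The two defaults above (kebab-case filename, /tmp/ save location) could be derived as in the following minimal sketch. The function name `default_output_path` and the fallback name `wechat-article` for titles with no usable characters are assumptions for illustration, not part of the skill.

```python
import re


def default_output_path(title: str, save_dir: str = "/tmp") -> str:
    """Build <save_dir>/<kebab-case-title>.md from an article title."""
    # \W does not match CJK characters, so Chinese titles are kept as-is;
    # runs of punctuation, whitespace and underscores collapse to one hyphen.
    slug = re.sub(r"[\W_]+", "-", title.strip().lower()).strip("-")
    # 'wechat-article' is a hypothetical fallback when nothing usable remains.
    return f"{save_dir.rstrip('/')}/{slug or 'wechat-article'}.md"
```

For example, `default_output_path("Hello, World!")` gives `/tmp/hello-world.md`.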

Outputs

  • A Markdown file with full article content, images, and metadata header

  • Console confirmation with file path and character count

Workflow

Step 1 — Try direct fetch (fast path)

web_fetch(url, extractMode="markdown", maxChars=50000)

Success check: If result rawLength > 500 AND content has real paragraphs (not just nav/footer text) → skip to Step 4 Option B.

Failure indicators: rawLength < 500, content is navigation/boilerplate only, or contains "环境异常" → go to Step 2.
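The success and failure checks above can be sketched as a single predicate. This is only an illustration: the way "real paragraphs" are counted (blank-line-separated blocks of at least 20 characters, at least 3 of them) is an assumed heuristic, and `direct_fetch_succeeded` is a hypothetical helper name.

```python
BLOCK_MARKER = "环境异常"  # WeChat's "abnormal environment" CAPTCHA text
MIN_RAW_LENGTH = 500       # threshold from the success check


def direct_fetch_succeeded(raw_length: int, content: str,
                           min_paragraphs: int = 3,
                           min_paragraph_chars: int = 20) -> bool:
    """Return True when a web_fetch result looks like a real article."""
    if raw_length <= MIN_RAW_LENGTH or BLOCK_MARKER in content:
        return False
    # Assumed heuristic: navigation and footer text tends to be short lines,
    # so only blocks of a minimum length count as paragraphs.
    paragraphs = [p for p in content.split("\n\n")
                  if len(p.strip()) >= min_paragraph_chars]
    return len(paragraphs) >= min_paragraphs
```

A `True` result corresponds to skipping to Step 4 Option B; `False` corresponds to continuing with Step 2.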

Step 2 — Extract article metadata

From the URL or any partial content, identify:

  • Article title (from <title> or og:title)

  • Author / account name (from og:description or page content)

If metadata is unavailable from the URL, ask the user for the article title.
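When some HTML is available, the title and description lookup could look like the sketch below, which uses only the standard-library `html.parser`. The class and function names are hypothetical, and preferring og:title over <title> is an assumption; the skill does not state which one wins.

```python
from html.parser import HTMLParser


class MetaParser(HTMLParser):
    """Collect <title>, og:title and og:description from an HTML document."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("property") in ("og:title", "og:description"):
            self.meta[attrs["property"]] = (attrs.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.meta["title"] = self.meta.get("title", "") + data


def extract_metadata(html: str) -> dict:
    """Return {'title': ..., 'description': ...}; missing values are None."""
    parser = MetaParser()
    parser.feed(html)
    parser.close()
    # Assumption: og:title is preferred, <title> is the fallback.
    title = parser.meta.get("og:title") or parser.meta.get("title", "").strip()
    return {
        "title": title or None,
        "description": parser.meta.get("og:description") or None,
    }
```

If `title` comes back as `None`, that is the case where the user should be asked for the article title.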

Step 3 — Search for mirrors

web_search("<article title> <author/account name>")

Mirror site priority (ranked by content quality and reliability):

  • 53ai.com — full content, reliable formatting

  • mp.ofweek.com — tech articles

  • juejin.cn — developer content

  • woshipm.com — product/business content

  • 36kr.com — tech/business news

If title is unknown, try: web_search("site:53ai.com <keywords from URL path>")

If no mirrors found: Try the Chrome Extension Relay fallback (see Fallback section).
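The mirror priority could be applied to search results roughly as follows. The shape of web_search results is not documented here, so this sketch assumes they have already been reduced to a plain list of URL strings; `mirror_rank` and `pick_mirrors` are hypothetical helper names.

```python
from urllib.parse import urlparse

# Priority order taken from the mirror list (lower index = preferred).
MIRROR_PRIORITY = ["53ai.com", "mp.ofweek.com", "juejin.cn", "woshipm.com", "36kr.com"]


def mirror_rank(url: str) -> int:
    """Return the priority index of a URL's site, or len(MIRROR_PRIORITY) if unknown."""
    host = (urlparse(url).hostname or "").lower()
    for rank, domain in enumerate(MIRROR_PRIORITY):
        # Match the domain itself or any subdomain such as www.53ai.com.
        if host == domain or host.endswith("." + domain):
            return rank
    return len(MIRROR_PRIORITY)


def pick_mirrors(result_urls):
    """Return URLs on known mirror sites, best first; drop everything else."""
    known = [u for u in result_urls if mirror_rank(u) < len(MIRROR_PRIORITY)]
    return sorted(known, key=mirror_rank)
```

An empty result from `pick_mirrors` corresponds to the "no mirrors found" case.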

Step 4 — Download and extract

Option A — Mirror found:

curl -s -L "<mirror_url>" -o /tmp/wechat-article.html

Verify that the file is larger than 10KB (a smaller file usually means a redirect or error page).
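The size check can be done without leaving Python. A minimal sketch, assuming "10KB" means 10 × 1024 bytes; `download_looks_valid` is a hypothetical helper name.

```python
import os

MIN_MIRROR_BYTES = 10 * 1024  # "10KB" threshold, read here as 10 KiB (assumption)


def download_looks_valid(path: str) -> bool:
    """True when the downloaded HTML exists and is larger than the threshold."""
    return os.path.isfile(path) and os.path.getsize(path) > MIN_MIRROR_BYTES
```

For example, `download_looks_valid("/tmp/wechat-article.html")` would be called right after the curl command.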

Run the extraction script:

python3 <skill_dir>/scripts/extract_wechat.py /tmp/wechat-article.html /tmp/<output-filename>.md

Replace <skill_dir> with the directory containing this SKILL.md.

Option B — Direct fetch succeeded (Step 1): Format the fetched markdown with the header template below.

Step 5 — Verify output quality

Check the output file:

  • Has a title (not "WeChat Article")

  • Has multiple paragraphs of real content

  • Images have valid URLs (not broken/placeholder)

  • No excessive HTML artifacts remaining

If output looks truncated or garbled, try a different mirror site (return to Step 3).
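The checklist above could be automated along these lines. This is a sketch under stated assumptions: the title is taken to be a level-1 Markdown heading, a "valid" image URL only means it starts with http:// or https:// (no network request is made), and the thresholds are illustrative defaults rather than values from the skill.

```python
import re

IMAGE_RE = re.compile(r"!\[[^\]]*\]\(([^)\s]+)")
HTML_TAG_RE = re.compile(r"</?(?:div|span|p|section|script|style)\b[^>]*>", re.IGNORECASE)


def quality_problems(markdown: str, min_paragraphs: int = 3, max_html_tags: int = 5) -> list:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    # Assumption: the article title is the first "# " heading in the file.
    title = next((line[2:].strip() for line in markdown.splitlines()
                  if line.startswith("# ")), "")
    if not title or title == "WeChat Article":
        problems.append("missing or placeholder title")
    # Count blocks that are not headings, images or quoted metadata.
    paragraphs = [b for b in markdown.split("\n\n")
                  if b.strip() and not b.lstrip().startswith(("#", "!", ">"))]
    if len(paragraphs) < min_paragraphs:
        problems.append("too few content paragraphs")
    bad_images = [u for u in IMAGE_RE.findall(markdown)
                  if not u.startswith(("http://", "https://"))]
    if bad_images:
        problems.append(f"{len(bad_images)} image(s) without an http(s) URL")
    if len(HTML_TAG_RE.findall(markdown)) > max_html_tags:
        problems.append("leftover HTML tags")
    return problems
```

A non-empty result would be the signal to return to Step 3 and try a different mirror.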

Step 6 — Deliver to user

Report:

  • File saved at: <path>

  • Title: <title>

  • Size: <char count> characters

  • Image count: <N> images

If the user wants it saved to a specific location (e.g., Obsidian), follow their instructions for the final copy.
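The report fields could be computed from the saved Markdown as in this sketch. It assumes the title is the first non-empty line of the file and that images use standard Markdown image syntax; `delivery_report` is a hypothetical helper name.

```python
import re


def delivery_report(path: str, markdown: str) -> str:
    """Build the four-line report from the output path and the Markdown text."""
    # Assumption: the first non-empty line holds the title, possibly as a heading.
    title = next((line.lstrip("# ").strip() for line in markdown.splitlines()
                  if line.strip()), "")
    image_count = len(re.findall(r"!\[[^\]]*\]\([^)]+\)", markdown))
    return "\n".join([
        f"File saved at: {path}",
        f"Title: {title}",
        f"Size: {len(markdown)} characters",
        f"Image count: {image_count} images",
    ])
```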

Markdown Header Template

Every extracted article must include this header:

<title>

作者: <author> 来源: 微信公众号「<account_name>」 日期: <date> 原文: <original_wechat_url>


摘要: <1-2 sentence summary generated from content>


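The labels are 作者 (author), 来源 (source; 微信公众号 means "WeChat Official Account"), 日期 (date), 原文 (original link), and 摘要 (summary). Fields that cannot be determined should be omitted (don't write "Unknown").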
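One way to honor the "omit unknown fields" rule is to build the header from optional values, as sketched below. The Markdown markup of the template is not visible in this listing, so emitting the title as a level-1 heading and keeping the metadata on one line are assumptions; `build_header` is a hypothetical helper name.

```python
def build_header(title, url, author=None, account=None, date=None, summary=None):
    """Render the header, skipping any field whose value is missing."""
    fields = [
        ("作者", author),
        ("来源", f"微信公众号「{account}」" if account else None),
        ("日期", date),
        ("原文", url),
    ]
    # Only fields with a value are emitted; nothing is written as "Unknown".
    meta = " ".join(f"{label}: {value}" for label, value in fields if value)
    parts = [f"# {title}", meta]  # level-1 heading is an assumption
    if summary:
        parts.append(f"摘要: {summary}")
    return "\n\n".join(p for p in parts if p) + "\n"
```

Calling `build_header("T", "https://mp.weixin.qq.com/s/abc", author="A")` yields a header with only the 作者 and 原文 fields.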

Fallback: Chrome Extension Relay

If no mirror exists (very new or niche article):

Tell the user (in Chinese if they wrote in Chinese):

"没有找到镜像。请在 Chrome 中打开这篇文章,然后点击 OpenClaw Browser Relay 扩展图标(badge 亮起),我就能直接读取内容。"

In English: "No mirror was found. Please open the article in Chrome, then click the OpenClaw Browser Relay extension icon (the badge lights up) so that I can read the content directly."

Then use:

browser(action="snapshot", profile="chrome")

Extract content from the snapshot and format with the header template.

Error Handling

| Problem | Detection | Action |
|---|---|---|
| WeChat blocks access | rawLength < 500 or "环境异常" | Search for mirrors (Step 3) |
| No mirrors found | Search returns 0 relevant results | Try Chrome Relay fallback |
| Mirror content truncated | Output < 1000 chars when original is long | Try next mirror site |
| Script extraction fails | Python error or empty output | Fall back to web_fetch on mirror URL |
| Images broken | Image URLs return 404 | Note in output; images may expire |

Success Criteria

  • Output Markdown contains the full article text (not truncated)

  • Title and metadata are correctly extracted

  • Images are preserved with working URLs

  • No HTML artifacts or navigation junk in output

  • File is saved at the specified location

Notes

  • WeChat image URLs from mirrors (e.g., api.ibos.cn proxy) are generally valid and render in most Markdown viewers

  • Mirror sites typically publish within minutes of the original

  • The · · · section dividers are WeChat style — preserve them

  • For very long articles (>50K chars), the script handles them fine but web_fetch may truncate

Configuration

No persistent configuration required. The skill uses standard OpenClaw tools (web_fetch, web_search, exec) and optionally browser for the Chrome Relay fallback.

Required tools:

| Tool | Purpose |
|---|---|
| web_fetch | Direct article fetch attempt |
| web_search | Mirror site discovery |
| exec | Run curl and Python extraction script |

Optional tools:

| Tool | Purpose |
|---|---|
| browser | Chrome Extension Relay fallback |

System dependencies:

| Dependency | Purpose |
|---|---|
| Python 3.8+ | Extraction script |
| curl | Mirror page download |

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
