web-content-extractor

Extracts the title, body text, image links, and other content from a web page URL

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "web-content-extractor" with this command: npx skills add shuishouxinboda/jiayinclaw-12345

Web Content Extractor

A practical skill for extracting structured information from any web page.

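Features

  • Automatically extracts the page title and metadata
  • Extracts the body text and strips HTML tags
  • Extracts all image links
  • Extracts all external links
  • Supports extracting only specified elements
  • Outputs the result as formatted JSON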
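
The features above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual code: the skill lists requests and beautifulsoup4 as its dependencies, while this sketch swaps in the standard library's html.parser so it runs without installs and parses an inline sample instead of fetching a URL. The names PageExtractor, extract, and SAMPLE are hypothetical, and treating a link as external when its host differs from the page's host is an assumption.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PageExtractor(HTMLParser):
    """Collects title, visible text, image URLs, and external links (hypothetical helper)."""

    SKIP = {"script", "style", "noscript"}  # tags whose text is not page content

    def __init__(self, base_url):
        super().__init__(convert_charrefs=True)
        self.base_url = base_url
        self.title_parts = []
        self.text_parts = []
        self.images = []
        self.links = []
        self._in_title = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "img" and attrs.get("src"):
            # Resolve relative image paths against the page URL.
            self.images.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "a" and attrs.get("href"):
            url = urljoin(self.base_url, attrs["href"])
            # Assumption: a link is external when its host differs from the page's.
            if urlparse(url).netloc not in ("", urlparse(self.base_url).netloc):
                self.links.append(url)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)
        elif not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


def extract(html, base_url):
    """Parse an HTML string and return a dict shaped like the skill's output."""
    parser = PageExtractor(base_url)
    parser.feed(html)
    parser.close()
    return {
        "title": "".join(parser.title_parts).strip(),
        "content": " ".join(parser.text_parts),
        "images": parser.images,
        "links": parser.links,
    }


SAMPLE = """<html><head><title>Demo</title><style>p{}</style></head>
<body><p>Hello <b>world</b></p><img src="/a.png">
<a href="https://other.example/x">out</a><a href="/local">in</a></body></html>"""

result = extract(SAMPLE, "https://example.com/page")
print(json.dumps(result, ensure_ascii=False, indent=2))
```

In a real run the HTML string would come from an HTTP response rather than an inline sample; how the skill itself fetches and parses pages should be checked in its SKILL.md and scripts.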

Usage

Basic usage

Skill input: https://example.com
Skill output: {"title": "...", "content": "...", "images": [...], "links": [...]}

Advanced usage

  • Extract only specific elements
  • Set a content length limit
  • Customize the output format
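
The listing does not say how these options are passed to the skill, so the following is only a sketch of what such post-processing might look like. The function shape_result and its parameters fields, max_length, and fmt are hypothetical, and reading "specific elements" as output fields (rather than HTML elements) is an assumption.

```python
import json


def shape_result(result, fields=None, max_length=None, fmt="json"):
    """Post-process an extraction result (all parameter names are hypothetical)."""
    # Keep only the requested keys, in the order the caller listed them.
    if fields is not None:
        result = {key: result[key] for key in fields if key in result}
    else:
        result = dict(result)
    # Truncate the body text to at most max_length characters.
    if max_length is not None and "content" in result:
        result["content"] = result["content"][:max_length]
    if fmt == "json":
        return json.dumps(result, ensure_ascii=False)
    if fmt == "text":
        return "\n".join(f"{key}: {value}" for key, value in result.items())
    raise ValueError(f"unsupported format: {fmt}")


page = {"title": "Demo", "content": "Hello world", "images": [], "links": []}
print(shape_result(page, fields=["title", "content"], max_length=5))
# → {"title": "Demo", "content": "Hello"}
print(shape_result(page, fields=["title"], fmt="text"))
# → title: Demo
```

Keeping these options as a separate step on the result dict means the extraction itself stays unchanged regardless of which fields or format a caller asks for.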

Technical specifications

  • Programming language: Python 3
  • Dependencies: requests, beautifulsoup4
  • Network: requires an internet connection

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Coding

You.com Web Search & Research CLI

Web search, research with citations, and content extraction for bash agents using curl and You.com's REST API. - MANDATORY TRIGGERS: You.com, youdotcom, YDC,...

General

Article Summarizer

Summarize articles and social posts from URLs using full-content retrieval first, with browser fallback when needed.

Automation

AWI

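AWI (Agentic Web Interface): web reading and search in a single, zero-configuration binary. Three-level automatic fallback: direct connection → smart adaptation → browser rendering. No API key or Docker required.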

Automation

Web Scraping & Data Extraction Engine

Complete web scraping methodology — legal compliance, architecture design, anti-detection, data pipelines, and production operations. Use when building scrap...
