news-crawler

新闻自动爬取与总结工具。用于抓取指定网站或RSS源的新闻内容，并生成摘要报告。使用场景： 1. 用户要求"获取今日新闻"、"爬取某网站内容" 2. 用户需要"总结新闻"、"生成日报" 3. 用户指定具体URL要求抓取内容 4. 需要监控特定新闻源的最新动态

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "news-crawler" with this command: npx skills add oshin123456/news-crawler

News Crawler - 新闻爬虫

自动爬取新闻网站和RSS源，提取内容并生成摘要。

快速开始

1. 获取RSS新闻列表

查看可用的新闻源：

python3 scripts/rss_fetcher.py

获取指定RSS源的新闻：

python3 scripts/rss_fetcher.py <rss_url> [max_items]

示例：

python3 scripts/rss_fetcher.py https://www.solidot.org/index.rss 5

2. 爬取具体网页内容

python3 scripts/crawl.py <url> [max_length]

示例：

python3 scripts/crawl.py https://example.com/news/article.html 3000

工作流程

生成新闻日报

选择新闻源 - 从常用源中选择或用户提供RSS地址
获取新闻列表 - 使用 rss_fetcher.py 获取最新文章
爬取详细内容 - 对每篇文章使用 crawl.py 获取全文
生成摘要 - 使用 LLM 总结每篇文章的核心内容
整理报告 - 按类别或时间排序，生成结构化日报

支持的RSS源

常用中文科技新闻源：

Solidot: https://www.solidot.org/index.rss
TechWeb: https://www.techweb.com.cn/rss/all.xml
36氪: https://36kr.com/feed

国际源：

Hacker News: https://news.ycombinator.com/rss
TechCrunch: https://techcrunch.com/feed/

输出格式

rss_fetcher.py 输出：

{
  "items": [
    {
      "title": "文章标题",
      "link": "文章链接",
      "description": "简介",
      "published": "发布时间"
    }
  ],
  "count": 10
}

crawl.py 输出：

{
  "url": "原始链接",
  "title": "页面标题",
  "content": "正文内容",
  "length": 5000
}

注意事项

尊重robots.txt - 爬取前检查目标网站的爬虫协议
控制频率 - 避免频繁请求同一网站
内容长度 - 默认截取5000字符，可通过参数调整
编码问题 - 脚本已处理UTF-8编码，特殊网站可能需要额外处理

扩展开发

如需支持更多功能，可参考：

references/rss_sources.md - 更多RSS源列表
添加定时任务支持（结合 cron）
添加飞书/邮件推送功能

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Open Registry Record Open in ClawHub

Related Skills

Related by shared tags or category signals.

General

Zoom

Zoom API integration with managed OAuth. Manage meetings, webinars, recordings, and user profiles. Use this skill when users want to schedule meetings, manag...

Registry SourceRecently Updated

General

Kleinanzeigen.de Helper

Erstelle und verwalte Verkaufsanzeigen speziell auf kleinanzeigen.de. Verwende diesen Skill wenn der Human sagt, er will etwas auf kleinanzeigen.de verkaufen...

Registry SourceRecently Updated

General

Poku

Sends and receives phone calls and messages (like SMS, WhatsApp, Slack), and reserves dedicated phone numbers using the Poku API. Example use cases: calling...

Registry SourceRecently Updated

General

IMAP/SMTP Email - Maddy Fix

Read and send email via IMAP/SMTP. Check for new/unread messages, fetch content, search mailboxes, mark as read/unread, and send emails with attachments. Sup...

Registry SourceRecently Updated