datapulse

Cross-platform content collection, web search, trending topics, confidence scoring, and watch/triage workflows for assistant and agent usage.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.


Install skill "datapulse" with this command: npx skills add sunyifei83/datapulse

DataPulse Skill (v0.8.1)

Use this skill when the user needs one or more of the following:

  • Read or batch-read URLs across X, Reddit, YouTube, Bilibili, Telegram, WeChat, Xiaohongshu, RSS, arXiv, Hacker News, GitHub, and generic web pages
  • Search the web, inspect trending topics, or collect cross-platform signals
  • Create watch missions, alert routes, triage queues, or story evidence packs
  • Run assistant-ready URL intake through datapulse_skill.run()

Python Entry Point

from datapulse_skill import run

run("Please process these links: https://x.com/... https://www.reddit.com/...")

Core Capabilities

  • URL ingestion with normalized DataPulseItem output
  • Confidence scoring and ranking
  • Web search and trending discovery
  • Watch missions and alert routing
  • Triage queue and story workspace workflows

Behavior Disclosure

Browser Automation (optional)

DataPulse uses Playwright for platforms that require authenticated browser sessions (WeChat, Xiaohongshu). Browser automation is opt-in: it activates only when the user explicitly runs a login command and a valid session file exists. The playwright dependency is optional (pip install datapulse[browser]). No browser launches occur during normal URL reading.

Subprocess Calls

  • MCP transport: Story and triage modules invoke subprocess.run() to communicate with MCP tool servers via subprocess_json transport (stdin/stdout JSON-RPC). All calls have explicit timeouts (30s default).
  • YouTube fallback: The YouTube collector may call yt-dlp as a subprocess for audio transcript extraction when the native API is unavailable.
  • CLI update check: The CLI invokes pip install --upgrade only when the user explicitly runs --upgrade.

No subprocess call runs silently or without user-initiated action.
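
The subprocess_json transport described above can be sketched roughly as follows. This is an illustration under stated assumptions, not DataPulse's actual code: the helper name call_tool_server, the one-request/one-response framing, and the inline echo server are hypothetical. Only subprocess.run(), JSON-RPC over stdin/stdout, and the 30s default timeout come from the text.

```python
import json
import subprocess
import sys

DEFAULT_TIMEOUT_S = 30  # default timeout stated in the disclosure


def call_tool_server(command, method, params, timeout=DEFAULT_TIMEOUT_S):
    """Send one JSON-RPC request on stdin and parse one response from stdout.

    Hypothetical helper: the name and the one-shot framing are assumptions.
    """
    request = {"jsonrpc": "2.0", "id": 1, "method": method, "params": params}
    completed = subprocess.run(
        command,
        input=json.dumps(request),
        capture_output=True,
        text=True,
        timeout=timeout,  # raises subprocess.TimeoutExpired if exceeded
        check=True,       # raises subprocess.CalledProcessError on non-zero exit
    )
    return json.loads(completed.stdout)


# Stand-in tool server for demonstration only: echoes params back as the result.
ECHO_SERVER = (
    "import json, sys; req = json.load(sys.stdin); "
    "print(json.dumps({'jsonrpc': '2.0', 'id': req['id'], 'result': req['params']}))"
)

response = call_tool_server([sys.executable, "-c", ECHO_SERVER], "ping", {"ok": True})
```

Passing an explicit timeout means a hung tool server surfaces as an exception instead of blocking the caller indefinitely.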

Local Persistence

  • Session files: Playwright login sessions are saved to ~/.datapulse/sessions/ for reuse. Sessions are TTL-cached (12h) and can be invalidated via invalidate_session_cache().
  • Data files: Watch missions, alert routes, triage queues, story workspaces, and entity stores persist as JSON files under the working directory (data/ folder). All writes use atomic save patterns.

No data is written outside the working directory or ~/.datapulse/ without explicit user action.
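
One common way to implement an "atomic save pattern" and a TTL check is sketched below. It assumes the write-to-temp-file-then-rename approach, which the source does not confirm; the function names atomic_save_json and is_fresh and the file name watch_missions.json are hypothetical. Only the 12h TTL and the data/ folder come from the text.

```python
import json
import os
import tempfile
import time
from pathlib import Path

SESSION_TTL_S = 12 * 60 * 60  # 12h TTL stated in the disclosure


def atomic_save_json(path, payload):
    """Write JSON to a temp file in the same directory, then rename over the target."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle, ensure_ascii=False)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_name, path)  # atomic when source and target share a filesystem
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)  # do not leave a partial temp file behind
        raise


def is_fresh(saved_at, now=None, ttl=SESSION_TTL_S):
    """Return True while a session saved at `saved_at` is still within its TTL."""
    now = time.time() if now is None else now
    return (now - saved_at) < ttl


with tempfile.TemporaryDirectory() as workdir:
    target = Path(workdir) / "data" / "watch_missions.json"  # hypothetical file name
    atomic_save_json(target, {"missions": []})
    atomic_save_json(target, {"missions": ["demo"]})
    saved = json.loads(target.read_text(encoding="utf-8"))
    leftovers = [p.name for p in target.parent.iterdir() if p.name.endswith(".tmp")]
```

Because the rename replaces the file in a single step, a reader sees either the old or the new JSON, never a half-written file.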

Outbound HTTP (alert delivery)

When the user configures alert routes, DataPulse sends POST requests to user-specified endpoints:

  • Webhook: arbitrary URL provided by the user
  • Feishu: Feishu bot webhook URL provided by the user
  • Telegram: Telegram Bot API (api.telegram.org) using a user-provided bot token

Alert delivery fires only when both conditions hold: (1) a watch mission matches new content, and (2) the user has explicitly configured a route with a destination URL or token. No outbound POST occurs without user-configured routes.
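
The two-condition gate above could look something like the sketch below. The route dictionary shape, the function name build_alert_requests, and the {"text": ...} webhook payload are assumptions made for illustration; the Telegram URL follows the public Bot API sendMessage method. Feishu is left out because its payload format is not described here. The code only builds request objects and sends nothing.

```python
import json
from urllib.request import Request


def build_alert_requests(matched_items, routes):
    """Return the POST requests to send; empty unless both conditions hold.

    `routes` is a hypothetical list of dicts such as
    {"kind": "webhook", "url": "..."} or
    {"kind": "telegram", "token": "...", "chat_id": "..."}.
    """
    if not matched_items or not routes:
        return []  # no match, or no user-configured route: nothing is sent

    text = "\n".join(item["title"] for item in matched_items)
    requests = []
    for route in routes:
        if route["kind"] == "telegram":
            url = f"https://api.telegram.org/bot{route['token']}/sendMessage"
            body = {"chat_id": route["chat_id"], "text": text}
        elif route["kind"] == "webhook":
            url = route["url"]  # arbitrary URL supplied by the user
            body = {"text": text}  # assumed payload shape
        else:
            raise ValueError(f"unsupported route kind: {route['kind']}")
        requests.append(
            Request(
                url,
                data=json.dumps(body).encode("utf-8"),
                headers={"Content-Type": "application/json"},
                method="POST",
            )
        )
    return requests


items = [{"title": "New post matched"}]
routes = [{"kind": "webhook", "url": "https://example.invalid/hook"}]
no_route = build_alert_requests(items, [])
no_match = build_alert_requests([], routes)
ready = build_alert_requests(items, routes)
```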

Local Server (optional)

datapulse-console starts a local FastAPI/Uvicorn HTTP server for the browser-based console UI. It binds to localhost by default and is never started automatically — only when the user explicitly runs datapulse-console or python -m datapulse.console_server.

External API Calls (read-only)

Normal operation makes outbound GET/POST requests to:

  • Jina AI (r.jina.ai, s.jina.ai): URL reading and web search (requires JINA_API_KEY)
  • Tavily (api.tavily.com): web search (requires TAVILY_API_KEY)
  • Groq (api.groq.com): YouTube audio transcription fallback (requires GROQ_API_KEY)
  • Target URLs: the URLs the user asks to read

All API keys are read from environment variables; none are bundled or hard-coded.
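
As a rough illustration of reading keys only from the environment, the sketch below reports which optional providers would be usable. The variable names come from the list above; the PROVIDER_KEYS mapping and the enabled_providers helper are hypothetical and not part of DataPulse.

```python
import os

# Variable names are taken from the list above; the mapping itself is illustrative.
PROVIDER_KEYS = {
    "jina": "JINA_API_KEY",
    "tavily": "TAVILY_API_KEY",
    "groq": "GROQ_API_KEY",
}


def enabled_providers(environ=None):
    """Return the providers whose API key is set to a non-empty value."""
    environ = os.environ if environ is None else environ
    return sorted(
        name for name, var in PROVIDER_KEYS.items() if environ.get(var, "").strip()
    )


demo = enabled_providers({"JINA_API_KEY": "example", "TAVILY_API_KEY": ""})
```

Accepting the environment as a parameter keeps the check easy to test without modifying the real process environment.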

Environment Notes

  • Python 3.10+
  • Optional search enhancement: JINA_API_KEY, TAVILY_API_KEY
  • Optional platform enhancement: TG_API_ID, TG_API_HASH, GROQ_API_KEY
  • Optional browser sessions: pip install datapulse[browser] (Playwright)
  • Optional console UI: pip install datapulse[console] (FastAPI + Uvicorn)

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
