sustainability-rss-fetch

Ingest all sustainability journal RSS entries into a dedicated RSS SQLite database first, keyed by DOI, then mark relevance and prune non-relevant rows to DOI-only. Use when building a DOI-first ingestion pipeline with mandatory full ingestion before topic filtering.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.


Install skill "sustainability-rss-fetch" with this command: npx skills add tiangong-ai/skills/tiangong-ai-skills-sustainability-rss-fetch

Sustainability RSS Fetch

Core Goal

  • Ingest all RSS/Atom items into SQLite before topic filtering.
  • Use doi as the primary key in entries.
  • Keep RSS metadata isolated in its own DB file.
  • After semantic screening, keep relevant rows and prune non-relevant rows to DOI-only.
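
The ingest-everything-first rule with doi as the primary key can be sketched as an idempotent upsert. This is a minimal illustration, not the code in scripts/rss_subscribe.py: the table layout, the ingest helper, and the sample DOIs are assumptions, and the ON CONFLICT clause needs SQLite 3.24 or newer.

```python
import sqlite3

# Hypothetical minimal schema; the real entries table has more columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE entries (
        doi         TEXT PRIMARY KEY,
        title       TEXT,
        url         TEXT,
        is_relevant INTEGER  -- stays NULL until screening labels the row
    )
    """
)


def ingest(conn, items):
    """Insert every fetched item; refresh metadata when the DOI already exists."""
    conn.executemany(
        """
        INSERT INTO entries (doi, title, url)
        VALUES (:doi, :title, :url)
        ON CONFLICT(doi) DO UPDATE SET
            title = excluded.title,
            url   = excluded.url
        """,
        items,
    )
    conn.commit()


# Made-up sample items for illustration only.
batch = [
    {"doi": "10.1000/example.1", "title": "LCA of batteries", "url": "https://example.com/1"},
    {"doi": "10.1000/example.2", "title": "Green supply chains", "url": "https://example.com/2"},
]
ingest(conn, batch)
ingest(conn, batch)  # re-running the same window does not duplicate rows

row_count = conn.execute("SELECT COUNT(*) FROM entries").fetchone()[0]
print(row_count)
```

Keying on doi makes repeated collection of overlapping windows safe, which is what lets filtering happen strictly after ingestion.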

Triggering Conditions

  • Receive a request to import sustainability feeds and persist all fetched records first.
  • Receive a request to do prompt-based topic screening after DB ingestion.
  • Receive a request to convert irrelevant rows into lightweight DOI-only records.
  • Need stable DOI-keyed storage for downstream API, full-text, and summarization steps.

Mandatory Workflow

  1. Prepare runtime and RSS metadata DB path.
python3 -m pip install feedparser
export SUSTAIN_RSS_DB_PATH="/absolute/path/to/workspace-rss-bot/sustainability_rss.db"
python3 scripts/rss_subscribe.py init-db --db "$SUSTAIN_RSS_DB_PATH"
  2. Collect RSS window and ingest all fetched items first.
python3 scripts/rss_subscribe.py collect-window \
  --db "$SUSTAIN_RSS_DB_PATH" \
  --opml assets/journal.opml \
  --start 2026-02-01 \
  --end 2026-02-10 \
  --max-items-per-feed 150 \
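  --topic-prompt "Select articles related to sustainability topics: life cycle assessment, material flow analysis, green supply chain, green electricity, green design, pollution and carbon reduction" \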
  --output /tmp/sustainability-candidates.json \
  --pretty
  3. Screen candidates in agent context (semantic, not regex-only).
    • Use topic_prompt + user instructions.
    • Produce selected candidate_id list.
  4. Mark selected rows as relevant and prune unselected rows.
python3 scripts/rss_subscribe.py insert-selected \
  --db "$SUSTAIN_RSS_DB_PATH" \
  --candidates /tmp/sustainability-candidates.json \
  --selected-ids 3,7,12,21

Result:

  • selected candidates: is_relevant=1, keep metadata.
  • unselected candidates: clear metadata fields, keep DOI-only row (is_relevant=0).
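
The label-and-prune step can be sketched as follows. This is an illustration under stated assumptions, not the implementation behind insert-selected: the label_and_prune helper, the candidate_id-to-doi mapping, and the three metadata columns are hypothetical. It validates the selected IDs before writing anything, then applies both updates in one transaction.

```python
import sqlite3

# Hypothetical column set; the real entries table has more metadata fields.
METADATA_COLUMNS = ("title", "url", "summary")


def label_and_prune(conn, candidates, selected_ids):
    """Label selected rows relevant and prune the rest to DOI-only.

    candidates maps candidate_id -> doi (an assumed shape for this sketch).
    """
    selected = set(selected_ids)
    unknown = selected - set(candidates)
    if unknown:
        # Fail fast: no row is touched when any selected ID is invalid.
        raise ValueError(f"unknown candidate ids: {sorted(unknown)}")

    clear_sql = ", ".join(f"{col} = NULL" for col in METADATA_COLUMNS)
    with conn:  # one transaction: commit on success, roll back on error
        for candidate_id, doi in candidates.items():
            if candidate_id in selected:
                conn.execute(
                    "UPDATE entries SET is_relevant = 1 WHERE doi = ?", (doi,)
                )
            else:
                conn.execute(
                    f"UPDATE entries SET {clear_sql}, is_relevant = 0 WHERE doi = ?",
                    (doi,),
                )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entries (doi TEXT PRIMARY KEY, title TEXT, url TEXT, "
    "summary TEXT, is_relevant INTEGER)"
)
conn.executemany(
    "INSERT INTO entries (doi, title, url, summary) VALUES (?, ?, ?, ?)",
    [
        ("10.1000/a", "A", "https://example.com/a", "sa"),
        ("10.1000/b", "B", "https://example.com/b", "sb"),
        ("10.1000/c", "C", "https://example.com/c", "sc"),
    ],
)
candidates = {3: "10.1000/a", 7: "10.1000/b", 12: "10.1000/c"}
label_and_prune(conn, candidates, [3, 12])

rows = conn.execute(
    "SELECT doi, title, is_relevant FROM entries ORDER BY doi"
).fetchall()
print(rows)
```

After the call, the unselected row keeps only its doi and is_relevant=0, while the selected rows retain their metadata.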

Optional Maintenance Sync

python3 scripts/rss_subscribe.py sync --db "$SUSTAIN_RSS_DB_PATH" --max-feeds 20 --max-items-per-feed 100

Source Management

python3 scripts/rss_subscribe.py add-feed --db "$SUSTAIN_RSS_DB_PATH" --url "https://example.com/feed.xml"
python3 scripts/rss_subscribe.py import-opml --db "$SUSTAIN_RSS_DB_PATH" --opml assets/journal.opml

Query Data

python3 scripts/rss_subscribe.py list-feeds --db "$SUSTAIN_RSS_DB_PATH" --limit 50
python3 scripts/rss_subscribe.py list-entries --db "$SUSTAIN_RSS_DB_PATH" --limit 100

Data Contract

  • feeds table: subscription and fetch state.
  • entries table (doi PK):
    • metadata fields (title/url/summary/categories/...)
    • doi_is_surrogate (set when the source item has no DOI and a surrogate key is used instead)
    • is_relevant (1 relevant, 0 pruned non-relevant, NULL not labeled yet)
  • Non-relevant rows are pruned to DOI-only payload for storage efficiency.

Configurable Parameters

  • --db
  • SUSTAIN_RSS_DB_PATH
  • --opml
  • --feed-url
  • --use-subscribed-feeds
  • --topic-prompt
  • --start/--end
  • --max-feeds
  • --max-items-per-feed
  • --user-agent
  • --cleanup-ttl-days
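
Both --db and SUSTAIN_RSS_DB_PATH appear in the list above. One common way to combine a flag with an environment variable is sketched below; the precedence shown (flag first, environment variable as fallback) is an assumption, not confirmed behaviour of scripts/rss_subscribe.py, and resolve_db_path is a hypothetical helper.

```python
import argparse
import os


def resolve_db_path(argv, env=None):
    """Return the DB path from --db, falling back to SUSTAIN_RSS_DB_PATH."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser()
    # Assumed precedence: an explicit --db overrides the environment variable.
    parser.add_argument("--db", default=env.get("SUSTAIN_RSS_DB_PATH"))
    args = parser.parse_args(argv)
    if not args.db:
        parser.error("pass --db or set SUSTAIN_RSS_DB_PATH")
    return args.db


example_env = {"SUSTAIN_RSS_DB_PATH": "/tmp/from-env.db"}
from_flag = resolve_db_path(["--db", "/tmp/from-flag.db"], env=example_env)
from_env = resolve_db_path([], env=example_env)
print(from_flag, from_env)
```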

Error and Boundary Handling

  • Feed/network failure: continue other feeds and keep errors in feed state.
  • Missing feedparser: return install guidance.
  • Missing DOI in RSS item: create deterministic surrogate DOI key to keep full-ingestion guarantee.
  • Invalid selected IDs: fail fast before label/prune write.
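
A deterministic surrogate key for items without a DOI could be derived by hashing fields that identify the item. The sketch below is one possible approach, not the scheme used by scripts/rss_subscribe.py: the choice of input fields, the "surrogate:" prefix, and the digest length are all assumptions.

```python
import hashlib


def surrogate_doi(feed_url, entry_link, title):
    """Derive a stable key from fields that identify the item."""
    # Strip surrounding whitespace so trivially different copies hash alike.
    basis = "\n".join(part.strip() for part in (feed_url, entry_link, title))
    digest = hashlib.sha256(basis.encode("utf-8")).hexdigest()[:24]
    # Hypothetical prefix that keeps surrogate keys distinct from real DOIs.
    return f"surrogate:{digest}"


key_a = surrogate_doi(
    "https://example.com/feed.xml", "https://example.com/article/1", "Sample title"
)
key_b = surrogate_doi(
    "https://example.com/feed.xml", "https://example.com/article/1 ", "Sample title"
)
key_c = surrogate_doi(
    "https://example.com/feed.xml", "https://example.com/article/2", "Sample title"
)
print(key_a == key_b, key_a == key_c)
```

Because the key depends only on the item's own fields, re-fetching the same item yields the same row, so the full-ingestion guarantee holds without creating duplicates.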

References

  • references/input-model.md
  • references/output-rules.md
  • references/time-range-rules.md

Assets

  • assets/journal.opml
  • assets/config.example.json

Scripts

  • scripts/rss_subscribe.py

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
