Sustainability RSS Fetch

Core Goal

Ingest all RSS/Atom items into SQLite before topic filtering.
Use doi as the primary key in entries.
Keep RSS metadata isolated in its own DB file.
After semantic screening, keep relevant rows and prune non-relevant rows to DOI-only.

Triggering Conditions

Receive a request to import sustainability feeds and persist all fetched records first.
Receive a request to do prompt-based topic screening after DB ingestion.
Receive a request to convert irrelevant rows into lightweight DOI-only records.
Need stable DOI-keyed storage for downstream API/fulltext/summarization.

Mandatory Workflow

Prepare runtime and RSS metadata DB path.

python3 -m pip install feedparser
export SUSTAIN_RSS_DB_PATH="/absolute/path/to/workspace-rss-bot/sustainability_rss.db"
python3 scripts/rss_subscribe.py init-db --db "$SUSTAIN_RSS_DB_PATH"

Collect RSS window and ingest all fetched items first.

python3 scripts/rss_subscribe.py collect-window \
  --db "$SUSTAIN_RSS_DB_PATH" \
  --opml assets/journal.opml \
  --start 2026-02-01 \
  --end 2026-02-10 \
  --max-items-per-feed 150 \
  --topic-prompt "筛选与可持续主题相关的文章：生命周期评价、物质流分析、绿色供应链、绿电、绿色设计、减污降碳" \
  --output /tmp/sustainability-candidates.json \
  --pretty

Screen candidates in agent context (semantic, not regex-only).

Use topic_prompt + user instructions.
Produce selected candidate_id list.

Mark selected rows as relevant and prune unselected rows.

python3 scripts/rss_subscribe.py insert-selected \
  --db "$SUSTAIN_RSS_DB_PATH" \
  --candidates /tmp/sustainability-candidates.json \
  --selected-ids 3,7,12,21

Result:

selected candidates: is_relevant=1, keep metadata.
unselected candidates: clear metadata fields, keep DOI-only row (is_relevant=0).

Optional Maintenance Sync

python3 scripts/rss_subscribe.py sync --db "$SUSTAIN_RSS_DB_PATH" --max-feeds 20 --max-items-per-feed 100

Source Management

python3 scripts/rss_subscribe.py add-feed --db "$SUSTAIN_RSS_DB_PATH" --url "https://example.com/feed.xml"
python3 scripts/rss_subscribe.py import-opml --db "$SUSTAIN_RSS_DB_PATH" --opml assets/journal.opml

Query Data

python3 scripts/rss_subscribe.py list-feeds --db "$SUSTAIN_RSS_DB_PATH" --limit 50
python3 scripts/rss_subscribe.py list-entries --db "$SUSTAIN_RSS_DB_PATH" --limit 100

Data Contract

feeds table: subscription and fetch state.
entries table (doi PK):
- metadata fields (title/url/summary/categories/...)
- doi_is_surrogate (when no DOI is present in source)
- is_relevant (1 relevant, 0 pruned non-relevant, NULL not labeled yet)
Non-relevant rows are pruned to DOI-only payload for storage efficiency.

Configurable Parameters

--db
SUSTAIN_RSS_DB_PATH
--opml
--feed-url
--use-subscribed-feeds
--topic-prompt
--start/--end
--max-feeds
--max-items-per-feed
--user-agent
--cleanup-ttl-days

Error and Boundary Handling

Feed/network failure: continue other feeds and keep errors in feed state.
Missing feedparser: return install guidance.
Missing DOI in RSS item: create deterministic surrogate DOI key to keep full-ingestion guarantee.
Invalid selected IDs: fail fast before label/prune write.

References

references/input-model.md
references/output-rules.md
references/time-range-rules.md

Assets

assets/journal.opml
assets/config.example.json

Scripts

scripts/rss_subscribe.py

sustainability-rss-fetch

Safety Notice

Copy this and send it to your AI assistant to learn

Sustainability RSS Fetch

Core Goal

Triggering Conditions

Mandatory Workflow

Optional Maintenance Sync

Source Management

Query Data

Data Contract

Configurable Parameters

Error and Boundary Handling

References

Assets

Scripts

Source Transparency

Related Skills

ai-tech-rss-fetch

email-smtp-send

email-imap-fetch

sci-journals-hybrid-search