KB Abstract Fetch
Core Goal
- Reuse the same PostgreSQL connection environment variables as kb-meta-fetch.
- Select rows whose abstract is empty, ordered by newest created_at first.
- Open https://doi.org/<doi> in OpenClaw Browser and extract the abstract text.
- Write back only when the row is still empty at update time.
- Default to dry run; require explicit --apply to write.
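
As a minimal sketch of the "open https://doi.org/<doi>" step, the helper below builds the resolver URL from a stored DOI value. The function name `build_doi_url` and the choice to percent-encode unusual characters are assumptions for illustration; the actual script may construct the URL differently.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"


def build_doi_url(doi: str) -> str:
    """Return the resolver URL for a DOI (hypothetical helper, not taken from the script)."""
    cleaned = doi.strip()
    if not cleaned:
        raise ValueError("DOI is empty")
    # Keep "/" literal; percent-encode characters such as "#" or spaces
    # that would otherwise change how the URL is parsed.
    return DOI_RESOLVER + quote(cleaned, safe="/")
```

For example, `build_doi_url("10.1000/a#b")` keeps the `#` inside the DOI path instead of letting it start a URL fragment.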
Required Environment
- KB_DB_HOST
- KB_DB_PORT
- KB_DB_NAME
- KB_DB_USER
- KB_DB_PASSWORD
- KB_LOG_DIR (required run log directory)
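
A fail-fast check of the variables listed above might look like the following sketch. The function name `load_settings` and the error message are hypothetical; only the variable names come from this document.

```python
import os

REQUIRED_VARS = (
    "KB_DB_HOST",
    "KB_DB_PORT",
    "KB_DB_NAME",
    "KB_DB_USER",
    "KB_DB_PASSWORD",
    "KB_LOG_DIR",
)


def load_settings(environ=None) -> dict:
    """Return the required settings, or raise if any are unset or blank."""
    env = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    return {name: env[name] for name in REQUIRED_VARS}
```

Reporting every missing name at once avoids a fix-one-rerun-fail-again loop when several variables are unset.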
Workflow
- Run local self-test first (no DB/browser required):
python3 scripts/kb_abstract_fetch.py --self-test
- Dry run first (default mode; no DB write):
python3 scripts/kb_abstract_fetch.py --limit 100
- Apply updates after review:
python3 scripts/kb_abstract_fetch.py --limit 100 --apply
- Override table/column names when needed (created_at is fixed and required):
python3 scripts/kb_abstract_fetch.py \
  --table journals \
  --doi-column doi \
  --abstract-column abstract \
  --limit 100 \
  --apply
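
The dry-run-by-default behavior of the commands above can be sketched with `argparse` as follows. The flag names come from this document; the default values and help strings are assumptions, and the real script may define more options (for example --self-test and --max-errors).

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser where writing requires an explicit --apply flag."""
    parser = argparse.ArgumentParser(prog="kb_abstract_fetch.py")
    # Defaults below are illustrative assumptions, not confirmed script defaults.
    parser.add_argument("--limit", type=int, default=100,
                        help="maximum number of rows to process")
    parser.add_argument("--apply", action="store_true",
                        help="write updates; without this flag the run is a dry run")
    parser.add_argument("--table", default="journals")
    parser.add_argument("--doi-column", default="doi")
    parser.add_argument("--abstract-column", default="abstract")
    return parser
```

Because `--apply` uses `store_true`, omitting it always yields `apply=False`, so a forgotten flag can never cause a write.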
Safety Contract
- Selection filter:
  - DOI not empty
  - abstract empty (NULL or blank)
- Selection order:
  - newest created_at first (ORDER BY created_at DESC NULLS LAST LIMIT n)
- Update filter (second guard):
  - WHERE doi = ? AND abstract is still empty
- Run summary:
  - emit RUN_SUMMARY_JSON=<json> for the current run only.
- Abort behavior:
  - stop early when errors exceed --max-errors.
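
The contract above can be illustrated with a small runnable sketch. To stay self-contained it uses an in-memory `sqlite3` database as a stand-in for PostgreSQL, the default table and column names from the Workflow section, and hypothetical summary field names; the real script's SQL and summary schema may differ.

```python
import json
import sqlite3

# SQLite stand-in: "created_at IS NULL, created_at DESC" gives the same order
# as PostgreSQL's "ORDER BY created_at DESC NULLS LAST".
SELECT_SQL = """
SELECT doi FROM journals
WHERE doi IS NOT NULL AND TRIM(doi) <> ''
  AND (abstract IS NULL OR TRIM(abstract) = '')
ORDER BY created_at IS NULL, created_at DESC
LIMIT ?
"""

# Second guard: the emptiness check is repeated at write time.
UPDATE_SQL = """
UPDATE journals SET abstract = ?
WHERE doi = ? AND (abstract IS NULL OR TRIM(abstract) = '')
"""


def select_candidates(conn, limit):
    """Return DOIs of rows with an empty abstract, newest first."""
    return [row[0] for row in conn.execute(SELECT_SQL, (limit,))]


def guarded_update(conn, doi, abstract):
    """Write the abstract only if the row is still empty; report whether it was written."""
    return conn.execute(UPDATE_SQL, (abstract, doi)).rowcount == 1


def run(conn, fetch_abstract, limit, apply=False, max_errors=5):
    """Process candidates and emit a summary line for this run only."""
    summary = {"selected": 0, "updated": 0, "skipped": 0, "errors": 0, "aborted": False}
    for doi in select_candidates(conn, limit):
        summary["selected"] += 1
        try:
            text = fetch_abstract(doi)
        except Exception:
            summary["errors"] += 1
            if summary["errors"] > max_errors:  # stop early once the budget is exceeded
                summary["aborted"] = True
                break
            continue
        if apply and text and guarded_update(conn, doi, text):
            summary["updated"] += 1
        else:
            summary["skipped"] += 1  # dry run, empty result, or row filled meanwhile
    if apply:
        conn.commit()
    print("RUN_SUMMARY_JSON=" + json.dumps(summary, sort_keys=True))
    return summary
```

Repeating the emptiness condition in the UPDATE means a row filled by another process between selection and write is left untouched, which is the purpose of the second guard.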
Browser Requirement
- openclaw CLI must be installed.
- Script checks openclaw browser status; if browser is not running, it tries openclaw browser start.
- If start fails (for example, extension tab not attached), attach the OpenClaw browser session first, then rerun.
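
The check-then-start sequence above could be sketched as follows. The two commands are the ones named in this section; treating a zero exit code as "running" or "started" is an assumption about the CLI, and the helper names are hypothetical. The command runner is injectable so the logic can be exercised without the CLI installed.

```python
import subprocess


def run_cli(args):
    """Run a command and return its exit code."""
    try:
        return subprocess.run(args, capture_output=True, text=True).returncode
    except FileNotFoundError as exc:
        raise RuntimeError("openclaw CLI must be installed") from exc


def ensure_browser(run=run_cli) -> bool:
    """Return True if the browser is running, starting it once if needed."""
    # Assumption: exit code 0 from the status command means the browser is running.
    if run(["openclaw", "browser", "status"]) == 0:
        return True
    # Assumption: a non-zero exit code from the start command means start failed.
    if run(["openclaw", "browser", "start"]) != 0:
        return False
    # Re-check status so a start that returned 0 but did not attach is still caught.
    return run(["openclaw", "browser", "status"]) == 0
```

When `ensure_browser()` returns False, the manual step applies: attach the OpenClaw browser session, then rerun.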
Script
- scripts/kb_abstract_fetch.py