pmc-harvest

Fetch articles from PubMed Central using NCBI APIs. Search journals, retrieve full text via OAI-PMH, batch harvest for RAG pipelines. No API key required.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "pmc-harvest" with this command: npx skills add angusthefuzz/pmc-harvest

PMC Harvest

Fetch full-text articles from PubMed Central using official NCBI APIs.

Features

  • E-utilities search — Find articles by journal, year, query
  • OAI-PMH full text — Retrieve complete article XML (open access only)
  • Batch harvesting — Process multiple journals at once
  • Abstract fetch — Lightweight retrieval for review queues
  • No API key required — Uses public NCBI APIs (rate-limited)

Usage

# Search a journal
node {baseDir}/scripts/pmc-harvest.js --search "J Stroke[journal]" --year 2025

# Fetch full text for a specific article
node {baseDir}/scripts/pmc-harvest.js --fetch PMC12345678

# Batch harvest from multiple journals
node {baseDir}/scripts/pmc-harvest.js --harvest journals.json --year 2025

# Test with known journals
node {baseDir}/scripts/pmc-harvest.js --test

Options

Flag               Description
--search <query>   PMC search query (use "Name"[journal] format)
--year <year>      Filter by publication year
--max <n>          Max results (default: 100)
--fetch <pmcid>    Fetch full text for a specific PMCID
--harvest <file>   Batch harvest from a JSON journal list
--test             Run a test with sample journals

Programmatic API

const pmc = require('{baseDir}/lib/api.js');

// `await` needs an async function in a CommonJS script
async function main() {
  // Search
  const { count, pmcids } = await pmc.searchJournal('"J Stroke"[journal]', { year: 2025 });

  // Get summaries
  const summaries = await pmc.getSummaries(pmcids);

  // Fetch full text; check `available` before using `xml`
  const { available, xml, reason } = await pmc.fetchFullText('PMC12345678');

  // Parse JATS XML (block scope keeps these names separate from the ones below)
  if (available) {
    const { title, abstract, body } = pmc.parseJATS(xml);
  }

  // Fetch abstract only (lightweight)
  const { title, abstract } = await pmc.fetchAbstract('PMC12345678');
}

main().catch(console.error);
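
When harvesting for a RAG pipeline, one way to combine these calls is to try full text first and fall back to the abstract. The sketch below assumes only the return shapes shown in the calls above. `loadArticle` is a hypothetical helper, not part of the skill, and it takes the `pmc` module as an argument so it can be tried with a stub.

```javascript
// Hypothetical helper (not part of pmc-harvest): try full text first,
// then fall back to the abstract when full text is not available.
async function loadArticle(pmc, pmcid) {
  const { available, xml, reason } = await pmc.fetchFullText(pmcid);
  if (available) {
    const { title, abstract, body } = pmc.parseJATS(xml);
    return { pmcid, source: 'fulltext', title, abstract, body };
  }
  // Assumption: `reason` describes why full text could not be returned.
  const { title, abstract } = await pmc.fetchAbstract(pmcid);
  return { pmcid, source: 'abstract', title, abstract, body: null, reason };
}
```

Keeping `source` on the result lets downstream code tell full-text records from abstract-only ones.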

Journal Query Examples

const queries = {
  'Stroke': '"Stroke"[journal]',
  'Journal of Stroke': '"J Stroke"[journal]',
  'Stroke & Vascular Neurology': '"Stroke Vasc Neurol"[journal]',
  'European Stroke Journal': '"Eur Stroke J"[journal]',
  'BMC Neurology': '"BMC Neurol"[journal]'
};
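
The queries above all follow the same `"Abbreviation"[journal]` pattern, so they can be generated from a list of journal abbreviations. This is a minimal sketch; `journalQuery` and `anyJournal` are hypothetical names, not functions exported by the skill.

```javascript
// Hypothetical helpers (not part of pmc-harvest).
// Wrap a journal abbreviation in the "Name"[journal] form used above.
function journalQuery(abbreviation) {
  const name = abbreviation.replace(/"/g, '').trim();
  return `"${name}"[journal]`;
}

// Join several journal queries with OR so one search covers all of them.
function anyJournal(abbreviations) {
  return abbreviations.map(journalQuery).join(' OR ');
}
```

For example, `anyJournal(['Stroke', 'BMC Neurol'])` produces `"Stroke"[journal] OR "BMC Neurol"[journal]`.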

Limitations

  • Open access only: OAI-PMH returns full text only for open-access articles; restricted content is unavailable
  • Rate limits: at most about 3 requests per second without an API key
  • Peak hours: NCBI asks that large batches run on weekends or between 9 PM and 5 AM ET on weekdays
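
If you call the programmatic API in your own loop, one way to stay under the rate limit is to space requests out. The sketch below is an assumption about how a caller might do this, not how the skill itself throttles. `createThrottle` is a hypothetical helper, and the 350 ms default is chosen to keep request starts just below 3 per second.

```javascript
// Hypothetical helper (not part of pmc-harvest): returns a function that
// runs tasks one at a time, starting them at least `minGapMs` apart.
function createThrottle(minGapMs = 350) {
  let last = Promise.resolve();
  let nextStart = 0;
  return function schedule(task) {
    const run = last.then(async () => {
      const wait = Math.max(0, nextStart - Date.now());
      if (wait > 0) await new Promise((resolve) => setTimeout(resolve, wait));
      nextStart = Date.now() + minGapMs;
      return task();
    });
    // Keep the queue going even if a task rejects.
    last = run.catch(() => {});
    return run;
  };
}
```

A caller would wrap each request, for example `const throttled = createThrottle();` followed by `await throttled(() => pmc.fetchAbstract(id))`.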

API Reference

This skill wraps NCBI's official APIs:

  • E-utilities: https://eutils.ncbi.nlm.nih.gov/entrez/eutils
    • esearch.fcgi — Search PMC
    • esummary.fcgi — Get article metadata
  • OAI-PMH: https://pmc.ncbi.nlm.nih.gov/api/oai/v1/mh
    • GetRecord — Fetch full text XML
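
To make the endpoints above concrete, the sketch below builds request URLs for `esearch.fcgi` and OAI-PMH `GetRecord`. It is not taken from the skill's source. The function names are hypothetical, and the OAI identifier form (`oai:pubmedcentral.nih.gov:` plus the numeric ID) and the `pmc` metadata prefix are assumptions that should be checked against the PMC OAI-PMH documentation.

```javascript
// Base URLs as listed above.
const EUTILS = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils';
const OAI = 'https://pmc.ncbi.nlm.nih.gov/api/oai/v1/mh';

// Hypothetical helper: build an esearch.fcgi URL for the PMC database.
function esearchUrl(term, { year, max = 100 } = {}) {
  const params = new URLSearchParams({
    db: 'pmc',
    term,
    retmax: String(max),
    retmode: 'json',
  });
  if (year) {
    // Restrict by publication date using esearch's date-range parameters.
    params.set('datetype', 'pdat');
    params.set('mindate', String(year));
    params.set('maxdate', String(year));
  }
  return `${EUTILS}/esearch.fcgi?${params}`;
}

// Hypothetical helper: build an OAI-PMH GetRecord URL.
// Assumption: the identifier is "oai:pubmedcentral.nih.gov:" plus the
// numeric part of the PMCID, and "pmc" requests full-text XML.
function getRecordUrl(pmcid) {
  const id = String(pmcid).replace(/^PMC/i, '');
  const params = new URLSearchParams({
    verb: 'GetRecord',
    identifier: `oai:pubmedcentral.nih.gov:${id}`,
    metadataPrefix: 'pmc',
  });
  return `${OAI}?${params}`;
}
```

`URLSearchParams` handles the percent-encoding of quotes and brackets in the search term, so queries such as `"J Stroke"[journal]` can be passed in unchanged.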

Full docs: https://www.ncbi.nlm.nih.gov/books/NBK25501/
