survey-seed-harvest

Bootstrap taxonomy seeds from existing survey/review papers inside your retrieved set.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "survey-seed-harvest" with this command: npx skills add willoscar/research-units-pipeline-skills/willoscar-research-units-pipeline-skills-survey-seed-harvest

Survey Seed Harvest

This is an accelerator for the early structure stage: it should make taxonomy-builder easier, not replace it.

Inputs

  • papers/papers_dedup.jsonl (deduped paper metadata with titles/abstracts)
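
A minimal sketch of reading this input, assuming one JSON object per line. The function name `parse_jsonl` is hypothetical, and the choice to skip damaged lines rather than abort is an assumption, not documented behavior of the skill.

```python
import json


def parse_jsonl(lines):
    """Parse an iterable of JSONL lines into a list of dict records."""
    papers = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # assumption: tolerate a damaged line instead of failing
        if isinstance(record, dict):
            papers.append(record)  # keep only object records
    return papers
```

To use it on the real file, open papers/papers_dedup.jsonl with encoding="utf-8" and pass the file handle to parse_jsonl, since a text file handle iterates line by line.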

Outputs

  • outline/taxonomy.yml (seed taxonomy; expected to be refined)

Workflow (heuristic)

Uses: papers/papers_dedup.jsonl.

  • Find likely survey/review papers:

      • title/abstract contains “survey”, “review”, “systematic”, “meta-analysis”

  • Extract candidate topic terms and group them into:

      • ~4–10 top-level nodes (“chapters”)

      • 2–6 children per node (mappable leaves)

  • Write short, actionable descriptions:

      • what belongs here / what does not

      • (optional) list 2–5 representative titles as seeds

  • Treat the result as a starting point:

      • pass it to taxonomy-builder for domain-meaningful rewriting and scope alignment.
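
The first two steps above can be sketched as follows. This is an illustration, not the helper script's actual logic: the field names `title` and `abstract`, the stopword list, and the use of title bigrams as candidate terms are all assumptions. Grouping terms into nodes and writing descriptions remains a manual (or taxonomy-builder) step.

```python
import re
from collections import Counter

# Cue words taken from the workflow; matched as substrings ("contains").
SURVEY_CUES = ("survey", "review", "systematic", "meta-analysis")
# Assumption: a tiny illustrative stopword list.
STOPWORDS = {"of", "on", "in", "for", "and", "the", "to", "with", "an"}


def is_survey(paper):
    """True if the title or abstract contains any survey cue."""
    text = f"{paper.get('title') or ''} {paper.get('abstract') or ''}".lower()
    return any(cue in text for cue in SURVEY_CUES)


def candidate_terms(papers):
    """Count bigrams in survey titles after dropping stopwords and cue words."""
    counts = Counter()
    for paper in papers:
        if not is_survey(paper):
            continue
        words = re.findall(r"[a-z][a-z\-]+", (paper.get("title") or "").lower())
        tokens = [w for w in words if w not in STOPWORDS and w not in SURVEY_CUES]
        # Pairs are adjacent in the filtered list, not necessarily in the title.
        counts.update(" ".join(pair) for pair in zip(tokens, tokens[1:]))
    return counts
```

Substring matching is deliberately loose: it also fires on words such as “reviewed” or “systematically”, so expect some false positives in the survey set.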

Quality checklist

  • outline/taxonomy.yml exists and is valid YAML.

  • Taxonomy has at least 2 levels (children used) and every node has a description.

  • Avoid generic placeholder nodes like “Overview/Benchmarks/Open Problems” unless they are truly content-based for your domain.
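
The checklist can be partly automated. The sketch below assumes the taxonomy has already been parsed into a list of node dicts with the keys `name`, `description`, and `children`; that schema is hypothetical, since the listing does not specify one. It flags generic names rather than rejecting them, because the checklist allows them when they are content-based.

```python
# Names the checklist calls out as generic placeholders.
GENERIC_NAMES = {"overview", "benchmarks", "open problems"}


def check_taxonomy(nodes):
    """Return a list of problems found; an empty list means the checks pass."""
    if not isinstance(nodes, list) or not nodes:
        return ["taxonomy is empty or not a list of nodes"]
    problems = []
    # At least one top-level node must have children for a 2-level taxonomy.
    if not any(n.get("children") for n in nodes if isinstance(n, dict)):
        problems.append("no node has children (fewer than 2 levels)")
    stack = [(node, "") for node in nodes]
    while stack:
        node, parent = stack.pop()
        if not isinstance(node, dict):
            problems.append(f"{parent or '<root>'}: node is not a mapping")
            continue
        name = str(node.get("name") or "").strip()
        path = f"{parent}/{name}" if parent else name
        if not str(node.get("description") or "").strip():
            problems.append(f"{path or '<unnamed>'}: missing description")
        if name.lower() in GENERIC_NAMES:
            problems.append(f"{path}: generic placeholder name")
        stack.extend((child, path) for child in node.get("children") or [])
    return problems
```

If PyYAML is available, loading outline/taxonomy.yml with yaml.safe_load covers the “valid YAML” item as well, since it raises yaml.YAMLError on malformed input.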

Script (optional helper)

Quick Start

  • python .codex/skills/survey-seed-harvest/scripts/run.py --help

  • python .codex/skills/survey-seed-harvest/scripts/run.py --workspace <workspace_dir>

All Options

  • --top-k <n>: number of candidate terms to consider

  • --min-freq <n>: minimum frequency threshold

Examples

  • More conservative term selection:

      • python .codex/skills/survey-seed-harvest/scripts/run.py --workspace <ws> --top-k 80 --min-freq 3
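
One plausible reading of how the two options interact is sketched below: drop terms under the frequency threshold, then keep the most frequent ones. The function `select_terms` is hypothetical, and the order of the two steps is an assumption; check run.py for the actual behavior.

```python
from collections import Counter


def select_terms(counts, top_k, min_freq):
    """Keep terms seen at least min_freq times, then the top_k most frequent."""
    frequent = Counter({t: n for t, n in counts.items() if n >= min_freq})
    return [term for term, _ in frequent.most_common(top_k)]
```

Under this reading, raising --min-freq removes rare terms and lowering --top-k shortens the candidate list, so both make selection more conservative.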

Notes

  • This helper is keyword-based; treat the output as seeds and refine with taxonomy-builder.

Troubleshooting

Issue: no survey/review papers are detected in the set

Fix:

  • Broaden retrieval (add “survey”, “review”, “benchmark” variants) or manually seed a few known surveys, then rerun.

Issue: taxonomy seeds look like generic buckets

Fix:

  • Keep seeds concrete (named methods/benchmarks/tasks) and rely on taxonomy-builder to rewrite under the actual scope.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
