citation-diversifier

Citation Diversifier (budget-as-constraints) [NO NEW FACTS]

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "citation-diversifier" with this command: npx skills add willoscar/research-units-pipeline-skills/willoscar-research-units-pipeline-skills-citation-diversifier

Citation Diversifier (budget-as-constraints) [NO NEW FACTS]

Purpose: fix a common survey failure mode:

  • the draft reads under-cited (or reuses the same few citations everywhere)

  • the pipeline fails the global unique-citation gate

This skill does not change prose by itself. It produces a constraint sheet: output/CITATION_BUDGET_REPORT.md .

Inputs

  • output/DRAFT.md

  • outline/outline.yml (H3 ids/titles; used to allocate budgets per subsection)

  • outline/writer_context_packs.jsonl (source of allowed_bibkeys_{selected,mapped,chapter,global} per H3)

  • citations/ref.bib

Output

  • output/CITATION_BUDGET_REPORT.md

Non-negotiables (NO NEW FACTS)

  • Only propose citation keys that exist in citations/ref.bib .

  • Only propose keys that are in-scope for the target H3 (prefer subsection-first scope; use chapter/global only when truly cross-cutting).

  • Do not propose “padding citations” that would require adding new claims or new numbers.

What a good budget report looks like (contract)

The report should feel like a constraint sheet, not a random list:

  • It states the blocking policy target and the gap-to-target (how many unique keys are missing; policy default is recommended ).

  • For each H3, it proposes a scope-safe budget sized to actually close the gap:

  • small gaps: 3-6 keys / H3 is often enough

  • A150++ gaps: plan for ~6-12 keys / H3 (and avoid duplicates across H3 budgets)

  • It gives placement guidance (where in the subsection those keys can be embedded without adding new facts).

Canonical (parseable) lines required (downstream validators depend on these):

  • The target is derived from queries.md:citation_target (recommended by default for A150++).

    • Global target (policy; blocking): >= <N> ...
    • Gap: <K> (gap-to-target; if 0 , injection can be a no-op PASS)

Optional (always reported; may be blocking depending on citation_target ):

    • Global recommended target: >= <N> ...
    • Gap to recommended: <K>

Recommended prioritization (scope-safe):

  • allowed_bibkeys_selected → allowed_bibkeys_mapped → allowed_bibkeys_chapter

  • Use allowed_bibkeys_global only for:

  • benchmarks/protocol papers

  • widely-used datasets/suites

  • cross-cutting surveys/method papers referenced across chapters

How this connects to writing (LLM-first)

After you generate the budget report:

  • Apply it using citation-injector (LLM edits to output/DRAFT.md , NO NEW FACTS).

  • Then run draft-polisher to remove any “budget dump voice” while keeping citation keys unchanged.

Important: citation-injector is LLM-first. Its script is validation-only.

Workflow

  • Diagnose the global situation

  • Read output/DRAFT.md and estimate the “unique-key gap” (or use pipeline-auditor ’s FAIL reason).

  • Allocate budgets per H3 (scope-first)

  • Use outline/outline.yml to enumerate H3s in paper order.

  • For each H3, read its allowed key sets from outline/writer_context_packs.jsonl .

  • Pick a small set of unused keys that strengthen positioning without requiring new claims.

  • Write output/CITATION_BUDGET_REPORT.md

Required structure:

    • Status: PASS|FAIL
    • Global target (policy; blocking): >= <N> ...
    • Gap: <K>
  • Summary

(gap + strategy)

  • Per-subsection budgets

(H3 id/title → suggested keys → placement hint)

Script (optional; deterministic report generator)

If you want a deterministic first-pass budget report, run the helper script. Treat it as a baseline and refine the plan as needed.

Quick Start

  • python .codex/skills/citation-diversifier/scripts/run.py --help

  • python .codex/skills/citation-diversifier/scripts/run.py --workspace workspaces/<ws>

All Options

  • --workspace <dir>

  • --unit-id <U###> (optional)

  • --inputs <semicolon-separated> (rare override; prefer defaults)

  • --outputs <semicolon-separated> (rare override; default writes output/CITATION_BUDGET_REPORT.md )

  • --checkpoint <C#> (optional)

Examples

  • Default IO:

  • python .codex/skills/citation-diversifier/scripts/run.py --workspace workspaces/<ws>

Done criteria

  • output/CITATION_BUDGET_REPORT.md exists and has actionable, in-scope budgets.

  • After applying the plan via citation-injector , pipeline-auditor no longer FAILs on global unique citations.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Research

pdf-text-extractor

No summary provided by upstream source.

Repository SourceNeeds Review
Research

latex-compile-qa

No summary provided by upstream source.

Repository SourceNeeds Review
Research

draft-polisher

No summary provided by upstream source.

Repository SourceNeeds Review
Research

citation-verifier

No summary provided by upstream source.

Repository SourceNeeds Review