Paper Reference Checker
Systematically verify academic references to detect AI-hallucinated or fabricated citations. Queries Google Scholar, arXiv, CNKI, and other databases.
Core Workflow
Phase 1: Citation Extraction (Token-Efficient First)
ALWAYS use targeted extraction before reading the full document; this saves 80–95% of tokens.
| Input Type | Primary Method | Fallback |
|---|---|---|
| arXiv link | arxiv.org/html/{ID} → find references section | Full HTML, then PDF |
| PDF file | Last 15–20% of pages only | Expand to 30% → 50% → full |
| Overleaf link | Extract cite-keys from .tex with a regex → filter .bib/.bbl | Inline \bibitem entries in .tex |
| Pasted list | Parse directly | — |
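
The Overleaf and PDF rows above can be sketched in Python as follows. This is a minimal illustration, not the checker's actual implementation: the function names `extract_cite_keys`, `filter_bib_entries`, and `tail_page_range` are hypothetical, the regex covers only the common `\cite` variants, and the 20% default is simply the upper end of the 15–20% range in the table.

```python
import re

# Matches \cite, \citep, \citet, \citeauthor*, ... with optional [..] arguments.
CITE_RE = re.compile(r"\\cite[a-zA-Z]*\*?(?:\s*\[[^\]]*\])*\s*\{([^}]*)\}")
# Matches the start of a BibTeX entry such as "@article{key,".
BIB_ENTRY_RE = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")


def extract_cite_keys(tex: str) -> list[str]:
    """Return cite-keys in order of first appearance, without duplicates."""
    keys, seen = [], set()
    for match in CITE_RE.finditer(tex):
        for key in match.group(1).split(","):
            key = key.strip()
            if key and key not in seen:
                seen.add(key)
                keys.append(key)
    return keys


def filter_bib_entries(bib: str, keys: list[str]) -> dict[str, str]:
    """Keep only the .bib entries whose key is actually cited in the .tex."""
    wanted = set(keys)
    starts = [(m.start(), m.group(2)) for m in BIB_ENTRY_RE.finditer(bib)]
    entries = {}
    for i, (start, key) in enumerate(starts):
        # An entry runs until the next entry starts (or the file ends).
        end = starts[i + 1][0] if i + 1 < len(starts) else len(bib)
        if key in wanted:
            entries[key] = bib[start:end].strip()
    return entries


def tail_page_range(total_pages: int, percent: int = 20) -> range:
    """0-based indices of the last `percent` of pages (at least one page)."""
    count = max(1, -(-total_pages * percent // 100))  # integer ceiling
    return range(max(0, total_pages - count), total_pages)
```

If the tail pages contain no reference list, calling `tail_page_range` again with 30, then 50, then 100 follows the fallback column of the table.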
Phase 2: Multi-Platform Querying (Priority Order)
1. DOI → https://doi.org/{DOI}: resolves = ✅ confirmed
2. arXiv ID → https://arxiv.org/abs/{ID}: match = ✅ confirmed
3. Google Scholar → search "Full Title"
4. arXiv search → arxiv.org/search/
5. CNKI → cnki.net
6. Fallbacks: Semantic Scholar · PubMed · IEEE Xplore · ACM DL · DBLP
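
The priority order above can be sketched as a function that builds the lookup URLs to try, in order, without performing any network requests. The `Reference` class and `build_lookups` function are hypothetical names introduced for illustration. The Google Scholar and arXiv search query strings are assumptions about those sites' URL patterns, not something this document specifies. CNKI and the fallback databases are left out because only their names are given here.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import quote, quote_plus


@dataclass
class Reference:
    """Hypothetical container for one extracted citation."""
    title: str
    doi: Optional[str] = None
    arxiv_id: Optional[str] = None


def build_lookups(ref: Reference) -> list[tuple[str, str]]:
    """Return (channel, url) pairs in the priority order listed above."""
    lookups = []
    if ref.doi:
        # Identifier lookups come first: they are the cheapest to confirm.
        lookups.append(("doi", "https://doi.org/" + quote(ref.doi, safe="/")))
    if ref.arxiv_id:
        lookups.append(("arxiv_id", "https://arxiv.org/abs/" + ref.arxiv_id))
    # Exact-phrase title search; the query parameter is an assumed URL pattern.
    phrase = quote_plus(f'"{ref.title}"')
    lookups.append(("google_scholar",
                    "https://scholar.google.com/scholar?q=" + phrase))
    # Title search on arXiv; the parameters are an assumed URL pattern.
    lookups.append(("arxiv_search",
                    "https://arxiv.org/search/?query="
                    + quote_plus(ref.title) + "&searchtype=title"))
    return lookups
```

A caller would walk the returned list and stop at the first channel that confirms the reference, so later and more expensive searches run only when the identifiers are missing or fail.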
Phase 3: Authenticity Judgment
| Status | Label | Criteria |
|---|---|---|
| ✅ | VERIFIED | Found in ≥1 authoritative DB |
| ⚠️ | UNCERTAIN | Partial match (e.g., title found but authors, year, or venue differ) |
| ❌ | NOT FOUND | No match across all queried channels |
| 🔴 | FABRICATED | Non-existent venue, unresolvable DOI |
| 🔗 | BROKEN CITATION | [?] marker in PDF body |
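
One way to turn the table above into a decision function is sketched below. Several parts are assumptions rather than rules taken from this document: the `Evidence` fields, the use of `difflib.SequenceMatcher` on titles as the matching signal, the 0.95 and 0.6 thresholds, the reading of the FABRICATED criteria as "either signal is enough", and the order in which the labels are checked.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class Evidence:
    """Hypothetical summary of what the Phase 2 queries returned."""
    cited_title: str
    best_match_title: Optional[str] = None  # closest title any database returned
    doi_given: bool = False
    doi_resolves: bool = False
    venue_exists: bool = True
    broken_marker: bool = False  # "[?]" found in the PDF body


def title_similarity(a: str, b: str) -> float:
    """Case- and whitespace-insensitive similarity in [0, 1]."""
    def norm(s: str) -> str:
        return " ".join(s.lower().split())
    return SequenceMatcher(None, norm(a), norm(b)).ratio()


def classify(ev: Evidence, verified_at: float = 0.95,
             partial_at: float = 0.6) -> str:
    """Map evidence to one of the labels in the table (assumed precedence)."""
    if ev.broken_marker:
        return "BROKEN CITATION"
    # Assumption: either fabrication signal alone is sufficient.
    if not ev.venue_exists or (ev.doi_given and not ev.doi_resolves):
        return "FABRICATED"
    if ev.best_match_title is None:
        return "NOT FOUND"
    score = title_similarity(ev.cited_title, ev.best_match_title)
    if score >= verified_at:
        return "VERIFIED"
    if score >= partial_at:
        return "UNCERTAIN"
    return "NOT FOUND"
```

The thresholds are deliberately exposed as parameters, since a sensible cut-off for a partial match depends on how noisy the extracted titles are.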
Phase 4: Report Output
See examples/sample-report.md for a full example.
Support Files
| File | Purpose |
|---|---|
| references/citation-extraction.md | Format rules |
| references/search-strategies.md | Per-database query tactics |
| references/verification-criteria.md | Decision flowchart |
| scripts/extract_references.md | Full decision tree |
| examples/sample-report.md | Complete report example |
| examples/bibtex-example.bib | Annotated BibTeX |
Databases
Google Scholar · arXiv · CNKI · Semantic Scholar · PubMed · IEEE Xplore · ACM DL · DBLP · DOI Resolver · Crossref