UFO Research Workbench
Multi-source UAP/UFO release triage, evidence grading, and structured analytic reporting — with built-in skepticism guardrails.
This skill transforms folders of declassified UAP/UFO/FOIA documents into structured, graded analytic artifacts. It extends uap-release-analyzer with multi-source triage, evidence-provenance scoring, a claim taxonomy, cross-release comparison, timeline and entity-relationship mapping, and explicit uncertainty tracking.
Core philosophy: treat every document as a claim-bearing artifact, not as ground truth. Grade the source, not just the content. Always distinguish what the document says, what the analyst infers, and what is independently verified.
When to use
Trigger when the user asks to analyze, compare, or investigate UAP/UFO document releases. Example prompts:
- "Analyze the UAP release folder at
~/Downloads/release_03/" - "Compare release_01 and release_02 — what changed?"
- "Build an evidence matrix for these AARO files"
- "What are the redaction patterns in this FBI Vault tranche?"
- "Map the entities and timeline from these congressional hearing PDFs"
- "I have files from war.gov, AARO, and FBI Vault — triage them"
- Any request for a structured UAP/UFO research report
If the user just drops a folder path and says "what's in here?", this skill applies. If they name specific agencies or release sources, use the multi-source triage framework in this skill to classify and route each source correctly.
The upgraded workflow
This skill adds five layers beyond the original uap-release-analyzer pipeline:
Layer 0 — Source Triage → What kind of release is this? Which agency/venue?
Layer 1 — Inventory & Extract → What files exist? Extract text (pdfplumber).
Layer 2 — Claim Taxonomy → Classify every extractable claim by type.
Layer 3 — Evidence Grading → Score provenance, corroboration, and reliability.
Layer 4 — Cross-Release Diff → What changed vs. a prior release?
Layer 5 — Structured Artifacts → Produce graded, caveated output bundles.
Layer 0 — Multi-Source Triage
Before running any extraction, classify the release source. Different sources have different reliability profiles, metadata conventions, and redaction norms. See references/source-taxonomy.md for the full classification table.
| Source | Signal Type | Reliability Baseline | Typical Formats |
|---|---|---|---|
| war.gov/UFO/ | Official DOW release portal | High (official) | PDF, PNG, video |
| AARO | Congressionally mandated office | High (official) | PDF reports, data tables |
| FBI Vault | FOIA-processed case files | Medium (redacted) | Scanned PDF, PNG |
| NARA | Archival record groups | Medium–High | PDF, microfilm scans |
| Congressional archives | Hearing transcripts, CRS reports | High (official record) | PDF, HTML |
| DoS cables | State Department diplomatic traffic | Medium (redacted) | |
| Third-party compilations | Aggregated by researchers/NGOs | Variable — grade individually | Mixed |
Triage output: mark each file with source provenance before any analysis. This drives evidence grading in Layer 3.
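As a rough illustration of how this tagging might be automated, the sketch below maps folder or filename keywords to a source tag and a baseline from the table above. The keyword list is purely illustrative and is not the authoritative mapping in references/source-taxonomy.md.

```python
from pathlib import Path

# Illustrative keywords only; references/source-taxonomy.md is authoritative,
# and the analyst confirms every tag before Layer 3 grading.
SOURCE_PATTERNS = [
    ("war.gov",  "war.gov/UFO/",           "High (official)"),
    ("aaro",     "AARO",                   "High (official)"),
    ("fbi",      "FBI Vault",              "Medium (redacted)"),
    ("nara",     "NARA",                   "Medium-High"),
    ("congress", "Congressional archives", "High (official record)"),
    ("cable",    "DoS cables",             "Medium (redacted)"),
]

def triage_source(path: Path) -> tuple[str, str]:
    """Best-effort (source_tag, reliability_baseline) for one file."""
    haystack = str(path).lower()
    for keyword, tag, baseline in SOURCE_PATTERNS:
        if keyword in haystack:
            return tag, baseline
    return "Unknown (flag for manual review)", "Variable (grade individually)"
```

Anything that falls through to the catch-all should be routed to manual review, not silently assigned a baseline.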
Layer 1 — Inventory & Text Extraction
Run the bundled `scripts/inventory.py <release_root>` to build `inventory.csv` with one row per file: filename, page count, size, source tag, format, text-layer status. This is a direct adaptation from uap-release-analyzer.
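The bundled script is the source of truth; as a sketch of the kind of row it produces (exact columns and behavior may differ), a per-file record can be assembled with pypdf:

```python
import csv
from pathlib import Path
from pypdf import PdfReader

def inventory_row(path: Path, source_tag: str) -> dict:
    """One inventory row per file; page count only attempted for PDFs."""
    row = {
        "filename": path.name,
        "size_bytes": path.stat().st_size,
        "format": path.suffix.lower().lstrip("."),
        "source": source_tag,
        "pages": "",
        "has_text_layer": "",
    }
    if row["format"] == "pdf":
        reader = PdfReader(str(path))
        row["pages"] = len(reader.pages)
        # Crude text-layer probe: does the first page yield any text at all?
        first_text = reader.pages[0].extract_text() or ""
        row["has_text_layer"] = bool(first_text.strip())
    return row

def write_inventory(release_root: Path, out_csv: Path = Path("inventory.csv")) -> None:
    rows = [inventory_row(p, "untagged")
            for p in sorted(release_root.rglob("*")) if p.is_file()]
    if not rows:
        return
    with out_csv.open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=rows[0].keys())
        writer.writeheader()
        writer.writerows(rows)
```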
For text extraction, use `scripts/extract_text.py <release_root> [start] [end]` (pdfplumber-based). Files with no text layer produce 0-char outputs — these are flagged as OCR-needed, not silently dropped. The script is idempotent and chunkable.
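A minimal, idempotent extraction loop in the spirit of the bundled script follows. The output filenames and flagging convention here are assumptions, not the script's exact behavior.

```python
from pathlib import Path
import pdfplumber

def extract_release(release_root: Path, out_dir: Path) -> list[Path]:
    """Write one <stem>.txt per PDF; return files needing OCR (empty text layer)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    ocr_needed = []
    for pdf_path in sorted(release_root.rglob("*.pdf")):
        out_path = out_dir / f"{pdf_path.stem}.txt"
        if out_path.exists():          # idempotent: skip already-extracted files
            continue
        with pdfplumber.open(pdf_path) as pdf:
            text = "\n".join((page.extract_text() or "") for page in pdf.pages)
        out_path.write_text(text, encoding="utf-8")
        if not text.strip():           # 0-char output: flag as OCR-needed, don't drop
            ocr_needed.append(pdf_path)
    return ocr_needed
```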
Layer 2 — Claim Taxonomy
After text extraction, classify every identifiable claim in the text into one of these types:
| Claim Type | Definition | Confidence Signal |
|---|---|---|
| Sighting | Visual observation report (pilot, ground, civilian) | Corroborating sensor data, multiple witnesses |
| Sensor | Instrument reading (radar, FLIR, infrared, satellite) | Calibrated platform, raw data available |
| Whistleblower | Testimony from a named or anonymous insider | Corroboration, position and access, consistency over time |
| Agency Memo | Internal government communication, policy document | Chain-of-custody, classification level, date |
| Redaction Pattern | What was withheld and under which exemption | FOIA code present, pattern consistency |
| Timeline Inconsistency | Contradiction between two documents about the same event | Explicit date mismatch, conflicting accounts |
| Photo/Video | Visual media attached to or referenced in a file | Chain-of-custody, metadata integrity, source |
| Other / Unclassified | Catch-all for claims not fitting above | Flag for manual review |
Classification is best-effort (keyword + pattern matching), not full NLP. The uncertainty register (Layer 5) must note this limitation.
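A sketch of that best-effort pass is below, with purely illustrative keyword lists (a real pass would tune these per release). Timeline Inconsistency is deliberately absent: it only emerges when two documents are compared, so it is assigned during chronology or cross-release work rather than per sentence.

```python
import re

# Illustrative keywords only; unmatched or ambiguous text falls through to
# "Other / Unclassified" for manual review, as the taxonomy requires.
CLAIM_KEYWORDS = {
    "Sighting": ["observed", "witnessed", "visual contact", "saw an object"],
    "Sensor": ["radar", "flir", "infrared", "satellite", "sensor track"],
    "Whistleblower": ["testified", "whistleblower", "came forward", "insider"],
    "Agency Memo": ["memorandum", "memo", "directive", "policy"],
    "Redaction Pattern": ["(b)(1)", "(b)(3)", "redacted", "exemption"],
    "Photo/Video": ["photograph", "video", "footage", "image"],
}

def classify_claim(sentence: str) -> str:
    """Keyword-match a sentence against the taxonomy; catch-all otherwise."""
    text = sentence.lower()
    for claim_type, keywords in CLAIM_KEYWORDS.items():
        if any(kw in text for kw in keywords):
            return claim_type
    return "Other / Unclassified"

def split_sentences(raw_text: str) -> list[str]:
    # Naive sentence split; adequate for keyword tagging, not for real NLP.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", raw_text) if s.strip()]
```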
Layer 3 — Evidence Grading & Provenance Scoring
Each claim or document receives a provenance score on three axes. See references/evidence-grading.md for the full rubric.
Axis A — Source Reliability (1–5)
| Score | Meaning |
|---|---|
| 5 | Official release from a verified government portal (war.gov, AARO, congress.gov) |
| 4 | FOIA release from a recognized archive (FBI Vault, NARA) |
| 3 | Government document from a third-party mirror (confirmed hash match) |
| 2 | Unofficial compilation with partial provenance |
| 1 | Unverifiable source, anonymous upload, no chain-of-custody |
Axis B — Corroboration (1–5)
| Score | Meaning |
|---|---|
| 5 | Corroborated by 3+ independent sources with different provenance chains |
| 4 | Corroborated by 2 independent sources |
| 3 | Corroborated by 1 other source |
| 2 | Uncorroborated but internally consistent |
| 1 | Uncorroborated and internally inconsistent |
Axis C — Information Specificity (1–5)
| Score | Meaning |
|---|---|
| 5 | Specific dates, locations, names, and verifiable identifiers |
| 4 | Specific dates and locations but names redacted |
| 3 | Partial identifiers (year only, region only) |
| 2 | Vague temporal/spatial references |
| 1 | No specific identifiers; purely narrative |
Overall Evidence Grade: (A × 2 + B × 2 + C) / 5 → rounded to one decimal. Display as Grade: X.X/5.0.
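The arithmetic, as a one-liner for reference:

```python
def evidence_grade(source_reliability: int, corroboration: int, specificity: int) -> float:
    """Overall grade = (A*2 + B*2 + C) / 5, rounded to one decimal."""
    return round((source_reliability * 2 + corroboration * 2 + specificity) / 5, 1)

# Example: A=4 (FOIA release), B=3 (one corroborating source), C=2 (vague references)
# -> (8 + 6 + 2) / 5 = 3.2 -> displayed as "Grade: 3.2/5.0"
```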
Layer 4 — Cross-Release Comparison
When the user has multiple release folders (e.g., release_01/, release_02/), run a structured comparison:
- Inventory diff: files added, removed, renamed, resized between releases.
- Redaction delta: which files gained or lost redactions? Did exemption patterns shift?
- Claim overlap: which claims appear in both releases? Which are new? Which were removed?
- Text-change detection: for files with the same name across releases, detect substantive text changes (not just OCR noise).
- Narrative shift: did the framing, terminology, or emphasis change between releases?
Output a DIFF.md or embed the comparison in the report's cross-release section.
The bundled `scripts/diff_releases.py <release_A> <release_B>` handles inventory diff, redaction delta, and same-name text comparison. For claims and narrative shifts, the agent applies the taxonomy manually.
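A sketch of the structural half of this comparison is below (claim overlap and narrative shift stay with the agent). File matching is by name, which is exactly the filename-match bias noted under Honest limitations; the similarity threshold is a heuristic to tune per release.

```python
import difflib
from pathlib import Path

def inventory_diff(release_a: Path, release_b: Path) -> dict[str, list[str]]:
    """Added, removed, and common filenames between two release roots."""
    names_a = {p.name for p in release_a.rglob("*") if p.is_file()}
    names_b = {p.name for p in release_b.rglob("*") if p.is_file()}
    return {
        "added": sorted(names_b - names_a),
        "removed": sorted(names_a - names_b),
        "common": sorted(names_a & names_b),
    }

def text_changed(old_text: str, new_text: str, threshold: float = 0.98) -> bool:
    """Flag same-name files whose extracted text differs more than OCR noise."""
    ratio = difflib.SequenceMatcher(None, old_text, new_text).ratio()
    return ratio < threshold
```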
Layer 5 — Structured Analytic Artifacts
Produce these artifacts for every analysis. Use templates/ufo-report-template.md as the starting template, then customize per release.
Standard artifact bundle:
- Release Inventory (`inventory.csv`) — per-file metadata with source provenance tags
- Evidence Matrix (`evidence-matrix.md`) — claims graded by type, source, corroboration, and overall score; sorted by grade descending
- Chronology (`chronology.md`) — timeline of claims/events sorted by date, with uncertainty flags
- Redaction Heatmap (`redaction-heatmap.md`) — which files are most redacted, under which exemptions; visualized as a sorted table
- Entity Map (`entity-map.md`) — organizations, people, locations, programs mentioned; relationships where extractable
- Uncertainty Register (`uncertainty-register.md`) — every assumption, inference, or gap explicitly logged
- Executive Brief (`executive-brief.md`) — one-page summary: key findings, top-5 graded claims, major gaps, recommended next steps
If the user only needs a subset, produce what they ask for. Default to the full bundle for comprehensive analysis requests.
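As one illustration of how the evidence matrix might be emitted, the sketch below assumes graded claims are carried as dicts with the fields shown (the field names are assumptions, not a fixed schema):

```python
def write_evidence_matrix(claims: list[dict], out_path: str = "evidence-matrix.md") -> None:
    """Render graded claims as a markdown table, sorted by grade descending."""
    header = "| Claim | Type | Source | Label | Grade |\n|---|---|---|---|---|\n"
    rows = sorted(claims, key=lambda c: c["grade"], reverse=True)
    lines = [
        f"| {c['summary']} | {c['claim_type']} | {c['source']} "
        f"| {c['label']} | {c['grade']:.1f}/5.0 |"
        for c in rows
    ]
    with open(out_path, "w", encoding="utf-8") as fh:
        fh.write(header + "\n".join(lines) + "\n")
```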
Skepticism & Hallucination Guardrails
This skill enforces mandatory uncertainty labeling. Every analytic output must distinguish:
| Label | Meaning | Example |
|---|---|---|
| `[DOC CLAIM]` | The document asserts this | "Per DOW memo 2023-04-12, 'object exhibited transmedium capability'" |
| `[INFERRED]` | The analyst deduces this from patterns | "Redaction of (b)(1) + NOFORN suggests ongoing classification concern" |
| `[VERIFIED]` | Confirmed by 2+ independent sources | "AARO public report and congressional testimony both confirm date" |
| `[UNCORROBORATED]` | Single-source, no cross-check possible | "Witness statement cites no corroborating sensor data" |
| `[CONTRADICTED]` | Two sources disagree on this point | "Cable says 2022-03; flight log says 2022-04" |
Hard rules:
- Never present `[DOC CLAIM]` as fact. Always preserve the "the document says X" framing.
- Never present `[INFERRED]` as `[VERIFIED]`. If you can't cross-check, say so.
- When two sources contradict, log the contradiction in the uncertainty register and flag both claims as `[CONTRADICTED]`. Do not pick a winner without explicit evidence.
- If the user asks for opinions or speculation, label it explicitly as `[SPECULATIVE]` and separate it from the analytic output.
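These rules lend themselves to a mechanical sanity check before artifacts ship. The sketch below assumes each claim dict carries a label and a source_count field; both names are assumptions for illustration.

```python
KNOWN_LABELS = {"[DOC CLAIM]", "[INFERRED]", "[VERIFIED]",
                "[UNCORROBORATED]", "[CONTRADICTED]", "[SPECULATIVE]"}

def check_labels(claims: list[dict]) -> list[str]:
    """Return human-readable problems; an empty list means the bundle passes."""
    problems = []
    for claim in claims:
        label = claim.get("label")
        name = claim.get("summary", "?")
        if label not in KNOWN_LABELS:
            problems.append(f"{name}: missing or unknown label {label!r}")
        # [VERIFIED] requires 2+ independent sources per the label table above.
        if label == "[VERIFIED]" and claim.get("source_count", 0) < 2:
            problems.append(f"{name}: [VERIFIED] with fewer than 2 sources")
    return problems
```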
Working with the agent (prompt-first workflow)
This skill is prompt-first: the agent reads these instructions and guides the analytic process. The bundled scripts handle inventory, extraction, and diff — the rest is agent-driven reasoning with structured templates.
Quick-start: single release
- User provides a release path. Confirm the path exists. Ask which source it came from if not obvious.
- Run `scripts/inventory.py <release_root>` → confirm file count and source mix.
- Run `scripts/extract_text.py <release_root>` (chunk if >50 files or timeout risk).
- Read extracted texts and classify claims using the taxonomy (Layer 2).
- Grade evidence using the rubric (Layer 3). Apply skepticism labels.
- Produce the artifact bundle requested by the user (Layer 5).
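If the two bundled scripts are run programmatically rather than by hand, the orchestration for steps 2–3 amounts to a pair of subprocess calls (paths are relative to the skill root; error handling kept minimal):

```python
import subprocess
import sys
from pathlib import Path

def run_single_release(release_root: str) -> None:
    """Steps 2-3 of the single-release quick-start; the rest is agent-driven."""
    root = Path(release_root).expanduser()
    if not root.is_dir():
        sys.exit(f"Release path not found: {root}")
    subprocess.run([sys.executable, "scripts/inventory.py", str(root)], check=True)
    subprocess.run([sys.executable, "scripts/extract_text.py", str(root)], check=True)
```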
Quick-start: multi-release comparison
- Confirm both release paths. Run inventory on both.
- Run `scripts/diff_releases.py <release_A> <release_B>` for structural diffs.
- Run extraction on the newer release (and the older if not already extracted).
- For same-name files across releases, surface text changes. Flag narrative shifts.
- Produce `DIFF.md` or embed comparison in the executive brief.
Quick-start: multi-source triage
- User drops a folder with mixed-source files. Run inventory.
- Use `references/source-taxonomy.md` to classify each file's source.
- Tag each file in `inventory.csv` with its source provenance.
- Grade baseline reliability per source before analyzing content.
- Flag any file whose source cannot be determined for manual review.
Report template
Use templates/ufo-report-template.md. It includes all standard sections with labeling conventions. Customize per release — never fill a section with "TBD" without noting why the data is unavailable.
Bundled scripts
- `scripts/inventory.py <release_root>` — file inventory with source tagging (adaptation from uap-release-analyzer)
- `scripts/extract_text.py <release_root> [start] [end]` — pdfplumber text extraction, idempotent
- `scripts/diff_releases.py <release_A> <release_B>` — cross-release comparison
These are minimal, fast, and idempotent. The analytic heavy lifting (claim taxonomy, evidence grading, artifact generation) is agent-driven using the templates and reference materials in this skill.
Bundled references
- `references/source-taxonomy.md` — multi-source classification, reliability profiles, URL/access patterns
- `references/evidence-grading.md` — full grading rubric with examples
- `templates/ufo-report-template.md` — standardized report structure with all artifact sections
Bundled templates
- `templates/ufo-report-template.md` — the master report template covering all standard artifact sections
Dependencies
`python3 -m pip install pdfplumber pypdf`
Honest limitations
- No OCR by default. Scanned PDFs with no text layer are flagged, not OCR'd. Tesseract on thousands of pages is hours of work — offer as a separate pass.
- Entity extraction is keyword + regex, not NER. Claims are pattern-matched, not semantically parsed. The uncertainty register must note this.
- Claim taxonomy is best-effort. Classification uses heuristics and keyword matching. Ambiguous claims go to Other / Unclassified with a note.
- Evidence grading is the analyst's judgment. The rubric provides structure, but applying it requires reading and reasoning. There is no automated "truth score" — only structured, documented judgment.
- Cross-release diff has a filename-match bias. Files renamed between releases will appear as add/remove pairs, not modifications. Surface this caveat.
- This skill does not make truth claims about UAP. It analyzes documents. The conclusions are about what documents say, not about what exists in the physical world.