UFO Research Workbench
Multi-source UAP/UFO release triage, evidence grading, and structured analytic reporting — with built-in skepticism guardrails.
This skill transforms folders of declassified UAP/UFO/FOIA documents into structured, graded analytic artifacts. It extends uap-release-analyzer with multi-source triage, evidence-provenance scoring, a claim taxonomy, cross-release comparison, timeline and entity-relationship mapping, and explicit uncertainty tracking.
Core philosophy: treat every document as a claim-bearing artifact, not as ground truth. Grade the source, not just the content. Always distinguish what the document says, what the analyst infers, and what is independently verified.
When to use
Trigger when the user asks to analyze, compare, or investigate UAP/UFO document releases. Example prompts:
- "Analyze the UAP release folder at
~/Downloads/release_03/" - "Compare release_01 and release_02 — what changed?"
- "Build an evidence matrix for these AARO files"
- "What are the redaction patterns in this FBI Vault tranche?"
- "Map the entities and timeline from these congressional hearing PDFs"
- "I have files from war.gov, AARO, and FBI Vault — triage them"
- Any request for a structured UAP/UFO research report
If the user just drops a folder path and says "what's in here?", this skill applies. If they name specific agencies or release sources, use the multi-source triage framework in this skill to classify and route each source correctly.
The upgraded workflow
This skill adds five layers beyond the original uap-release-analyzer pipeline:
Layer 0 — Source Triage → What kind of release is this? Which agency/venue?
Layer 1 — Inventory & Extract → What files exist? Extract text (pdfplumber).
Layer 2 — Claim Taxonomy → Classify every extractable claim by type.
Layer 3 — Evidence Grading → Score provenance, corroboration, and reliability.
Layer 4 — Cross-Release Diff → What changed vs. a prior release?
Layer 5 — Structured Artifacts → Produce graded, caveated output bundles.
Layer 0 — Multi-Source Triage
Before running any extraction, classify the release source. Different sources have different reliability profiles, metadata conventions, and redaction norms. See references/source-taxonomy.md for the full classification table.
| Source | Signal Type | Reliability Baseline | Typical Formats |
|---|---|---|---|
| war.gov/UFO/ | Official DOW release portal | High (official) | PDF, PNG, video |
| AARO | Congressionally mandated office | High (official) | PDF reports, data tables |
| FBI Vault | FOIA-processed case files | Medium (redacted) | Scanned PDF, PNG |
| NARA | Archival record groups | Medium–High | PDF, microfilm scans |
| Congressional archives | Hearing transcripts, CRS reports | High (official record) | PDF, HTML |
| DoS cables | State Department diplomatic traffic | Medium (redacted) | |
| Third-party compilations | Aggregated by researchers/NGOs | Variable — grade individually | Mixed |
Triage output: mark each file with source provenance before any analysis. This drives evidence grading in Layer 3.
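As a rough illustration of how this tagging might be automated, the sketch below maps folder or filename keywords to a source tag and a baseline from the table above. The keyword list is purely illustrative and is not the authoritative mapping in references/source-taxonomy.md.

```python
from pathlib import Path

# Illustrative keywords only; references/source-taxonomy.md is authoritative,
# and the analyst confirms every tag before Layer 3 grading.
SOURCE_PATTERNS = [
    ("war.gov",  "war.gov/UFO/",           "High (official)"),
    ("aaro",     "AARO",                   "High (official)"),
    ("fbi",      "FBI Vault",              "Medium (redacted)"),
    ("nara",     "NARA",                   "Medium-High"),
    ("congress", "Congressional archives", "High (official record)"),
    ("cable",    "DoS cables",             "Medium (redacted)"),
]

def triage_source(path: Path) -> tuple[str, str]:
    """Best-effort (source_tag, reliability_baseline) for one file."""
    haystack = str(path).lower()
    for keyword, tag, baseline in SOURCE_PATTERNS:
        if keyword in haystack:
            return tag, baseline
    return "Unknown (flag for manual review)", "Variable (grade individually)"
```

Anything that falls through to the catch-all should be routed to manual review, not silently assigned a baseline.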
Layer 1 — Inventory & Text Extraction
Run the bundled `scripts/inventory.py <release_root>` to build `inventory.csv` with one row per file: filename, page count, size, source tag, format, text-layer status. This is a direct adaptation from uap-release-analyzer.
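The bundled script is the source of truth; as a sketch of the kind of row it produces (exact columns and behavior may differ), a per-file record can be assembled with pypdf:

```python
import csv
from pathlib import Path
from pypdf import PdfReader

def inventory_row(path: Path, source_tag: str) -> dict:
    """One inventory row per file; page count only attempted for PDFs."""
    row = {
        "filename": path.name,
        "size_bytes": path.stat().st_size,
        "format": path.suffix.lower().lstrip("."),
        "source": source_tag,
        "pages": "",
        "has_text_layer": "",
    }
    if row["format"] == "pdf":
        reader = PdfReader(str(path))
        row["pages"] = len(reader.pages)
        # Crude text-layer probe: does the first page yield any text at all?
        first_text = reader.pages[0].extract_text() or ""
        row["has_text_layer"] = bool(first_text.strip())
    return row

def write_inventory(release_root: Path, out_csv: Path = Path("inventory.csv")) -> None:
    rows = [inventory_row(p, "untagged")
            for p in sorted(release_root.rglob("*")) if p.is_file()]
    if not rows:
        return
    with out_csv.open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=rows[0].keys())
        writer.writeheader()
        writer.writerows(rows)
```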
For text extraction, use `scripts/extract_text.py <release_root> [start] [end]` (pdfplumber-based). Files with no text layer produce 0-char outputs — these are flagged as OCR-needed, not silently dropped. The script is idempotent and chunkable.
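A minimal, idempotent extraction loop in the spirit of the bundled script follows. The output filenames and flagging convention here are assumptions, not the script's exact behavior.

```python
from pathlib import Path
import pdfplumber

def extract_release(release_root: Path, out_dir: Path) -> list[Path]:
    """Write one <stem>.txt per PDF; return files needing OCR (empty text layer)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    ocr_needed = []
    for pdf_path in sorted(release_root.rglob("*.pdf")):
        out_path = out_dir / f"{pdf_path.stem}.txt"
        if out_path.exists():          # idempotent: skip already-extracted files
            continue
        with pdfplumber.open(pdf_path) as pdf:
            text = "\n".join((page.extract_text() or "") for page in pdf.pages)
        out_path.write_text(text, encoding="utf-8")
        if not text.strip():           # 0-char output: flag as OCR-needed, don't drop
            ocr_needed.append(pdf_path)
    return ocr_needed
```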
Layer 2 — Claim Taxonomy
After text extraction, classify every identifiable claim in the text into one of these types:
| Claim Type | Definition | Confidence Signal |
|---|---|---|
| Sighting | Visual observation report (pilot, ground, civilian) | Corroborating sensor data, multiple witnesses |
| Sensor | Instrument reading (radar, FLIR, infrared, satellite) | Calibrated platform, raw data available |
| Whistleblower | Testimony from a named or anonymous insider | Corroboration, position and access, consistency over time |
| Agency Memo | Internal government communication, policy document | Chain-of-custody, classification level, date |
| Redaction Pattern | What was withheld and under which exemption | FOIA code present, pattern consistency |
| Timeline Inconsistency | Contradiction between two documents about the same event | Explicit date mismatch, conflicting accounts |
| Photo/Video | Visual media attached to or referenced in a file | Chain-of-custody, metadata integrity, source |
| Other / Unclassified | Catch-all for claims not fitting above | Flag for manual review |
Classification is best-effort (keyword + pattern matching), not full NLP. The uncertainty register (Layer 5) must note this limitation.
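A sketch of that best-effort pass is below, with purely illustrative keyword lists (a real pass would tune these per release). Timeline Inconsistency is deliberately absent: it only emerges when two documents are compared, so it is assigned during chronology or cross-release work rather than per sentence.

```python
import re

# Illustrative keywords only; unmatched or ambiguous text falls through to
# "Other / Unclassified" for manual review, as the taxonomy requires.
CLAIM_KEYWORDS = {
    "Sighting": ["observed", "witnessed", "visual contact", "saw an object"],
    "Sensor": ["radar", "flir", "infrared", "satellite", "sensor track"],
    "Whistleblower": ["testified", "whistleblower", "came forward", "insider"],
    "Agency Memo": ["memorandum", "memo", "directive", "policy"],
    "Redaction Pattern": ["(b)(1)", "(b)(3)", "redacted", "exemption"],
    "Photo/Video": ["photograph", "video", "footage", "image"],
}

def classify_claim(sentence: str) -> str:
    """Keyword-match a sentence against the taxonomy; catch-all otherwise."""
    text = sentence.lower()
    for claim_type, keywords in CLAIM_KEYWORDS.items():
        if any(kw in text for kw in keywords):
            return claim_type
    return "Other / Unclassified"

def split_sentences(raw_text: str) -> list[str]:
    # Naive sentence split; adequate for keyword tagging, not for real NLP.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", raw_text) if s.strip()]
```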
Layer 3 — Evidence Grading & Provenance Scoring
Each claim or document receives a provenance score on three axes. See references/evidence-grading.md for the full rubric.
Axis A — Source Reliability (1–5)
| Score | Meaning |
|---|---|
| 5 | Official release from a verified government portal (war.gov, AARO, congress.gov) |
| 4 | FOIA release from a recognized archive (FBI Vault, NARA) |
| 3 | Government document from a third-party mirror (confirmed hash match) |
| 2 | Unofficial compilation with partial provenance |
| 1 | Unverifiable source, anonymous upload, no chain-of-custody |
Axis B — Corroboration (1–5)
| Score | Meaning |
|---|---|
| 5 | Corroborated by 3+ independent sources with different provenance chains |
| 4 | Corroborated by 2 independent sources |
| 3 | Corroborated by 1 other source |
| 2 | Uncorroborated but internally consistent |
| 1 | Uncorroborated and internally inconsistent |
Axis C — Information Specificity (1–5)
| Score | Meaning |
|---|---|
| 5 | Specific dates, locations, names, and verifiable identifiers |
| 4 | Specific dates and locations but names redacted |
| 3 | Partial identifiers (year only, region only) |
| 2 | Vague temporal/spatial references |
| 1 | No specific identifiers; purely narrative |
Overall Evidence Grade: (A × 2 + B × 2 + C) / 5 → rounded to one decimal. Display as Grade: X.X/5.0.
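The arithmetic, as a one-liner for reference:

```python
def evidence_grade(source_reliability: int, corroboration: int, specificity: int) -> float:
    """Overall grade = (A*2 + B*2 + C) / 5, rounded to one decimal."""
    return round((source_reliability * 2 + corroboration * 2 + specificity) / 5, 1)

# Example: A=4 (FOIA release), B=3 (one corroborating source), C=2 (vague references)
# -> (8 + 6 + 2) / 5 = 3.2 -> displayed as "Grade: 3.2/5.0"
```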
Layer 4 — Cross-Release Comparison
When the user has multiple release folders (e.g., release_01/, release_02/), run a structured comparison:
- Inventory diff: files added, removed, renamed, resized between releases.
- Redaction delta: which files gained or lost redactions? Did exemption patterns shift?
- Claim overlap: which claims appear in both releases? Which are new? Which were removed?
- Text-change detection: for files with the same name across releases, detect substantive text changes (not just OCR noise).
- Narrative shift: did the framing, terminology, or emphasis change between releases?
Output a DIFF.md or embed the comparison in the report's cross-release section.
The bundled `scripts/diff_releases.py <release_A> <release_B>` handles inventory diff, redaction delta, and same-name text comparison. For claims and narrative shifts, the agent applies the taxonomy manually.
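A sketch of the structural half of this comparison is below (claim overlap and narrative shift stay with the agent). File matching is by name, which is exactly the filename-match bias noted under Honest limitations; the similarity threshold is a heuristic to tune per release.

```python
import difflib
from pathlib import Path

def inventory_diff(release_a: Path, release_b: Path) -> dict[str, list[str]]:
    """Added, removed, and common filenames between two release roots."""
    names_a = {p.name for p in release_a.rglob("*") if p.is_file()}
    names_b = {p.name for p in release_b.rglob("*") if p.is_file()}
    return {
        "added": sorted(names_b - names_a),
        "removed": sorted(names_a - names_b),
        "common": sorted(names_a & names_b),
    }

def text_changed(old_text: str, new_text: str, threshold: float = 0.98) -> bool:
    """Flag same-name files whose extracted text differs more than OCR noise."""
    ratio = difflib.SequenceMatcher(None, old_text, new_text).ratio()
    return ratio < threshold
```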
Layer 5 — Structured Analytic Artifacts
Produce these artifacts for every analysis. Use templates/ufo-report-template.md as the starting template, then customize per release.
Standard artifact bundle:
- Release Inventory (`inventory.csv`) — per-file metadata with source provenance tags
- Evidence Matrix (`evidence-matrix.md`) — claims graded by type, source, corroboration, and overall score; sorted by grade descending
- Chronology (`chronology.md`) — timeline of claims/events sorted by date, with uncertainty flags
- Redaction Heatmap (`redaction-heatmap.md`) — which files are most redacted, under which exemptions; visualized as a sorted table
- Entity Map (`entity-map.md`) — organizations, people, locations, programs mentioned; relationships where extractable
- Uncertainty Register (`uncertainty-register.md`) — every assumption, inference, or gap explicitly logged
- Executive Brief (`executive-brief.md`) — one-page summary: key findings, top-5 graded claims, major gaps, recommended next steps
If the user only needs a subset, produce what they ask for. Default to the full bundle for comprehensive analysis requests.
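As one illustration of how the evidence matrix might be emitted, the sketch below assumes graded claims are carried as dicts with the fields shown (the field names are assumptions, not a fixed schema):

```python
def write_evidence_matrix(claims: list[dict], out_path: str = "evidence-matrix.md") -> None:
    """Render graded claims as a markdown table, sorted by grade descending."""
    header = "| Claim | Type | Source | Label | Grade |\n|---|---|---|---|---|\n"
    rows = sorted(claims, key=lambda c: c["grade"], reverse=True)
    lines = [
        f"| {c['summary']} | {c['claim_type']} | {c['source']} "
        f"| {c['label']} | {c['grade']:.1f}/5.0 |"
        for c in rows
    ]
    with open(out_path, "w", encoding="utf-8") as fh:
        fh.write(header + "\n".join(lines) + "\n")
```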
Skepticism & Hallucination Guardrails
This skill enforces mandatory uncertainty labeling. Every analytic output must distinguish:
| Label | Meaning | Example |
|---|---|---|
| `[DOC CLAIM]` | The document asserts this | "Per DOW memo 2023-04-12, 'object exhibited transmedium capability'" |
| `[INFERRED]` | The analyst deduces this from patterns | "Redaction of (b)(1) + NOFORN suggests ongoing classification concern" |
| `[VERIFIED]` | Confirmed by 2+ independent sources | "AARO public report and congressional testimony both confirm date" |
| `[UNCORROBORATED]` | Single-source, no cross-check possible | "Witness statement cites no corroborating sensor data" |
| `[CONTRADICTED]` | Two sources disagree on this point | "Cable says 2022-03; flight log says 2022-04" |
Hard rules:
- Never present `[DOC CLAIM]` as fact. Always preserve the "the document says X" framing.
- Never present `[INFERRED]` as `[VERIFIED]`. If you can't cross-check, say so.
- When two sources contradict, log the contradiction in the uncertainty register and flag both claims as `[CONTRADICTED]`. Do not pick a winner without explicit evidence.
- If the user asks for opinions or speculation, label it explicitly as `[SPECULATIVE]` and separate it from the analytic output.
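These rules lend themselves to a mechanical sanity check before artifacts ship. The sketch below assumes each claim dict carries a label and a source_count field; both names are assumptions for illustration.

```python
KNOWN_LABELS = {"[DOC CLAIM]", "[INFERRED]", "[VERIFIED]",
                "[UNCORROBORATED]", "[CONTRADICTED]", "[SPECULATIVE]"}

def check_labels(claims: list[dict]) -> list[str]:
    """Return human-readable problems; an empty list means the bundle passes."""
    problems = []
    for claim in claims:
        label = claim.get("label")
        name = claim.get("summary", "?")
        if label not in KNOWN_LABELS:
            problems.append(f"{name}: missing or unknown label {label!r}")
        # [VERIFIED] requires 2+ independent sources per the label table above.
        if label == "[VERIFIED]" and claim.get("source_count", 0) < 2:
            problems.append(f"{name}: [VERIFIED] with fewer than 2 sources")
    return problems
```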
Working with the agent (prompt-first workflow)
This skill is prompt-first: the agent reads these instructions and guides the analytic process. The bundled scripts handle inventory, extraction, and diff — the rest is agent-driven reasoning with structured templates.
Quick-start: single release
- User provides a release path. Confirm the path exists. Ask which source it came from if not obvious.
- Run `scripts/inventory.py <release_root>` → confirm file count and source mix.
- Run `scripts/extract_text.py <release_root>` (chunk if >50 files or timeout risk).
- Read extracted texts and classify claims using the taxonomy (Layer 2).
- Grade evidence using the rubric (Layer 3). Apply skepticism labels.
- Produce the artifact bundle requested by the user (Layer 5).
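If the two bundled scripts are run programmatically rather than by hand, the orchestration for steps 2–3 amounts to a pair of subprocess calls (paths are relative to the skill root; error handling kept minimal):

```python
import subprocess
import sys
from pathlib import Path

def run_single_release(release_root: str) -> None:
    """Steps 2-3 of the single-release quick-start; the rest is agent-driven."""
    root = Path(release_root).expanduser()
    if not root.is_dir():
        sys.exit(f"Release path not found: {root}")
    subprocess.run([sys.executable, "scripts/inventory.py", str(root)], check=True)
    subprocess.run([sys.executable, "scripts/extract_text.py", str(root)], check=True)
```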
Quick-start: multi-release comparison
- Confirm both release paths. Run inventory on both.
- Run `scripts/diff_releases.py <release_A> <release_B>` for structural diffs.
- Run extraction on the newer release (and the older if not already extracted).
- For same-name files across releases, surface text changes. Flag narrative shifts.
- Produce `DIFF.md` or embed comparison in the executive brief.
Quick-start: multi-source triage
- User drops a folder with mixed-source files. Run inventory.
- Use `references/source-taxonomy.md` to classify each file's source.
- Tag each file in `inventory.csv` with its source provenance.
- Grade baseline reliability per source before analyzing content.
- Flag any file whose source cannot be determined for manual review.
Report template
Use templates/ufo-report-template.md. It includes all standard sections with labeling conventions. Customize per release — never fill a section with "TBD" without noting why the data is unavailable.
Bundled scripts
- `scripts/inventory.py <release_root>` — file inventory with source tagging (adaptation from uap-release-analyzer)
- `scripts/extract_text.py <release_root> [start] [end]` — pdfplumber text extraction, idempotent
- `scripts/diff_releases.py <release_A> <release_B>` — cross-release comparison
These are minimal, fast, and idempotent. The analytic heavy lifting (claim taxonomy, evidence grading, artifact generation) is agent-driven using the templates and reference materials in this skill.
Bundled references
- `references/source-taxonomy.md` — multi-source classification, reliability profiles, URL/access patterns
- `references/evidence-grading.md` — full grading rubric with examples
- `templates/ufo-report-template.md` — standardized report structure with all artifact sections
Bundled templates
- `templates/ufo-report-template.md` — the master report template covering all standard artifact sections
Dependencies
`python3 -m pip install pdfplumber pypdf`
Honest limitations
- No OCR by default. Scanned PDFs with no text layer are flagged, not OCR'd. Tesseract on thousands of pages is hours of work — offer as a separate pass.
- Entity extraction is keyword + regex, not NER. Claims are pattern-matched, not semantically parsed. The uncertainty register must note this.
- Claim taxonomy is best-effort. Classification uses heuristics and keyword matching. Ambiguous claims go to Other / Unclassified with a note.
- Evidence grading is the analyst's judgment. The rubric provides structure, but applying it requires reading and reasoning. There is no automated "truth score" — only structured, documented judgment.
- Cross-release diff has a filename-match bias. Files renamed between releases will appear as add/remove pairs, not modifications. Surface this caveat.
- This skill does not make truth claims about UAP. It analyzes documents. The conclusions are about what documents say, not about what exists in the physical world.