Linear A Decipherment

Computational pipeline for analyzing Linear A inscriptions against Semitic roots, formalizing Cyrus H. Gordon's five-step decipherment methodology. Built on data from lashon-ha-kretan (1,701 inscriptions, 60 Gordon readings, 2,871 Proto-Semitic roots).

Base directory: ~/.claude/skills/linear-a-decipherment

Scholarly Disclaimer

All readings are hypothetical. Linear A remains officially undeciphered. Gordon's Semitic hypothesis is one of several competing frameworks. Include this disclaimer on every analytical output.

Confidence Taxonomy

Every proposed reading must be tagged with a confidence level:

Level Criteria Example

CONFIRMED Ideographic + phonetic + mathematical confirmation KU-NI-SU (emmer wheat)

PROBABLE Direct Gordon reading + external attestation DA-KU-SE-NE (Hurrian name at Nuzi)

CANDIDATE Gordon reading or strong Proto-Semitic match (d < 0.3) New cognate from distance search

SPECULATIVE Weak phonetic match or single-source evidence Proto-Semitic match with d > 0.5

Reference File Protocol

Route questions to the right reference before answering:

Question about a specific reading or word? → Read references/gordon-lexicon.md → Run: uv run scripts/cognate_search.py "WORD"

Question about methodology or approach? → Read references/methodology.md

Question about sign values or the syllabary? → Read references/sign-values.md

Question about ML/computational approaches? → Read references/ml-approaches.md

Question about a specific inscription? → Run: uv run scripts/analyze.py single INSCRIPTION_NAME

Question about corpus statistics? → Run: uv run scripts/sign_analysis.py SUBCOMMAND

Data Dependencies

Source data from lashon-ha-kretan :

File Path Contents

Inscriptions ~/Desktop/Programming/lashon-ha-kretan/LinearAInscriptions.js

~1,701 GORILA inscriptions

Lexicon ~/Desktop/Programming/lashon-ha-kretan/semiticLexicon.js

60 Gordon + 3 YasharMana + 7 scholarly readings

Proto-Semitic ~/Desktop/Programming/lashon-ha-kretan/etymology/Semitic.json

2,871 roots

Extracted data cached in data/ (generated by corpus_extract.py --all ):

data/corpus.json — Structured inscriptions
data/gordon.json — Gordon + YasharMana lexicon
data/semitic_roots.json — Proto-Semitic roots
data/cognate_cache.json — Precomputed cognate scores (built by cognate_search.py --build-cache )

If data/ files are missing, run extraction first:

uv run ~/.claude/skills/linear-a-decipherment/scripts/corpus_extract.py --all

Workflows

Analyze a Single Inscription

Runs Gordon's 5-step pipeline on one inscription:

Human-readable report

uv run ~/.claude/skills/linear-a-decipherment/scripts/analyze.py single HT88

JSON output

uv run ~/.claude/skills/linear-a-decipherment/scripts/analyze.py single HT88 --format json

Steps performed: transliteration extraction, segmentation, consonantal skeleton for each word, cognate search (Gordon → YasharMana → Proto-Semitic cache), coverage summary.

Search Cognates for a Word

Find Semitic cognates for any Linear A transliteration:

Full search with table output

uv run ~/.claude/skills/linear-a-decipherment/scripts/cognate_search.py "KI-RE-TA"

Skeleton extraction only

uv run ~/.claude/skills/linear-a-decipherment/scripts/cognate_search.py "KI-RE-TA" --skeleton

JSON with top 10 matches

uv run ~/.claude/skills/linear-a-decipherment/scripts/cognate_search.py "KI-RE-TA" --top 10 --format json

Skip cache for live Proto-Semitic search

uv run ~/.claude/skills/linear-a-decipherment/scripts/cognate_search.py "KI-RE-TA" --no-cache

Pipeline: transliteration → skeleton (k-r-t) → Gordon direct → YasharMana → Proto-Semitic distance.

Find Unknown Words (Discovery Mode)

Identify frequently-occurring words with no known reading—best targets for new cognate proposals:

Top 20 unknown words appearing 3+ times

uv run ~/.claude/skills/linear-a-decipherment/scripts/analyze.py batch --mode unknowns

More restrictive: top 10 appearing 5+ times

uv run ~/.claude/skills/linear-a-decipherment/scripts/analyze.py batch --mode unknowns --min-count 5 --top 10

Find Promising Inscriptions

Inscriptions with the highest ratio of identified words—best for study:

uv run ~/.claude/skills/linear-a-decipherment/scripts/analyze.py batch --mode promising --top 15

Compare Libation Formulas

Group inscriptions containing the libation formula (JA-SA-SA-RA-ME pattern):

List all libation inscriptions

uv run ~/.claude/skills/linear-a-decipherment/scripts/analyze.py batch --mode libation

With skeleton alignment

uv run ~/.claude/skills/linear-a-decipherment/scripts/analyze.py batch --mode libation --alignment

Corpus Statistics

Statistical analysis of sign patterns:

Sign frequency (top 30)

uv run ~/.claude/skills/linear-a-decipherment/scripts/sign_analysis.py frequency

Word frequency with hapax legomena count

uv run ~/.claude/skills/linear-a-decipherment/scripts/sign_analysis.py words

Sign co-occurrence within words

uv run ~/.claude/skills/linear-a-decipherment/scripts/sign_analysis.py cooccurrence --signs KI,RO,SA

Positional distribution (initial/medial/final)

uv run ~/.claude/skills/linear-a-decipherment/scripts/sign_analysis.py position

Site distribution (HT, ZA, PK, etc.)

uv run ~/.claude/skills/linear-a-decipherment/scripts/sign_analysis.py distribution

JSON output for any subcommand

uv run ~/.claude/skills/linear-a-decipherment/scripts/sign_analysis.py frequency --format json

Generate Training Data

Prepare JSONL for ML fine-tuning:

Preview first 3 entries

uv run ~/.claude/skills/linear-a-decipherment/scripts/finetune_prep.py gordon-pairs --preview 3

Generate full JSONL

uv run ~/.claude/skills/linear-a-decipherment/scripts/finetune_prep.py gordon-pairs --output data/gordon_pairs.jsonl

v1 produces 63 chat-format pairs (Gordon + YasharMana). See references/ml-approaches.md for v2 augmentation strategy.

Reverse Root Search (Semitic Root → Corpus Words)

Given a Semitic consonantal root, find all Linear A words in the corpus whose skeletons match:

Find corpus words matching root KNS (e.g., kiništu "gathering place")

uv run ~/.claude/skills/linear-a-decipherment/scripts/cognate_search.py --reverse kns

Broader search with higher distance tolerance

uv run ~/.claude/skills/linear-a-decipherment/scripts/cognate_search.py --reverse kns --max-dist 0.5 -n 30

JSON output for programmatic use

uv run ~/.claude/skills/linear-a-decipherment/scripts/cognate_search.py --reverse thm --format json

Search for Baal-related words (b-'-l root)

uv run ~/.claude/skills/linear-a-decipherment/scripts/cognate_search.py --reverse bl

Search for "give" root (y-t-n)

uv run ~/.claude/skills/linear-a-decipherment/scripts/cognate_search.py --reverse ytn

Pipeline: root consonants → weighted Levenshtein against all corpus word skeletons → ranked by distance, annotated with Gordon/YasharMana readings, occurrence counts, sites, and inscriptions.

Extract / Rebuild Corpus

Extract structured data from JS source files:

Extract everything (inscriptions + lexicons + Proto-Semitic roots)

uv run ~/.claude/skills/linear-a-decipherment/scripts/corpus_extract.py --all

Inscriptions only, filtered by site

uv run ~/.claude/skills/linear-a-decipherment/scripts/corpus_extract.py --site HT

Include Gordon lexicon

uv run ~/.claude/skills/linear-a-decipherment/scripts/corpus_extract.py --with-gordon

Build cognate cache (takes ~10 seconds)

uv run ~/.claude/skills/linear-a-decipherment/scripts/cognate_search.py --build-cache

Integration with Other Skills

Skill Usage

rlama

Create gordon-dossiers RAG collection from ~/Desktop/minoanmystery-astro/souls/minoan/dossiers/scholarly-sources/gordon/

ancient-near-east-research

Sefaria for Hebrew cognate verification, CDLI for Akkadian parallels

exa-search

Search recent computational decipherment papers

llama-cpp

Local inference with fine-tuned decipherment models (v2)

Architecture

~/.claude/skills/linear-a-decipherment/ ├── SKILL.md # This file ├── lib/ # Shared Python library │ ├── init.py │ ├── types.py # Frozen dataclasses (Inscription, LexiconEntry, CognateMatch) │ ├── js_parser.py # JS Map → Python dict extraction │ ├── normalization.py # normalize(), lookup_in(), J/Y swap │ ├── skeleton.py # SIGN_DECOMPOSITION, extract_skeleton() │ └── phonetics.py # SEMITIC_DISTANCES, weighted_levenshtein() ├── scripts/ │ ├── corpus_extract.py # JS → JSON extraction │ ├── cognate_search.py # Forward + reverse cognate search + cache builder │ ├── sign_analysis.py # Corpus-wide sign statistics │ ├── analyze.py # Gordon 5-step pipeline (single + batch) │ └── finetune_prep.py # ML training data generation ├── references/ │ ├── gordon-lexicon.md # Complete 60+3+7 entry lexicon tables │ ├── methodology.md # Gordon's methods, 5-step pipeline │ ├── sign-values.md # Sign confidence levels (HIGH/MEDIUM/LOW) │ └── ml-approaches.md # Computational decipherment survey (v2) └── data/ # Generated (not committed) ├── corpus.json # 1,701 inscriptions ├── gordon.json # 60 Gordon + 3 YasharMana + 7 scholarly entries ├── semitic_roots.json # 2,871 Proto-Semitic roots └── cognate_cache.json # Precomputed cognate scores

All scripts use uv run with PEP 723 inline metadata. Dependencies: stdlib only.

linear-a-decipherment

Safety Notice

Copy this and send it to your AI assistant to learn