midos-memory-cascade

Auto-escalating multi-tier memory search that cascades from in-memory cache through SQLite, grep, and LanceDB vector search to find the best answer with minimal latency.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.


Install skill "midos-memory-cascade" with this command: npx skills add msruruguay/midos-memory-cascade

MidOS Memory Cascade

A self-tuning, auto-escalating search engine that tries each memory tier from fastest to slowest, stopping as soon as it finds a high-confidence answer.

What It Does

Instead of the agent deciding which storage layer to query, the cascade tries each tier automatically:

Tier | Storage                       | Latency | Strategy
T0   | In-memory session cache       | <1 ms   | Exact + fuzzy key match
T1   | JSON state files              | <5 ms   | Filename + key match
T2   | SQLite (pipeline_synergy.db)  | <5 ms   | Structured SQL LIKE
T3   | SQLite FTS5                   | <1 ms   | Full-text keyword over 22K rows
T4   | Grep over 46K chunks          | ~3 s    | Brute-force ripgrep fallback
T5   | LanceDB keyword (BM25)        | slow    | 670K vector rows, no embeddings
T5b  | LanceDB semantic              | 3–30 s  | Embedding similarity, last resort

Question routing: Queries starting with how/what/why/etc. skip keyword tiers and route directly to semantic search.

Self-learning: The cascade records which tier resolves each query. Once enough history accumulates, evolve() learns shortcuts (jumping straight to the winning tier) and flags consistently empty tiers to be skipped.
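The escalation loop described above can be sketched roughly as follows. The tier names, helper signatures, and prefix list here are illustrative assumptions, not the skill's actual internals:

```python
import time

def cascade(query, tiers, min_confidence=0.5):
    """Try tiers fastest-to-slowest; stop at the first high-confidence answer.

    tiers: list of (name, search_fn, is_semantic) where search_fn(query)
    returns (answer, confidence) or (None, 0.0). All names are hypothetical.
    """
    semantic_prefixes = ("how", "what", "why", "when", "where", "who")
    question = query.lower().startswith(semantic_prefixes)
    escalation = []
    start = time.perf_counter()
    for name, search, is_semantic in tiers:
        # Question-style queries skip keyword tiers and route straight
        # to semantic search, per the routing rule above.
        if question and not is_semantic:
            continue
        answer, conf = search(query)
        escalation.append((name, conf))
        if answer is not None and conf >= min_confidence:
            return {
                "answer": answer,
                "tier": name,
                "confidence": conf,
                "latency_ms": (time.perf_counter() - start) * 1000,
                "escalation": escalation,
            }
    return {"answer": None, "tier": None, "escalation": escalation}
```

A shortcut learned by evolve() would simply reorder or truncate the tier list passed in for a given query family.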

Usage

Python API

from tools.memory.memory_cascade import recall, store

# Search across all tiers
result = recall("adaptive alpha reranking")
# → {"answer": {...}, "tier": "T5:lancedb", "latency_ms": 340, "confidence": 0.87}

# Write to the right storage automatically
store("pattern", content="...", tags=["ml", "reranking"])

CLI

# Search
python memory_cascade.py recall "query here"

# View tier resolution stats
python memory_cascade.py stats

# Run self-evolution (learn shortcuts + tier skips)
python memory_cascade.py evolve

recall() Options

recall(
    query: str,
    min_confidence: float = 0.5,  # stop escalating at this threshold
    max_tier: int = 6             # 0=T0 only, 6=all tiers
)

Returns:

{
  "answer": { "source": "...", "text": "..." },
  "confidence": 0.87,
  "latency_ms": 340.2,
  "tiers_tried": 3,
  "resolved_at": "T5:lancedb",
  "shortcut": null,
  "question_routed": false,
  "escalation": [...]
}
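As one way to consume that return shape, a small hypothetical helper is shown below; the example dict mirrors the documented fields rather than a real recall() call:

```python
def handle(result, min_confidence=0.5):
    """Summarize a recall()-style result dict (field names per the docs above)."""
    if result["answer"] is None or result["confidence"] < min_confidence:
        return f"unresolved after {result['tiers_tried']} tiers"
    return f"{result['resolved_at']} ({result['latency_ms']:.0f} ms)"

example = {
    "answer": {"source": "...", "text": "..."},
    "confidence": 0.87,
    "latency_ms": 340.2,
    "tiers_tried": 3,
    "resolved_at": "T5:lancedb",
}
```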

Requirements

  • Python 3.10+ (stdlib only for core cascade logic)
  • Optional: hive_commons for LanceDB tiers (T5/T5b)
  • Optional: tools.memory.memory_router for store() routing

The cascade degrades gracefully — if LanceDB is unavailable, it stops at grep (T4). All stdlib tiers (T0–T4) work with zero dependencies.
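The optional-dependency pattern described above might be implemented along these lines; the module name follows the requirements list, while the guard logic and tier labels are assumptions:

```python
# Probe for the optional LanceDB layer; fall back to stdlib-only tiers if absent.
try:
    import hive_commons  # optional dependency: enables T5/T5b LanceDB tiers
    HAS_LANCEDB = True
except ImportError:
    HAS_LANCEDB = False

def available_tiers():
    """Return the tier labels reachable in this environment (labels illustrative)."""
    tiers = ["T0:cache", "T1:json", "T2:sqlite", "T3:fts5", "T4:grep"]
    if HAS_LANCEDB:
        tiers += ["T5:lancedb_bm25", "T5b:lancedb_semantic"]
    return tiers
```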

Architecture Notes

  • Thread-safe: Session cache uses threading.Lock; stats writes use separate locks
  • Cross-process safe: JSONL writes use OS-level file locking (msvcrt on Windows, fcntl on Unix)
  • Confidence scoring: Term overlap × score × content richness → normalized 0–1
  • Stats persistence: knowledge/SYSTEM/cascade_stats.json accumulates hit rates per tier
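A minimal sketch of the scoring idea in the notes above: term overlap × tier score × content richness, clamped to 0–1. The weighting and cap values here are assumptions, not the skill's exact formula:

```python
def confidence(query, text, raw_score, max_score=1.0, rich_len=200):
    """Combine term overlap, raw tier score, and content length into 0-1."""
    q_terms = set(query.lower().split())
    t_terms = set(text.lower().split())
    overlap = len(q_terms & t_terms) / len(q_terms) if q_terms else 0.0
    score = min(raw_score / max_score, 1.0) if max_score else 0.0
    richness = min(len(text) / rich_len, 1.0)  # reward longer answers, capped
    return max(0.0, min(1.0, overlap * score * richness))
```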

Built with MidOS. 1 of 200+ skills. Full ecosystem at midos.dev/pro

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

AIWolfPK - AI Werewolf

Four AIs play Werewolf while you watch. 30 seconds per round. Spot the wolf before they do.

Registry Source · Recently Updated
General

Project Analyzer

Analyze any project directory and produce a detailed report covering what the project does, its tech stack, folder structure, entry points, how to run it, an...

Registry Source · Recently Updated
General

Thought-Retriever

Distills the core insights from conversation answers into high-confidence knowledge crystals, stored in an ontology-driven memory system for self-evolution and reuse.

Registry Source · Recently Updated
General

Miaoji Bid Guard Pro

Amazon advertising moat, Pro edition: 90-day ROI forecasting + multi-campaign coordination + seasonal bidding + a keyword attack/defense matrix. Upgrades single bid-adjustment suggestions into full advertising campaign planning. Basic features are available in the free miaoji-bid-guard edition.

Registry Source · Recently Updated