Token Reduction Engine (TRE)
Company Brain Core OS — Free, local, deterministic knowledge base for your agent. Start here if you need persistent facts before optimizing token spend.
Cut AI costs. Not quality. Not accuracy.
v1.0.0
Built and dogfooded by CertainLogicAI — a 79%+ cache hit rate on our own workloads means most queries never hit the LLM. Measurement beats hope.
Part of the CertainLogic Stack
This skill works standalone. You don't need anything else to start cutting token costs today.
Works even better with Company Brain Core OS (clawhub install company-brain-os):
- Brain stores validated facts — TRE caches deterministic responses
- AgentPathfinder — audits which cache policies led to good outcomes
- Smart Router — routes non-cached queries to cheaper model tiers
All four are independent. Pick the one that solves your problem. Add others when you need them.
How It Works — Three Layers
TRE sits between your agent and the LLM. It intercepts queries and decides: cache hit, deterministic lookup, or expensive LLM call.
User Query → TRE Decision Layer → Result (cache | brain | LLM)
Layer 1: Query Cache (Answer Cache)
Stores LLM responses so repeated queries return instantly — zero tokens.
- SHA-256 hash of exact query as cache key
- LRU eviction — keeps hot answers, drops cold ones
- TTL-based expiry — stale data auto-purges (default: 3600s)
- Hallucination Guard — hedged answers ("I think", "maybe", "not sure") are shown to user but NOT cached
```python
# After first LLM call:
tre.cache_answer("What is our refund policy?", "30 days, no questions.")

# Every subsequent call:
answer = tre.get_cached_answer("What is our refund policy?")
# → Instant. Zero tokens. No LLM hit.
```
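The three mechanisms above — SHA-256 keys, LRU eviction, TTL expiry — fit in a short sketch. The `AnswerCache` class and its method names are illustrative, not TRE's actual API:

```python
import hashlib
import time
from collections import OrderedDict

class AnswerCache:
    """Illustrative LRU + TTL cache keyed by SHA-256 of the exact query."""

    def __init__(self, max_size=10000, ttl=3600):
        self.max_size = max_size
        self.ttl = ttl
        self._store = OrderedDict()  # key -> (answer, stored_at)

    @staticmethod
    def _key(query):
        return hashlib.sha256(query.encode("utf-8")).hexdigest()

    def put(self, query, answer):
        key = self._key(query)
        self._store[key] = (answer, time.time())
        self._store.move_to_end(key)          # mark as most recently used
        if len(self._store) > self.max_size:  # LRU eviction: drop coldest
            self._store.popitem(last=False)

    def get(self, query):
        key = self._key(query)
        entry = self._store.get(key)
        if entry is None:
            return None
        answer, stored_at = entry
        if time.time() - stored_at > self.ttl:  # TTL expiry: purge stale data
            del self._store[key]
            return None
        self._store.move_to_end(key)
        return answer
```

Hashing the exact query string means only verbatim repeats hit the cache; any paraphrase falls through to Layer 2 or the LLM.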
Layer 2: Deterministic Lookup
For questions with factual answers, skip the LLM entirely.
- Intent classification — matches query to known domain (strategy, product, security, etc.)
- Facts DB overlay — structured key-value pairs with source attribution
- Zero LLM cost — pre-validated facts served direct from SQLite
"What products do we offer?" → Intent: product → Facts DB → Instant answer
"How does the brain work?" → Intent: strategy → Facts DB → Instant answer
"Who wrote the security policy?" → Intent: security → Facts DB → Instant answer
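A deterministic lookup like the ones above can be served straight from SQLite. The schema and function below are assumptions for illustration — TRE's real facts DB layout may differ:

```python
import sqlite3

# Illustrative facts DB: structured key-value pairs with source attribution,
# served with zero LLM cost. Schema is an assumption, not TRE's actual one.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE facts (
    intent TEXT, question TEXT, answer TEXT, source TEXT)""")
conn.execute("INSERT INTO facts VALUES (?, ?, ?, ?)",
             ("product", "What products do we offer?",
              "TRE and Company Brain Core OS.", "loaded by ops"))

def lookup(intent, question):
    """Return (answer, source) for a pre-validated fact, or None."""
    return conn.execute(
        "SELECT answer, source FROM facts WHERE intent = ? AND question = ?",
        (intent, question)).fetchone()
```

A miss here (the query has no pre-validated fact) falls through to the LLM; a hit costs zero tokens.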
Layer 3: Intent Filtering (Sanity Gate)
Prevents expensive LLM calls that can't possibly have good answers.
- Forbidden intents — blocks "brain.delete_brain", "brain.purge", malicious commands
- Required fields gate — rejects malformed queries before they hit the LLM
- Domain alignment — refuses answers outside declared intent scope (prevents hallucinated guesses)
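The gate checks above can be sketched as one function. The forbidden-intent list comes from this section; the field names and function shape are assumptions:

```python
# Forbidden intents named in the text above; everything else here is
# an illustrative sketch, not TRE's actual gate.
FORBIDDEN_INTENTS = {"brain.delete_brain", "brain.purge"}
REQUIRED_FIELDS = {"intent", "query"}

def sanity_gate(request):
    """Return (allowed, reason). Rejects bad requests before any LLM call."""
    missing = REQUIRED_FIELDS - request.keys()
    if missing:  # required-fields gate: malformed query never reaches the LLM
        return False, f"malformed: missing {sorted(missing)}"
    if request["intent"] in FORBIDDEN_INTENTS:  # forbidden-intent block
        return False, f"forbidden intent: {request['intent']}"
    return True, "ok"
```

Because the gate runs before the decision layer, a blocked query costs nothing — no cache write, no lookup, no tokens.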
Major Benefits
1. 60-85% Token Cost Reduction
- Cache hit rate: up to 79% (measured on Brain workloads — your results depend on query repetition)
- Deterministic path: <10ms vs 500-2000ms LLM round-trip
- Cost comparison: $0 per cached answer vs $0.01-0.15 per LLM call
| Workload | Without TRE | With TRE | Savings |
|---|---|---|---|
| Repetitive queries (support, FAQ) | $10-15/day | $1-3/day | 85% |
| Mixed unique + repeat | $300/mo | $45-90/mo | 70-85% |
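The savings arithmetic behind these numbers is straightforward. The volume and per-call cost below are assumed examples within the ranges quoted above, not measurements:

```python
queries_per_day = 1000
hit_rate = 0.79            # measured cache hit rate from the metrics table
cost_per_llm_call = 0.02   # assumed mid-range of the $0.01-0.15 quoted above

baseline = queries_per_day * cost_per_llm_call
with_tre = queries_per_day * (1 - hit_rate) * cost_per_llm_call
print(f"${baseline:.2f}/day -> ${with_tre:.2f}/day "
      f"({100 * (baseline - with_tre) / baseline:.0f}% saved)")
# -> $20.00/day -> $4.20/day (79% saved)
```

Savings scale directly with hit rate: every cached query costs $0, so percentage saved equals the hit rate for uniform per-call pricing.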
2. Zero Hallucination on Cached Facts
- Hallucination Guard detects hedged language: "I think", "maybe", "probably", "not sure"
- Flagged responses are shown but not cached — prevents poisoned cache
- SHA-256 verification on every write — tamper detection for audit trails
- Source attribution — every cached fact traces back to who loaded it and when
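Both guards above are simple to sketch. The hedge phrases come from this section; the regex approach, function names, and entry format are assumptions:

```python
import hashlib
import re

# Hedge phrases listed in the text above; matching strategy is an assumption.
HEDGES = re.compile(r"\b(i think|maybe|probably|not sure)\b", re.IGNORECASE)

def is_hedged(answer):
    """True if the answer hedges — shown to the user but never cached."""
    return bool(HEDGES.search(answer))

def write_entry(answer):
    """Store the answer with a SHA-256 digest for tamper detection."""
    return {"answer": answer,
            "sha256": hashlib.sha256(answer.encode()).hexdigest()}

def read_entry(entry):
    """Reject any entry whose content no longer matches its digest."""
    if hashlib.sha256(entry["answer"].encode()).hexdigest() != entry["sha256"]:
        raise ValueError("tampered cache entry rejected")
    return entry["answer"]
```

The key property: a hedged answer is never written, so the cache can only ever serve responses the model stated with confidence.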
3. Instant Sub-10ms Responses
- Local-first — SQLite-backed cache serves answers from RAM (no network round-trip)
- Cold → hot acceleration — cache warmup brings first-query latency from ~700ms to sub-50ms
4. Agent Won't Run Amok
- Forbidden command list — "brain.delete_brain", "brain.purge" are blocked before execution
- Intent scope enforcement — agent refuses queries outside declared domain
- SHA-256 write verification — any tampered cache entry is rejected on read
- AgentPathfinder integration — HMAC-signed audit trail on every decision
5. Works With Any LLM Stack
- OpenAI (GPT-4o, GPT-4o-mini)
- Anthropic (Claude 3.5 Sonnet, Claude 3 Opus)
- Local models (Ollama, llama.cpp, vLLM)
- Multi-provider setups — TRE sits in front of all of them
```yaml
# Works with any backend
llm_provider: openai  # or anthropic, local, azure, etc.
tre:
  cache_ttl: 3600
  max_cache_size: 10000
  hallucination_guard: true
```
6. Persisted Across Restarts
- Cache auto-saves to disk on every write
- Load persisted cache on startup — no warm-up period
- Configurable persistence path — store anywhere (local disk, shared volume, etc.)
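A write-through persistence layer like the one described above can be sketched in a few lines. The path, JSON format, and function names are assumptions, not TRE's actual storage:

```python
import json
import os

CACHE_PATH = "tre_cache.json"  # assumed default; TRE's path is configurable

def save(cache, path=CACHE_PATH):
    """Write-through: persist the whole cache to disk on every write."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(cache, f)
    os.replace(tmp, path)   # atomic swap avoids a corrupt file on crash

def load(path=CACHE_PATH):
    """Reload the persisted cache on startup — no warm-up period."""
    if not os.path.exists(path):
        return {}           # first run: start cold with an empty cache
    with open(path) as f:
        return json.load(f)
```

The atomic `os.replace` matters: if the process dies mid-write, the previous cache file survives intact.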
7. Standalone — No Lock-in
- Runs on localhost — no external API, no SaaS dependency
- MIT License — free to fork, modify, commercialize
- No telemetry — your queries never leave your machine
- 50-line integration — drop into existing agent in minutes
Standalone Usage
```shell
clawhub install certainlogic-tre
```

```python
from tre import cache_answer, get_cached_answer, get_metrics

# After getting an LLM response
cache_answer(query, answer)

# Next time — zero tokens
cached = get_cached_answer(query)
if cached:
    answer, token_count = cached
    print(f"Instant! Saved {token_count} tokens.")

# Check your savings
print(get_metrics())
# {'cache_hits': 853, 'cache_misses': 223, 'cache_hit_rate_percent': 79.3}
```
See references/CONFIGURATION.md for full config options (TTL, cache size, guard sensitivity, forbidden commands).
See references/API.md for Python API reference.
Integration with Company Brain Core OS
When Brain is installed, TRE gets smarter:
- Facts DB pre-populates cache — Brain's 52+ facts are available instantly
- Intent classification improves — Brain's domain model refines routing
- Query classification — Brain categorizes new queries, expanding cache coverage
```shell
# Install both
clawhub install company-brain-os certainlogic-tre

# Brain loads facts → TRE caches them → instant responses for common questions
```
Metrics (Measured Ourselves)
| Metric | Value | Measurement Period |
|---|---|---|
| Cache hit rate | 79.3% | 2026-05-09, 615 queries |
| Avg latency (cache hit) | <10ms | Local SQLite |
| Avg latency (LLM miss) | 757ms | OpenRouter fallback |
| Hallucination guard flags | 0.2% | Hedge language detection |
| Token cost savings | 85% | Cached queries vs LLM calls |
| Forbidden command blocks | 100% | All delete_brain, purge blocked |
Free vs Pro
| Feature | Free | Pro (Planned) |
|---|---|---|
| Query answer cache | ✅ | ✅ |
| Deterministic lookup | ✅ | ✅ |
| Hallucination guard | ✅ | ✅ |
| Intent filtering | ✅ | ✅ |
| Cache persistence | ✅ | ✅ |
| Brain integration | ✅ | ✅ |
| Distributed cache (Redis) | ❌ | ✅ |
| Cache analytics dashboard | ❌ | ✅ |
| A/B testing policies | ❌ | ✅ |
| Team policy sharing | ❌ | ✅ |
| Auto policy optimization | ❌ | ✅ |
Pro pricing: TBD — join the waitlist at certainlogic.ai/tre
License
MIT — free to use, modify, distribute.
Built with brutal honesty by CertainLogic