turboquant-memory

Compress and accelerate vector search in memory/RAG systems using TurboQuant (ICLR 2026) — near-optimal vector quantization with 5-8x compression and 98%+ search accuracy. Uses blockwise Hadamard rotation + Lloyd-Max scalar quantization. Use when: (1) optimizing embedding storage size, (2) speeding up semantic search, (3) user mentions "compress embeddings", "quantize vectors", "memory optimization", "faster search", "TurboQuant", "vector compression", or "embedding compression", (4) reducing memory footprint of RAG systems. Works with any embedding model (Gemini, OpenAI, Cohere, local) and any dimension ≥ 128. No GPU required. numpy only.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "turboquant-memory" with this command: npx skills add sunnyztj/turboquant-memory

TurboQuant Memory

Compress embedding vectors 5-8x with 98%+ search accuracy using TurboQuant (Google, ICLR 2026).

Quick Start

1. Run tests

python3 scripts/turboquant.py

15 built-in tests: FWHT correctness, MSE distortion, inner-product (IP) correlation, recall, compression ratio, and determinism.

2. Validate on your data

python3 scripts/validate.py --db /path/to/memory.sqlite --auto-detect --bits 5

Auto-detects sqlite-vec vec0 tables, analyzes distribution, reports quantization quality and recall.

3. Quantize a memory database

python3 scripts/memory_quantize.py --db /path/to/memory.db --bits 5 --benchmark
python3 scripts/memory_quantize.py --db /path/to/memory.db --bits 5 --migrate

4. Integrate into code

import numpy as np

from turboquant import TurboQuantMSE

# Initialize (deterministic — same seed = same quantization)
tq = TurboQuantMSE(dim=3072, bits=5)

# Quantize for storage
stored = tq.quantize(embedding_vector)  # float32 → compressed

# Reconstruct
reconstructed = tq.dequantize(stored)   # compressed → float32

# Search: query stays float32, database is quantized
q_rot = tq.rotation.apply(query)
for doc in database:
    score = doc['norm'] * doc['scale'] * np.dot(q_rot, tq.codebook[doc['indices']])

Recommended Configuration

| Preset | Mode | Bits | R@1 | Compression | Use Case |
|---|---|---|---|---|---|
| Default | MSE | 5 | 98% | 6.4x | Most memory/RAG search |
| Conservative | MSE | 6 | 98%+ | 5.3x | High-fidelity retrieval |
| Aggressive | MSE | 4 | 92% | 8.0x | Large-scale, storage-constrained |

Parameters

| Parameter | Default | Description |
|---|---|---|
| dim | auto-detect | Embedding dimension (768, 1536, 3072, etc.) |
| bits | 5 | Bits per coordinate. See table above. |
| seed | 42 | Rotation seed. Same seed = reproducible quantization. |

Algorithm

Blockwise Hadamard Rotation → Lloyd-Max Scalar Quantization

  1. Split vector into power-of-2 blocks (e.g., 3072 = 3 × 1024)
  2. Per block: random sign flip + Fast Walsh-Hadamard Transform (fully invertible)
  3. Per-vector scale normalization
  4. Lloyd-Max optimal scalar quantizer per coordinate (precomputed codebook for N(0,1))
  5. Pack indices into compact bit representation
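Steps 1–3 can be sketched in plain numpy (an illustrative sketch, not the shipped scripts/turboquant.py; `fwht` and `rotate` are hypothetical helper names):

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard Transform, normalized by sqrt(n) so it is its own inverse."""
    x = x.copy()
    n = len(x)
    h = 1
    while h < n:
        for i in range(0, n, h * 2):
            a = x[i:i + h].copy()
            b = x[i + h:i + 2 * h].copy()
            x[i:i + h] = a + b          # butterfly: sums in the first half
            x[i + h:i + 2 * h] = a - b  # differences in the second half
        h *= 2
    return x / np.sqrt(n)

def rotate(vec, block=1024, seed=42):
    """Random sign flip + FWHT per power-of-2 block (e.g. 3072 = 3 x 1024)."""
    rng = np.random.default_rng(seed)   # same seed = same signs = same rotation
    out = np.empty_like(vec)
    for start in range(0, len(vec), block):
        signs = rng.choice([-1.0, 1.0], size=block)
        out[start:start + block] = fwht(vec[start:start + block] * signs)
    return out

v = np.random.default_rng(0).standard_normal(3072)
r = rotate(v)
# Sign flip and FWHT are both orthonormal, so the L2 norm is preserved exactly
print(np.allclose(np.linalg.norm(v), np.linalg.norm(r)))  # True
```

Because every step is orthonormal, the rotation loses no information: applying the inverse FWHT and the same signs recovers the original vector.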

Key properties:

  • Data-oblivious: no training or calibration needed
  • Fully invertible: zero information loss from rotation
  • Near-optimal: within 2.7x of the Shannon information-theoretic lower bound
  • Deterministic: same seed = same output

See references/algorithm.md for full details.

Benchmark (Gemini embedding-001, 3072-dim, 112 vectors)

| Bits | MSE | Cosine | R@1 | R@5 | R@10 | Bytes/vec | Compression |
|---|---|---|---|---|---|---|---|
| 3 | 1.1e-5 | 0.982 | 88% | 90% | 91% | 1,160 | 10.6x |
| 4 | 3.2e-6 | 0.995 | 92% | 93% | 93% | 1,544 | 8.0x |
| 5 | 8.2e-7 | 0.999 | 98% | 96% | 96% | 1,928 | 6.4x |
| 6 | 2.2e-7 | 1.000 | 96% | 98% | 98% | 2,312 | 5.3x |
| 7 | 8e-8 | 1.000 | 100% | 98% | 99% | 2,696 | 4.6x |
| 8 | 3e-8 | 1.000 | 98% | 98% | 99% | 3,080 | 4.0x |
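The Bytes/vec column follows from the packing arithmetic: dim × bits / 8 for the packed indices plus a small per-vector header. The table is consistent with an 8-byte header, e.g. two float32 scalars such as norm and scale — an assumption inferred from the numbers, not confirmed by the source:

```python
def bytes_per_vector(dim, bits, header_bytes=8):
    """Packed index bytes plus per-vector header (assumed: 2 x float32 scalars)."""
    return dim * bits // 8 + header_bytes

def compression_ratio(dim, bits, header_bytes=8):
    """Ratio versus uncompressed float32 storage (4 bytes per coordinate)."""
    return dim * 4 / bytes_per_vector(dim, bits, header_bytes)

for bits in (3, 4, 5, 6, 7, 8):
    print(bits, bytes_per_vector(3072, bits), round(compression_ratio(3072, bits), 1))
# The bits=5 row reproduces the table: 1928 bytes/vec, 6.4x compression
```

This reproduces every Bytes/vec and Compression entry in the benchmark table above for dim=3072.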

Compatibility

  • Python 3.9+, numpy only (no scipy, no GPU)
  • Any embedding dimension ≥ 128
  • Any embedding model (Gemini, OpenAI, Cohere, sentence-transformers, etc.)
  • SQLite / sqlite-vec vec0 tables (auto-detected)
