turboquant-memory

Compress and accelerate vector search in memory/RAG systems using TurboQuant (ICLR 2026) — near-optimal vector quantization with 5-8x compression and 98%+ search accuracy. Uses blockwise Hadamard rotation + Lloyd-Max scalar quantization. Use when: (1) optimizing embedding storage size, (2) speeding up semantic search, (3) user mentions "compress embeddings", "quantize vectors", "memory optimization", "faster search", "TurboQuant", "vector compression", or "embedding compression", (4) reducing memory footprint of RAG systems. Works with any embedding model (Gemini, OpenAI, Cohere, local) and any dimension ≥ 128. No GPU required. numpy only.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "turboquant-memory" with this command: npx skills add sunnyztj/turboquant-memory

TurboQuant Memory

Compress embedding vectors 5-8x with 98%+ search accuracy using TurboQuant (Google, ICLR 2026).

Quick Start

1. Run tests

python3 scripts/turboquant.py

15 built-in tests: FWHT correctness, MSE distortion, inner-product (IP) correlation, recall, compression ratio, and determinism.

2. Validate on your data

python3 scripts/validate.py --db /path/to/memory.sqlite --auto-detect --bits 5

Auto-detects sqlite-vec vec0 tables, analyzes distribution, reports quantization quality and recall.

3. Quantize a memory database

python3 scripts/memory_quantize.py --db /path/to/memory.db --bits 5 --benchmark
python3 scripts/memory_quantize.py --db /path/to/memory.db --bits 5 --migrate

4. Integrate into code

import numpy as np

from turboquant import TurboQuantMSE

# Initialize (deterministic — same seed = same quantization)
tq = TurboQuantMSE(dim=3072, bits=5)

# Quantize for storage
stored = tq.quantize(embedding_vector)  # float32 → compressed

# Reconstruct
reconstructed = tq.dequantize(stored)   # compressed → float32

# Search: query stays float32, database is quantized
q_rot = tq.rotation.apply(query)
for doc in database:
    score = doc['norm'] * doc['scale'] * np.dot(q_rot, tq.codebook[doc['indices']])

Recommended Configuration

| Preset | Mode | Bits | R@1 | Compression | Use Case |
|---|---|---|---|---|---|
| Default | MSE | 5 | 98% | 6.4x | Most memory/RAG search |
| Conservative | MSE | 6 | 98%+ | 5.3x | High-fidelity retrieval |
| Aggressive | MSE | 4 | 92% | 8.0x | Large-scale, storage-constrained |

Parameters

| Parameter | Default | Description |
|---|---|---|
| dim | auto-detect | Embedding dimension (768, 1536, 3072, etc.) |
| bits | 5 | Bits per coordinate. See table above. |
| seed | 42 | Rotation seed. Same seed = reproducible quantization. |

Algorithm

Blockwise Hadamard Rotation → Lloyd-Max Scalar Quantization

  1. Split vector into power-of-2 blocks (e.g., 3072 = 3 × 1024)
  2. Per block: random sign flip + Fast Walsh-Hadamard Transform (fully invertible)
  3. Per-vector scale normalization
  4. Lloyd-Max optimal scalar quantizer per coordinate (precomputed codebook for N(0,1))
  5. Pack indices into compact bit representation
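Steps 1–3 can be sketched in plain numpy (an illustrative sketch, not the shipped scripts/turboquant.py; `fwht` and `rotate` are hypothetical helper names):

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard Transform, normalized by sqrt(n) so it is its own inverse."""
    x = x.copy()
    n = len(x)
    h = 1
    while h < n:
        for i in range(0, n, h * 2):
            a = x[i:i + h].copy()
            b = x[i + h:i + 2 * h].copy()
            x[i:i + h] = a + b          # butterfly: sums in the first half
            x[i + h:i + 2 * h] = a - b  # differences in the second half
        h *= 2
    return x / np.sqrt(n)

def rotate(vec, block=1024, seed=42):
    """Random sign flip + FWHT per power-of-2 block (e.g. 3072 = 3 x 1024)."""
    rng = np.random.default_rng(seed)   # same seed = same signs = same rotation
    out = np.empty_like(vec)
    for start in range(0, len(vec), block):
        signs = rng.choice([-1.0, 1.0], size=block)
        out[start:start + block] = fwht(vec[start:start + block] * signs)
    return out

v = np.random.default_rng(0).standard_normal(3072)
r = rotate(v)
# Sign flip and FWHT are both orthonormal, so the L2 norm is preserved exactly
print(np.allclose(np.linalg.norm(v), np.linalg.norm(r)))  # True
```

Because every step is orthonormal, the rotation loses no information: applying the inverse FWHT and the same signs recovers the original vector.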

Key properties:

  • Data-oblivious: no training or calibration needed
  • Fully invertible: zero information loss from rotation
  • Near-optimal: within 2.7x of the Shannon information-theoretic lower bound
  • Deterministic: same seed = same output

See references/algorithm.md for full details.

Benchmark (Gemini embedding-001, 3072-dim, 112 vectors)

| Bits | MSE | Cosine | R@1 | R@5 | R@10 | Bytes/vec | Compression |
|---|---|---|---|---|---|---|---|
| 3 | 1.1e-5 | 0.982 | 88% | 90% | 91% | 1,160 | 10.6x |
| 4 | 3.2e-6 | 0.995 | 92% | 93% | 93% | 1,544 | 8.0x |
| 5 | 8.2e-7 | 0.999 | 98% | 96% | 96% | 1,928 | 6.4x |
| 6 | 2.2e-7 | 1.000 | 96% | 98% | 98% | 2,312 | 5.3x |
| 7 | 8e-8 | 1.000 | 100% | 98% | 99% | 2,696 | 4.6x |
| 8 | 3e-8 | 1.000 | 98% | 98% | 99% | 3,080 | 4.0x |
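The Bytes/vec column follows from the packing arithmetic: dim × bits / 8 for the packed indices plus a small per-vector header. The table is consistent with an 8-byte header, e.g. two float32 scalars such as norm and scale — an assumption inferred from the numbers, not confirmed by the source:

```python
def bytes_per_vector(dim, bits, header_bytes=8):
    """Packed index bytes plus per-vector header (assumed: 2 x float32 scalars)."""
    return dim * bits // 8 + header_bytes

def compression_ratio(dim, bits, header_bytes=8):
    """Ratio versus uncompressed float32 storage (4 bytes per coordinate)."""
    return dim * 4 / bytes_per_vector(dim, bits, header_bytes)

for bits in (3, 4, 5, 6, 7, 8):
    print(bits, bytes_per_vector(3072, bits), round(compression_ratio(3072, bits), 1))
# The bits=5 row reproduces the table: 1928 bytes/vec, 6.4x compression
```

This reproduces every Bytes/vec and Compression entry in the benchmark table above for dim=3072.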

Compatibility

  • Python 3.9+, numpy only (no scipy, no GPU)
  • Any embedding dimension ≥ 128
  • Any embedding model (Gemini, OpenAI, Cohere, sentence-transformers, etc.)
  • SQLite / sqlite-vec vec0 tables (auto-detected)
