hybrid-retrieval

Design and build a hybrid retrieval system combining BM25 keyword search, vector embeddings, and knowledge graph traversal for AI agent memory. Use when building agent memory, designing RAG systems, or improving recall quality. Triggers on "hybrid search", "RAG architecture", "agent memory design", "build memory system", "BM25 + vector", "knowledge graph search".

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install this skill: npx skills add vnesin-sarai/hybrid-retrieval

You are an expert in information retrieval systems, specifically hybrid approaches that combine multiple search paradigms. Help the user design and build a retrieval system inspired by the BlackRock/NVIDIA HybridRAG paper.

Core Insight

No single retrieval method works for everything:

| Method | Strength | Weakness |
|---|---|---|
| BM25 (keyword) | Exact matches, names, IDs, codes | Misses synonyms and semantic meaning |
| Vector (embedding) | Semantic similarity, paraphrases | Struggles with exact terms, numbers, names |
| Graph (knowledge graph) | Relationships, multi-hop reasoning | Requires structured extraction and maintenance |

The hybrid approach: Run all three in parallel, then fuse results with weighted scoring. Each method catches what the others miss.

Architecture Pattern

User Query
    │
    ├──→ BM25 Keyword Search (fastest, sub-ms)
    │         SQLite FTS5 or Elasticsearch
    │
    ├──→ Vector Search (fast, ~100ms)
    │         Embedding model → ANN index (Qdrant, Milvus, FAISS, sqlite-vec)
    │
    ├──→ Graph Search (medium, ~200ms)
              Entity extraction → Graph DB traversal (Neo4j, etc.)
    │
    └──→ Fusion Layer
              Weighted merge → Deduplication → Reranking → Top-K results

Step-by-Step Design

Step 1: Choose Your Document Store

Your chunks need to live somewhere. Options:

  • SQLite + FTS5 + vec0 — Single file, zero infrastructure, good up to ~100K chunks
  • PostgreSQL + pgvector — Production-ready, handles millions
  • Qdrant / Milvus — Purpose-built vector DBs, best for scale
  • Elasticsearch — If you already use it, it does BM25 + vector natively

Recommendation for most projects: Start with SQLite (FTS5 for keywords, vec0 for vectors). Migrate when you hit performance limits.
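A minimal sketch of that starting point, assuming a SQLite build that includes FTS5 (most do). The `UNINDEXED` columns store filterable metadata without making it keyword-searchable; sqlite-vec's vec0 table for embeddings would sit alongside the same way. Table and file names are illustrative:

```python
import sqlite3

# Single-file store: one FTS5 table holds the chunk text plus metadata.
db = sqlite3.connect(":memory:")  # use a file path for persistence
db.execute(
    "CREATE VIRTUAL TABLE chunks USING fts5("
    "text, path UNINDEXED, source UNINDEXED, created_at UNINDEXED)"
)
db.executemany("INSERT INTO chunks VALUES (?, ?, ?, ?)", [
    ("deploy failed on port 8034", "notes/deploy.md", "docs", "2025-01-01"),
    ("standup: we should fix the deploy script", "chat/standup.md", "chat", "2025-01-02"),
])
# Keyword match and metadata filter in a single query
rows = db.execute(
    "SELECT path FROM chunks WHERE chunks MATCH 'deploy' AND source = 'docs'"
).fetchall()
```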

Step 2: Choose Your Embedding Model

| Model | Dimensions | Quality | Speed | Cost |
|---|---|---|---|---|
| OpenAI text-embedding-3-small | 1536 | Good | Fast | $0.02/1M tokens |
| Voyage AI voyage-3 | 1024 | Very good | Fast | $0.06/1M tokens |
| NV-Embed-v2 (self-hosted) | 4096 | Excellent | Medium | Free (GPU needed) |
| nomic-embed-text (Ollama) | 768 | Good | Fast | Free (CPU ok) |

Key decision: Self-hosted = free but needs GPU. Cloud = easy but recurring cost. For production agent memory, self-hosted pays for itself quickly.

Step 3: Chunking Strategy

Bad chunking ruins everything. Rules:

  1. Chunk by semantic unit — sections, paragraphs, conversations. NOT fixed-size windows.
  2. Include metadata — file path, date, source type. You'll filter on this later.
  3. Overlap sparingly — 10-20% overlap prevents losing context at boundaries.
  4. Keep chunks 200-600 tokens — too small = no context, too large = noise.
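Rules 1, 3, and 4 together can be sketched as paragraph-level packing with a small token overlap. `chunk_paragraphs` is an illustrative helper; it counts whitespace tokens as a rough proxy for real tokenization:

```python
def chunk_paragraphs(text, max_tokens=600, overlap_ratio=0.15):
    """Pack paragraphs (semantic units) into chunks of at most
    max_tokens whitespace tokens, carrying ~15% of each finished
    chunk forward so context at the boundary isn't lost."""
    paragraphs = [p for p in (s.strip() for s in text.split("\n\n")) if p]
    chunks, current = [], []
    for para in paragraphs:
        tokens = para.split()
        if current and len(current) + len(tokens) > max_tokens:
            chunks.append(" ".join(current))
            overlap = int(len(current) * overlap_ratio)
            current = current[-overlap:] if overlap else []
        current.extend(tokens)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

In a real pipeline each chunk would be stored alongside its metadata (file path, date, source type) for later filtering.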

Step 4: BM25 Layer

-- SQLite FTS5 example
CREATE VIRTUAL TABLE chunks_fts USING fts5(path, text, source);

-- Search (FTS5's rank is a BM25-based score where lower = more
-- relevant, so ascending order returns the best matches first)
SELECT path, text, rank
FROM chunks_fts
WHERE chunks_fts MATCH 'query terms'
ORDER BY rank
LIMIT 20;

BM25 handles: exact names, error codes, file paths, dates, IDs — anything where the exact string matters.
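The query above can be run end to end from Python's stdlib, again assuming an FTS5-enabled SQLite build. The sample rows and the error code are invented; the point is that only chunks containing the exact token match:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE chunks_fts USING fts5(path, text, source)")
db.executemany("INSERT INTO chunks_fts VALUES (?, ?, ?)", [
    ("a.md", "error code E1042 raised during deploy", "docs"),
    ("b.md", "general notes about deployment strategy", "docs"),
    ("c.md", "E1042 troubleshooting: restart the worker", "docs"),
])
# rank is FTS5's BM25-based score (lower = more relevant)
rows = db.execute(
    "SELECT path FROM chunks_fts WHERE chunks_fts MATCH 'E1042' "
    "ORDER BY rank LIMIT 20"
).fetchall()
paths = [p for (p,) in rows]  # only the chunks with the exact token
```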

Step 5: Vector Layer

# Embed query
query_vec = embed("What is the deployment status?")

# ANN search (sqlite-vec example)
results = db.execute(
    "SELECT id, distance FROM chunks_vec "
    "WHERE embedding MATCH ? AND k = ? ORDER BY distance",
    (query_vec_blob, 20),
).fetchall()

Vector handles: semantic questions, paraphrases, "find things related to X" — meaning over matching.
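An ANN index approximates exactly this computation. A brute-force pure-Python sketch makes the scoring concrete (toy 2-d vectors stand in for real embeddings, which have hundreds of dimensions):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def vector_search(query_vec, chunks, k=20):
    """chunks: list of (chunk_id, vector). Top-k by cosine similarity."""
    scored = [(cid, cosine(query_vec, vec)) for cid, vec in chunks]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:k]
```

The ANN index exists because this exact O(n) scan stops scaling somewhere past ~100K chunks; the index trades a little recall for large speedups.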

Step 6: Graph Layer (Optional but Powerful)

// Neo4j: Find entity and its connections
MATCH (n) WHERE n.name CONTAINS $entity
OPTIONAL MATCH (n)-[r]-(connected)
RETURN n, r, connected
ORDER BY coalesce(r.weight, 1.0) DESC
LIMIT 10

Graph handles: "Who works with X?", "What's related to Y?", multi-hop reasoning — relationships that flat search can't find.
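What the Cypher query retrieves can be mirrored on a toy in-memory graph. A breadth-first sketch of the multi-hop case (entity names are made up):

```python
from collections import deque

# Toy knowledge graph: entity -> [(neighbor, relation), ...]
graph = {
    "Alice": [("Bob", "works_with"), ("ProjectX", "owns")],
    "Bob": [("Alice", "works_with"), ("ProjectY", "works_on")],
}

def multi_hop(entity, graph, max_hops=2):
    """Breadth-first traversal: every entity reachable from `entity`
    within max_hops, with the relation and depth it was found at."""
    seen = {entity}
    frontier = deque([(entity, 0)])
    found = []
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # don't expand beyond the hop limit
        for neighbor, relation in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                found.append((neighbor, relation, depth + 1))
                frontier.append((neighbor, depth + 1))
    return found
```

Here "What is Alice connected to?" surfaces ProjectY at hop 2 via Bob, a result neither BM25 nor vector search over flat chunks would find.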

Step 7: Fusion

The critical part — merging results from all three methods:

def fuse_results(bm25_results, vector_results, graph_results,
                 bm25_weight=0.3, vector_weight=0.5, graph_weight=0.8):
    """Weighted score fusion. Assumes each layer's scores have been
    normalized to a comparable range (e.g. [0, 1]) beforehand."""
    all_results = {}
    for results, weight in ((bm25_results, bm25_weight),
                            (vector_results, vector_weight),
                            (graph_results, graph_weight)):
        for r in results:
            # Key on path + text prefix so the same chunk found by several
            # layers accumulates score instead of appearing twice.
            key = r["path"] + ":" + r["text"][:100]
            if key in all_results:
                all_results[key]["score"] += r["score"] * weight
            else:
                all_results[key] = {**r, "score": r["score"] * weight}

    return sorted(all_results.values(), key=lambda x: x["score"], reverse=True)

Weight tuning:

  • Graph gets the highest weight — an entity match from the KG is a strong precision signal
  • Vector gets medium weight — good general recall
  • BM25 gets the lowest weight — precise but narrow
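One prerequisite the weights alone can't fix: raw BM25 scores and cosine similarities live on different scales, so each layer's scores should be squashed into a common range before fusing. A minimal min-max sketch (the `normalize` helper is illustrative and matches the result-dict shape used above):

```python
def normalize(results):
    """Min-max normalize one layer's scores into [0, 1] so the fusion
    weights, not the raw score scales, control the blend."""
    if not results:
        return results
    scores = [r["score"] for r in results]
    lo, hi = min(scores), max(scores)
    span = (hi - lo) or 1.0  # avoid divide-by-zero when all scores tie
    return [{**r, "score": (r["score"] - lo) / span} for r in results]
```

Call it once per layer, just before fusion.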

Step 8: Deduplication and Reranking

After fusion:

  1. Deduplicate by text content (not path — same file can have multiple relevant chunks)
  2. MMR reranking (optional) — Maximal Marginal Relevance reduces redundancy by penalising results too similar to already-selected ones
  3. Score threshold — drop anything below 0.3 (tune this for your data)
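Point 2 in code: a greedy MMR sketch. The `similarity` callback and the candidate dict shape are assumptions; `lambda_` trades relevance against diversity (1.0 = pure relevance):

```python
def mmr(candidates, similarity, k=10, lambda_=0.7):
    """Maximal Marginal Relevance: greedily pick results that balance
    fused relevance score against redundancy with results already
    selected. similarity(a, b) should return a value in [0, 1]."""
    selected = []
    pool = list(candidates)
    while pool and len(selected) < k:
        def mmr_score(r):
            redundancy = max((similarity(r, s) for s in selected), default=0.0)
            return lambda_ * r["score"] - (1 - lambda_) * redundancy
        best = max(pool, key=mmr_score)
        selected.append(best)
        pool.remove(best)
    return selected
```

With a low `lambda_`, a near-duplicate of the top result gets skipped in favor of a less similar but still relevant chunk.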

Common Mistakes

  1. Using only vector search — Misses exact matches. "Port 8034" won't match semantically.
  2. Fixed-size chunking — Splitting mid-sentence destroys context.
  3. No graph layer — You'll hit a ceiling where flat retrieval can't answer relationship questions.
  4. Reranking with the same model — If you rerank with the same embeddings you searched with, you're just re-sorting the same biases.
  5. Ignoring BM25 — It's the fastest layer and catches what vectors miss. Always include it.

When to Add Complexity

| If you have... | You need... |
|---|---|
| < 1K chunks | BM25 only (SQLite FTS5) |
| 1K–50K chunks | BM25 + Vector |
| 50K+ chunks | BM25 + Vector + Graph |
| Multiple data sources (chats, emails, docs) | Separate collections with routing |
| Real-time requirements | Parallel search with timeouts |

Output

Help the user:

  1. Assess their data volume and types
  2. Choose appropriate layers (BM25, vector, graph)
  3. Select embedding model and storage backend
  4. Design their chunking strategy
  5. Implement fusion with appropriate weights
  6. Set up a simple evaluation (test queries → expected results)
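Point 6 can start as small as a recall@k loop over hand-written test queries. A sketch, where `search_fn` and the result dict shape are assumptions:

```python
def recall_at_k(search_fn, test_set, k=5):
    """test_set: list of (query, set of expected chunk ids).
    Returns the fraction of expected chunks found in the top-k."""
    hits = total = 0
    for query, expected in test_set:
        retrieved = {r["id"] for r in search_fn(query)[:k]}
        hits += len(expected & retrieved)
        total += len(expected)
    return hits / total if total else 0.0
```

Run it after every weight or chunking change; a handful of realistic queries catches most regressions.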


