ai-rag

RAG & Search Engineering — Complete Reference

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.


Install skill "ai-rag" with this command: npx skills add vasilyu1983/ai-agents-public/vasilyu1983-ai-agents-public-ai-rag


Build production-grade retrieval systems with hybrid search, grounded generation, and measurable quality.

This skill covers:

  • RAG: Chunking, contextual retrieval, grounding, adaptive/self-correcting systems

  • Search: BM25, vector search, hybrid fusion, ranking pipelines

  • Evaluation: recall@k, nDCG, MRR, groundedness metrics
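
The three ranking metrics named above (recall@k, MRR, nDCG) can be computed per query in a few lines. This is a minimal sketch. It assumes each query has a set of relevant evidence IDs (or graded gains for nDCG) and that the retriever returns an ordered list of IDs. The function names are illustrative, and nDCG here uses linear gains (some toolkits use 2^rel - 1 instead).

```python
import math
from typing import List, Mapping, Sequence, Set, Tuple


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Share of the relevant IDs that appear among the top-k retrieved IDs."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def reciprocal_rank(retrieved: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant hit (1-based), or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: Sequence[Tuple[Sequence[str], Set[str]]]) -> float:
    """MRR: average reciprocal rank over (retrieved, relevant) pairs, one pair per query."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


def ndcg_at_k(retrieved: Sequence[str], gains: Mapping[str, float], k: int) -> float:
    """nDCG@k with linear gains: DCG of the actual ranking divided by DCG of the ideal ranking."""

    def dcg(values: List[float]) -> float:
        # Position i (0-based) is discounted by log2(i + 2), so rank 1 has discount log2(2) = 1.
        return sum(g / math.log2(i + 2) for i, g in enumerate(values))

    actual = dcg([gains.get(doc_id, 0.0) for doc_id in retrieved[:k]])
    ideal = dcg(sorted(gains.values(), reverse=True)[:k])
    return actual / ideal if ideal > 0 else 0.0
```

Averaging these per-query values over a fixed test set gives numbers you can gate regressions on.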

Modern Best Practices (Jan 2026):

Default posture: deterministic pipeline, bounded context, explicit failure handling, and telemetry for every stage.

Scope note: For prompt structure and output contracts used in the generation phase, see ai-prompt-engineering.

Quick Reference

| Task | Tool/Framework | Command/Pattern | When to Use |
|---|---|---|---|
| Decide RAG vs alternatives | Decision framework | RAG if: freshness + citations + corpus size; else: fine-tune/caching | Avoid unnecessary retrieval latency/complexity |
| Chunking & parsing | Chunker + parser | Start simple; add structure-aware chunking per doc type | Ingestion for docs, code, tables, PDFs |
| Retrieval | Sparse + dense (hybrid) | Fusion (e.g., RRF) + metadata filters + top-k tuning | Mixed query styles; high recall requirements |
| Precision boost | Reranker | Cross-encoder/LLM rerank of top-k candidates | When top-k contains near-misses/noise |
| Grounding | Output contract + citations | Quote/ID citations; answerability gate; refuse on missing evidence | Compliance, trust, and auditability |
| Evaluation | Offline + online eval | Retrieval metrics + answer metrics + regression tests | Prevent silent regressions and staleness failures |
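
Reciprocal Rank Fusion (RRF), the fusion method named in the Retrieval row, merges sparse and dense result lists using only ranks, so BM25 scores and cosine similarities never have to be normalized against each other. This is a minimal sketch. It assumes each input list is already ordered best-first with unique IDs. k=60 is the constant commonly used since the original RRF paper, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[str]],
    k: int = 60,
    top_n: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Fuse several best-first ID lists: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, breaking ties by ID so the output is deterministic.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused if top_n is None else fused[:top_n]
```

Documents that appear in both lists accumulate score from each. That is why RRF rewards agreement between the sparse and dense retrievers. Metadata filters should be applied inside each retriever, before fusion.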

Decision Tree: RAG Architecture Selection

Building RAG system: [Architecture Path]
├─ Document type?
│   ├─ Page/section-structured? → Structure-aware chunking (pages/sections + metadata)
│   ├─ Technical docs/code? → Structure-aware + code-aware chunking (symbols, headers)
│   └─ Simple content? → Fixed-size token chunking with overlap (baseline)
│
├─ Retrieval accuracy low?
│   ├─ Query ambiguity? → Query rewriting + multi-query expansion + filters
│   ├─ Noisy results? → Add reranker + better metadata filters
│   └─ Mixed queries? → Hybrid retrieval (sparse + dense) + reranking
│
├─ Dataset size?
│   ├─ <100k chunks? → Flat index (exact search)
│   ├─ 100k-10M chunks? → HNSW (low latency)
│   └─ >10M chunks? → IVF/ScaNN/DiskANN (scalable)
│
└─ Production quality?
    └─ Add: ACLs, freshness/invalidation, eval gates, and telemetry (end-to-end)
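
The baseline branch of the tree, fixed-size token chunking with overlap, can be sketched as follows. This assumes whitespace splitting as a stand-in for the embedding model's tokenizer. The function names and default sizes are hypothetical placeholders to tune, not recommendations.

```python
from typing import List


def chunk_tokens(tokens: List[str], chunk_size: int = 200, overlap: int = 40) -> List[List[str]]:
    """Split tokens into windows of chunk_size; each window shares `overlap` tokens with the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window reached the end; avoid a trailing chunk made only of overlap
    return chunks


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> List[str]:
    """Whitespace 'tokens' are a placeholder; count with the embedding model's tokenizer in practice."""
    return [" ".join(window) for window in chunk_tokens(text.split(), chunk_size, overlap)]
```

The overlap means a sentence cut at a chunk boundary still appears whole in at least one chunk, at the cost of some duplicated embedding work.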

Core Concepts (Vendor-Agnostic)

  • Pipeline stages: ingest → chunk → embed → index → retrieve → rerank → pack context → generate → verify.

  • Two evaluation planes: retrieval relevance (did we fetch the right evidence?) vs generation fidelity (did we use it correctly?).

  • Freshness model: staleness budget, invalidation triggers, and rebuild strategy (incremental vs full).

  • Trust boundaries: retrieved content is untrusted; apply the same rigor as user input (OWASP LLM Top 10: https://owasp.org/www-project-top-10-for-large-language-model-applications/).
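
One way to make the freshness model concrete is to use content hashes as invalidation triggers and a changed-fraction threshold to choose between incremental updates and a full rebuild. This is a sketch under stated assumptions. The 30% threshold, the function names, and tracking a single last-sync timestamp against the staleness budget are all illustrative choices, not a prescribed policy.

```python
import hashlib
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional, Union


def content_hash(text: str) -> str:
    """Stable fingerprint of a document's text; a new hash means its chunks/embeddings are stale."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_refresh(
    indexed: Dict[str, str],
    source: Dict[str, str],
    full_rebuild_ratio: float = 0.3,
) -> Dict[str, Union[str, List[str]]]:
    """Compare index state (doc_id -> hash) with the source of truth (doc_id -> text).

    New or edited documents are upserted and missing ones deleted. If the touched share
    of the corpus exceeds full_rebuild_ratio (an illustrative threshold), suggest a full rebuild.
    """
    current = {doc_id: content_hash(text) for doc_id, text in source.items()}
    upserts = sorted(doc_id for doc_id, h in current.items() if indexed.get(doc_id) != h)
    deletes = sorted(doc_id for doc_id in indexed if doc_id not in current)
    touched_share = (len(upserts) + len(deletes)) / max(len(current), 1)
    mode = "full" if touched_share > full_rebuild_ratio else "incremental"
    return {"mode": mode, "upserts": upserts, "deletes": deletes}


def staleness_breached(last_sync: datetime, budget: timedelta, now: Optional[datetime] = None) -> bool:
    """True if the index has gone longer than the staleness budget since its last successful sync.

    last_sync must be timezone-aware, because the default `now` is UTC.
    """
    now = now or datetime.now(timezone.utc)
    return now - last_sync > budget
```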

Implementation Practices (Tooling Examples)

  • Use a retrieval API contract: query, filters, top_k, trace_id, and returned evidence IDs.

  • Instrument each stage with tracing/metrics (OpenTelemetry GenAI semantic conventions: https://opentelemetry.io/docs/specs/semconv/gen-ai/).

  • Add caches deliberately: embeddings cache, retrieval cache (query+filters), and response cache (with invalidation).
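
The retrieval API contract and the retrieval cache described above can be combined as shown in the minimal sketch below. The class and field names are hypothetical, and `backend` stands in for whatever sparse, dense, or hybrid retriever you use. Whitespace-only query normalization in the cache key is an assumption. Lowercasing or other normalization is only safe if the backend ignores those differences too.

```python
import hashlib
import json
import uuid
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Mapping, Tuple


@dataclass(frozen=True)
class RetrievalRequest:
    query: str
    filters: Mapping[str, str] = field(default_factory=dict)  # e.g. tenant, doc_type, ACL scope
    top_k: int = 10
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)  # propagated to telemetry


@dataclass
class RetrievalResponse:
    trace_id: str
    evidence_ids: List[str]  # stable IDs the generator is expected to cite
    scores: List[float]


def cache_key(req: RetrievalRequest) -> str:
    """Key on whitespace-normalized query, filters, and top_k; trace_id is deliberately excluded."""
    payload = json.dumps(
        {"query": " ".join(req.query.split()), "filters": dict(req.filters), "top_k": req.top_k},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class CachedRetriever:
    """Wraps any backend that returns best-first (doc_id, score) pairs for a request."""

    def __init__(self, backend: Callable[[RetrievalRequest], List[Tuple[str, float]]]):
        self._backend = backend
        self._cache: Dict[str, List[Tuple[str, float]]] = {}

    def retrieve(self, req: RetrievalRequest) -> RetrievalResponse:
        key = cache_key(req)
        if key not in self._cache:
            self._cache[key] = list(self._backend(req))[: req.top_k]
        hits = self._cache[key]
        return RetrievalResponse(
            trace_id=req.trace_id,
            evidence_ids=[doc_id for doc_id, _ in hits],
            scores=[score for _, score in hits],
        )

    def invalidate(self) -> None:
        """Drop all cached results, e.g. after an index rebuild or document update."""
        self._cache.clear()
```

Because `trace_id` is excluded from the cache key, repeated queries share cached results while each response still carries its own trace ID for telemetry.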

Do / Avoid

Do

  • Do keep retrieval deterministic: fixed top_k, stable ranking, explicit filters.

  • Do enforce document-level ACLs at retrieval time (not only at generation time).

  • Do include citations with stable IDs and verify citation coverage in tests.
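
The first and third practices above can be enforced in code. Filter by ACL before ranking and truncation, and break score ties by a stable ID so repeated runs return identical results. Then measure how many answer sentences cite evidence that was actually retrieved. This is a sketch that assumes group-based ACLs and citations written as `[doc-id]`. Both conventions and all names are hypothetical. In a real vector store the ACL check would normally be pushed down into the query as a metadata filter. The in-memory version only illustrates the ordering of steps.

```python
import re
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Set


@dataclass(frozen=True)
class Candidate:
    doc_id: str
    score: float
    allowed_groups: FrozenSet[str]  # document-level ACL carried as index metadata


def retrieve_for_user(candidates: Iterable[Candidate], user_groups: Set[str], top_k: int) -> List[Candidate]:
    # Enforce the ACL before ranking and truncation, so unauthorized text never reaches the prompt.
    permitted = [c for c in candidates if c.allowed_groups & user_groups]
    # Deterministic order: higher score first, doc_id ascending as a stable tie-breaker.
    permitted.sort(key=lambda c: (-c.score, c.doc_id))
    return permitted[:top_k]


CITATION = re.compile(r"\[([A-Za-z0-9_:-]+)\]")  # assumed citation format: [doc-id]


def citation_coverage(answer: str, evidence_ids: Set[str]) -> float:
    """Share of sentences citing at least one ID that was actually retrieved."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", answer.strip()) if s]
    if not sentences:
        return 0.0
    covered = sum(1 for s in sentences if set(CITATION.findall(s)) & evidence_ids)
    return covered / len(sentences)
```

Citations to IDs outside the retrieved set do not count toward coverage, so a test asserting coverage at or above a threshold also catches fabricated citations.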

Avoid

  • Avoid shipping RAG without a test set and regression gate.

  • Avoid "stuff everything" context packing; it increases cost and can reduce accuracy.

  • Avoid mixing corpora without metadata and tenant isolation.
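
A bounded alternative to stuffing everything into the prompt is to pack ranked chunks greedily until a token budget is spent. This is a sketch. It assumes chunks arrive in rank order and uses a whitespace word count as a placeholder token counter, which you would swap for the generator model's tokenizer. Exact-duplicate removal after whitespace normalization is a deliberately simple choice, and the function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def word_count(text: str) -> int:
    """Placeholder token counter; replace with the generator model's tokenizer."""
    return len(text.split())


def pack_context(
    chunks: Sequence[Tuple[str, str]],
    max_tokens: int,
    count_tokens: Callable[[str], int] = word_count,
) -> List[Tuple[str, str]]:
    """Greedily keep (chunk_id, text) pairs, best-ranked first, within a token budget."""
    packed: List[Tuple[str, str]] = []
    seen = set()
    used = 0
    for chunk_id, text in chunks:
        key = " ".join(text.split())  # exact duplicate after whitespace normalization
        if key in seen:
            continue
        cost = count_tokens(text)
        if used + cost > max_tokens:
            continue  # too big for the remaining budget; a shorter, lower-ranked chunk may still fit
        packed.append((chunk_id, text))
        seen.add(key)
        used += cost
    return packed


def render_context(packed: Sequence[Tuple[str, str]]) -> str:
    """Prefix each chunk with its stable ID so the answer can cite it."""
    return "\n\n".join(f"[{chunk_id}] {text}" for chunk_id, text in packed)
```

Skipping an oversized chunk, rather than stopping at it, lets a shorter, lower-ranked chunk use the remaining budget. Stop instead if strict rank order matters more than budget utilization. An empty result is a natural trigger for the answerability gate.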

When to Use This Skill

Use this skill when the user asks:

  • "Help me design a RAG pipeline."

  • "How should I chunk this document?"

  • "Optimize retrieval for my use case."

  • "My RAG system is hallucinating — fix it."

  • "Choose the right vector database / index type."

  • "Create a RAG evaluation framework."

  • "Debug why retrieval gives irrelevant results."

Tool/Model Recommendation Protocol

When users ask for vendor/model/framework recommendations, validate claims against current primary sources.

Triggers

  • "What's the best vector database for [use case]?"

  • "What should I use for [chunking/embedding/reranking]?"

  • "What's the latest in RAG development?"

  • "Current best practices for [retrieval/grounding/evaluation]?"

  • "Is [Pinecone/Qdrant/Chroma] still relevant in 2026?"

  • "[Vector DB A] vs [Vector DB B]?"

  • "Best embedding model for [use case]?"

  • "What RAG framework should I use?"

Required Checks

  • Read data/sources.json and start from sources with "add_as_web_search": true.

  • Verify 1-2 primary docs per recommendation (release notes, benchmarks, docs).

  • If browsing isn't available, state assumptions and give a verification checklist.

What to Report

After checking, provide:

  • Current landscape: What vector DBs/embeddings are popular NOW (not 6 months ago)

  • Emerging trends: Techniques gaining traction (late interaction, agentic RAG, graph RAG)

  • Deprecated/declining: Approaches or tools losing relevance

  • Recommendation: Based on fresh data, not just static knowledge

Example Topics (verify with current sources)

  • Vector databases (Pinecone, Qdrant, Weaviate, Milvus, pgvector, LanceDB)

  • Embedding models (OpenAI, Cohere, Voyage AI, Jina, Sentence Transformers)

  • Reranking (Cohere Rerank, Jina Reranker, FlashRank, RankGPT)

  • RAG frameworks (LlamaIndex, LangChain, Haystack, txtai)

  • Advanced RAG (contextual retrieval, agentic RAG, graph RAG, CRAG)

  • Evaluation (RAGAS, TruLens, DeepEval, BEIR)

Related Skills

For adjacent topics, reference these skills:

  • ai-llm - Prompting, fine-tuning, instruction datasets

  • ai-agents - Agentic RAG workflows and tool routing

  • ai-llm-inference - Serving performance, quantization, batching

  • ai-mlops - Deployment, monitoring, security, privacy, and governance

  • ai-prompt-engineering - Prompt patterns for RAG generation phase

Templates

System Design (Start Here)

  • RAG System Design

Chunking & Ingestion

  • Basic Chunking

  • Code Chunking

  • Long Document Chunking

Embedding & Indexing

  • Index Configuration

  • Metadata Schema

Retrieval & Reranking

  • Retrieval Pipeline

  • Hybrid Search

  • Reranking

  • Ranking Pipeline

  • Reranker

Context Packaging & Grounding

  • Context Packing

  • Grounding

Evaluation

  • RAG Evaluation

  • RAG Test Set

  • Search Evaluation

  • Search Test Set

Search Configuration

  • BM25 Configuration

  • HNSW Configuration

  • IVF Configuration

  • Hybrid Configuration

Query Rewriting

  • Query Rewrite

Navigation

Resources

  • references/advanced-rag-patterns.md

  • references/agentic-rag-patterns.md

  • references/bm25-tuning.md

  • references/chunking-patterns.md

  • references/chunking-strategies.md

  • references/rag-evaluation-guide.md

  • references/rag-troubleshooting.md

  • references/contextual-retrieval-guide.md

  • references/distributed-search-slos.md

  • references/grounding-checklists.md

  • references/hybrid-fusion-patterns.md

  • references/index-selection-guide.md

  • references/multilingual-domain-patterns.md

  • references/pipeline-architecture.md

  • references/query-rewriting-patterns.md

  • references/ranking-pipeline-guide.md

  • references/retrieval-patterns.md

  • references/search-debugging.md

  • references/search-evaluation-guide.md

  • references/user-feedback-learning.md

  • references/vector-search-patterns.md

  • references/graph-rag-patterns.md

  • references/embedding-model-guide.md

  • references/rag-caching-patterns.md

Templates

  • assets/context/template-context-packing.md

  • assets/context/template-grounding.md

  • assets/design/rag-system-design.md

  • assets/chunking/template-basic-chunking.md

  • assets/chunking/template-code-chunking.md

  • assets/chunking/template-long-doc-chunking.md

  • assets/retrieval/template-retrieval-pipeline.md

  • assets/retrieval/template-hybrid-search.md

  • assets/retrieval/template-reranking.md

  • assets/eval/template-rag-eval.md

  • assets/eval/template-rag-testset.jsonl

  • assets/eval/template-search-eval.md

  • assets/eval/template-search-testset.jsonl

  • assets/indexing/template-index-config.md

  • assets/indexing/template-metadata-schema.md

  • assets/query/template-query-rewrite.md

  • assets/ranking/template-ranking-pipeline.md

  • assets/ranking/template-reranker.md

  • assets/search/template-bm25-config.md

  • assets/search/template-hnsw-config.md

  • assets/search/template-ivf-config.md

  • assets/search/template-hybrid-config.md

Data

  • data/sources.json — Curated external references

Use this skill whenever the user needs retrieval-augmented system design or debugging, not prompt work or deployment.

Fact-Checking

  • Use web search/web fetch to verify current external facts, versions, pricing, deadlines, regulations, or platform behavior before final answers.

  • Prefer primary sources; report source links and dates for volatile information.

  • If web access is unavailable, state the limitation and mark guidance as unverified.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
