rag-retrieval

Comprehensive patterns for building production RAG systems. Each category has individual rule files in rules/ loaded on-demand.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.


Install skill "rag-retrieval" with this command: npx skills add yonatangross/orchestkit/yonatangross-orchestkit-rag-retrieval

RAG Retrieval


Quick Reference

| Category | Rules | Impact | When to Use |
| --- | --- | --- | --- |
| Core RAG | 4 | CRITICAL | Basic RAG, citations, hybrid search, context management |
| Embeddings | 3 | HIGH | Model selection, chunking, batch/cache optimization |
| Contextual Retrieval | 3 | HIGH | Context-prepending, hybrid BM25+vector, pipeline |
| HyDE | 3 | HIGH | Vocabulary mismatch, hypothetical document generation |
| Agentic RAG | 4 | HIGH | Self-RAG, CRAG, knowledge graphs, adaptive routing |
| Multimodal RAG | 3 | MEDIUM | Image+text retrieval, PDF chunking, cross-modal search |
| Query Decomposition | 3 | MEDIUM | Multi-concept queries, parallel retrieval, RRF fusion |
| Reranking | 3 | MEDIUM | Cross-encoder, LLM scoring, combined signals |
| PGVector | 4 | HIGH | PostgreSQL hybrid search, HNSW indexes, schema design |

Total: 30 rules across 9 categories

Core RAG

Fundamental patterns for retrieval, generation, and pipeline composition.

| Rule | File | Key Pattern |
| --- | --- | --- |
| Basic RAG | rules/core-basic-rag.md | Retrieve + context + generate with citations |
| Hybrid Search | rules/core-hybrid-search.md | RRF fusion (k=60) for semantic + keyword |
| Context Management | rules/core-context-management.md | Token budgeting + sufficiency check |
| Pipeline Composition | rules/core-pipeline-composition.md | Composable Decompose → HyDE → Retrieve → Rerank |
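The RRF fusion (k=60) listed for Hybrid Search is small enough to sketch inline. A minimal version, assuming the inputs are ranked lists of document IDs (the IDs here are hypothetical); k=60 is the conventional damping constant:

```python
def rrf_fuse(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["d3", "d1", "d2"]  # vector-search order
keyword = ["d1", "d4", "d3"]   # BM25 order
fused = rrf_fuse([semantic, keyword])
```

Documents appearing in both lists (d1, d3) accumulate score from each, which is why RRF needs no score normalization across the two retrievers.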

Embeddings

Embedding models, chunking strategies, and production optimization.

| Rule | File | Key Pattern |
| --- | --- | --- |
| Models & API | rules/embeddings-models.md | Model selection, batch API, similarity |
| Chunking | rules/embeddings-chunking.md | Semantic boundary splitting, 512-token sweet spot |
| Advanced | rules/embeddings-advanced.md | Redis cache, Matryoshka dims, batch processing |
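The fixed-size-with-overlap baseline behind the chunking rule can be sketched as follows. This is a simplification: words stand in for tokens (a real implementation would use the embedding model's tokenizer), and the rule file's semantic-boundary splitting is not shown.

```python
def chunk_text(text: str, max_tokens: int = 512, overlap: int = 64) -> list[str]:
    """Fixed-size chunking with overlap; whitespace words approximate tokens."""
    words = text.split()
    step = max_tokens - overlap  # advance less than a full chunk to overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_tokens]))
        if start + max_tokens >= len(words):
            break  # the final chunk already covers the tail
    return chunks

doc = " ".join(f"w{i}" for i in range(1000))
chunks = chunk_text(doc)
```

With 1000 words, a 512-word window, and a 448-word step, this yields three chunks whose boundaries overlap by 64 words, so sentences near a cut appear in two chunks.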

Contextual Retrieval

Anthropic's context-prepending technique — 67% fewer retrieval failures.

| Rule | File | Key Pattern |
| --- | --- | --- |
| Context Prepending | rules/contextual-prepend.md | LLM-generated context + prompt caching |
| Hybrid Search | rules/contextual-hybrid.md | 40% BM25 / 60% vector weight split |
| Complete Pipeline | rules/contextual-pipeline.md | End-to-end indexing + hybrid retrieval |
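The core of context prepending is a one-line transform applied at indexing time. In a sketch, the LLM call is injected as a function so the example stays runnable; in production it would send the full document in a cached prompt prefix and ask the model to situate the chunk. The sample strings below are illustrative.

```python
def contextualize(chunk: str, document: str, generate_context) -> str:
    """Prepend a short situating context to a chunk before embedding/indexing.
    generate_context(document, chunk) is normally an LLM call; injected here."""
    context = generate_context(document, chunk)
    return f"{context}\n\n{chunk}"

# Stub standing in for the LLM-generated situating sentence.
stub = lambda doc, chunk: "From ACME Corp's Q2 2023 filing, on quarterly revenue."
text = contextualize("Revenue grew 3% over the previous quarter.", "<full filing>", stub)
```

The contextualized text is what gets embedded and what feeds the BM25 index, so both retrieval paths benefit from the added context.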

HyDE

Hypothetical Document Embeddings for bridging vocabulary gaps.

| Rule | File | Key Pattern |
| --- | --- | --- |
| Generation | rules/hyde-generation.md | Embed hypothetical doc, not query |
| Per-Concept | rules/hyde-per-concept.md | Parallel HyDE for multi-topic queries |
| Fallback | rules/hyde-fallback.md | 2-3s timeout → direct embedding fallback |
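The timeout-with-fallback pattern can be sketched with `asyncio.wait_for`. The generator and embedder here are stubs (real code would call an LLM and an embedding model); only the control flow is the point.

```python
import asyncio

async def generate_hypothetical(query: str) -> str:
    # Stand-in for an LLM call that drafts a hypothetical answer document.
    await asyncio.sleep(0.01)
    return f"A hypothetical document answering: {query}"

async def hyde_embed(query: str, embed, timeout: float = 2.5):
    """Embed a hypothetical document; fall back to the raw query on timeout."""
    try:
        hypothetical = await asyncio.wait_for(generate_hypothetical(query), timeout)
        return embed(hypothetical)
    except asyncio.TimeoutError:
        return embed(query)  # direct-embedding fallback keeps latency bounded

fake_embed = lambda text: [float(len(text))]  # toy embedding for the sketch
vec = asyncio.run(hyde_embed("What causes vocabulary mismatch?", fake_embed))
```

Either branch returns a usable vector, so downstream retrieval never blocks on a slow generation call.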

Agentic RAG

Self-correcting retrieval with LLM-driven decision making.

| Rule | File | Key Pattern |
| --- | --- | --- |
| Self-RAG | rules/agentic-self-rag.md | Binary document grading for relevance |
| Corrective RAG | rules/agentic-corrective-rag.md | CRAG workflow with web fallback |
| Knowledge Graph | rules/agentic-knowledge-graph.md | KG + vector hybrid for entity-rich domains |
| Adaptive Retrieval | rules/agentic-adaptive-retrieval.md | Query routing to optimal strategy |
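The CRAG-style shape of grade-then-fallback, with a retry cap so the workflow cannot loop forever, can be sketched with injected components (the retriever, grader, and web-search stubs below are hypothetical):

```python
def corrective_rag(query, retrieve, grade, web_search, max_retries: int = 2):
    """Grade retrieved docs; retry a bounded number of times, then fall back
    to web search instead of hanging (the CRAG corrective step)."""
    for _attempt in range(max_retries):
        docs = retrieve(query)
        relevant = [d for d in docs if grade(query, d)]  # binary relevance grading
        if relevant:
            return relevant
    return web_search(query)  # corrective fallback when grading keeps failing

docs = corrective_rag(
    "pgvector indexes",
    retrieve=lambda q: ["HNSW tuning guide", "cooking blog"],
    grade=lambda q, d: "HNSW" in d,
    web_search=lambda q: ["web result"],
)
```

The `max_retries` bound addresses two of the mistakes listed below: missing fallback paths and infinite rewrite loops.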

Multimodal RAG

Image + text retrieval with cross-modal search.

| Rule | File | Key Pattern |
| --- | --- | --- |
| Embeddings | rules/multimodal-embeddings.md | CLIP, SigLIP 2, Voyage multimodal-3 |
| Chunking | rules/multimodal-chunking.md | PDF extraction preserving images |
| Pipeline | rules/multimodal-pipeline.md | Dedup + hybrid retrieval + generation |

Query Decomposition

Breaking complex queries into concepts for parallel retrieval.

| Rule | File | Key Pattern |
| --- | --- | --- |
| Detection | rules/query-detection.md | Heuristic indicators (<1ms fast path) |
| Decompose + RRF | rules/query-decompose.md | LLM concept extraction + parallel retrieval |
| HyDE Combo | rules/query-hyde-combo.md | Decompose + HyDE for maximum coverage |
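A heuristic fast path for detection might look like the sketch below; the specific markers and the length threshold are illustrative, not taken from the rule file. The idea is to pay for an LLM decomposition call only when a cheap check fires.

```python
import re

# Surface cues that a query spans multiple concepts (illustrative list).
MULTI_CONCEPT_MARKERS = re.compile(
    r"\b(and|versus|vs\.?|compared to|as well as)\b|[;,]", re.IGNORECASE
)

def looks_multi_concept(query: str) -> bool:
    """Sub-millisecond fast path; escalate to LLM decomposition only on a hit."""
    return bool(MULTI_CONCEPT_MARKERS.search(query)) and len(query.split()) > 6

simple = looks_multi_concept("What is HNSW?")
compound = looks_multi_concept("Compare HNSW and IVFFlat for recall and latency")
```

Queries that fail the check go straight to single-shot retrieval, so the common case never waits on an extra LLM round trip.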

Reranking

Post-retrieval re-scoring for higher precision.

| Rule | File | Key Pattern |
| --- | --- | --- |
| Cross-Encoder | rules/reranking-cross-encoder.md | ms-marco-MiniLM (~50ms, free) |
| LLM Reranking | rules/reranking-llm.md | Batch scoring + Cohere API |
| Combined | rules/reranking-combined.md | Multi-signal weighted scoring |
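Multi-signal weighted scoring reduces to a weighted sum over normalized signals. The signal names and weights below are illustrative; the assumption is that each signal is already scaled to a comparable range (e.g. 0-1) before blending.

```python
def combined_score(signals: dict, weights: dict[str, float]) -> float:
    """Weighted blend of normalized relevance signals; missing signals score 0."""
    return sum(w * signals.get(name, 0.0) for name, w in weights.items())

weights = {"cross_encoder": 0.6, "vector_sim": 0.3, "recency": 0.1}
candidates = [
    {"id": "a", "cross_encoder": 0.9, "vector_sim": 0.7, "recency": 0.2},
    {"id": "b", "cross_encoder": 0.5, "vector_sim": 0.9, "recency": 1.0},
]
ranked = sorted(candidates, key=lambda d: combined_score(d, weights), reverse=True)
```

Here the cross-encoder dominates by design, so the fresher but less relevant document "b" still ranks second.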

PGVector

Production hybrid search with PostgreSQL.

| Rule | File | Key Pattern |
| --- | --- | --- |
| Schema | rules/pgvector-schema.md | HNSW index + pre-computed tsvector |
| Hybrid Search | rules/pgvector-hybrid-search.md | SQLAlchemy RRF with FULL OUTER JOIN |
| Indexing | rules/pgvector-indexing.md | HNSW (17x faster) vs IVFFlat |
| Metadata | rules/pgvector-metadata.md | Filtering, boosting, Redis 8 comparison |
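The "HNSW index + pre-computed tsvector" schema pattern can be sketched as Postgres DDL. Table and column names are illustrative, not taken from the rule file; the dimension 1536 assumes an OpenAI text-embedding-3-small setup.

```sql
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE chunks (
    id          bigserial PRIMARY KEY,
    content     text NOT NULL,
    embedding   vector(1536),
    -- Pre-computed tsvector: keyword search needs no per-query to_tsvector()
    content_tsv tsvector GENERATED ALWAYS AS (to_tsvector('english', content)) STORED
);

-- HNSW for vector similarity, GIN for full-text; both sides of hybrid search
CREATE INDEX ON chunks USING hnsw (embedding vector_cosine_ops);
CREATE INDEX ON chunks USING gin (content_tsv);
```

The generated column keeps the lexical index in sync on every insert and update, which is what makes SQL-side RRF fusion over both indexes practical.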

Quick Start Example

```python
from openai import OpenAI

client = OpenAI()  # vector_db and llm below are assumed application objects

async def rag_query(question: str, top_k: int = 5) -> dict:
    """Basic RAG with citations."""
    docs = await vector_db.search(question, limit=top_k)
    context = "\n\n".join(f"[{i+1}] {doc.text}" for i, doc in enumerate(docs))

    response = await llm.chat([
        {"role": "system", "content": "Answer with inline citations [1], [2]. Use ONLY provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ])

    return {"answer": response.content, "sources": [d.metadata["source"] for d in docs]}
```

Key Decisions

| Decision | Recommendation |
| --- | --- |
| Embedding model | text-embedding-3-small (general), voyage-3 (production) |
| Chunk size | 256-1024 tokens (512 typical) |
| Hybrid weight | 40% BM25 / 60% vector |
| Top-k | 3-10 documents |
| Temperature | 0.1-0.3 (factual) |
| Context budget | 4K-8K tokens |
| Reranking | Retrieve 50, rerank to 10 |
| Vector index | HNSW (production), IVFFlat (high-volume) |
| HyDE timeout | 2-3 seconds with fallback |
| Query decomposition | Heuristic first, LLM only if multi-concept |

Common Mistakes

  • No citation tracking (unverifiable answers)

  • Context too large (dilutes relevance)

  • Single retrieval method (misses keyword matches)

  • Not chunking long documents (context gets lost)

  • Embedding queries differently than documents

  • No fallback path in agentic RAG (workflow hangs)

  • Infinite rewrite loops (no retry limit)

  • Using wrong similarity metric (cosine vs euclidean)

  • Not caching embeddings (recomputing unchanged content)

  • Missing image captions in multimodal RAG (limits text search)

Evaluations

See test-cases.json for 30 test cases across all categories.

Related Skills

  • ork:langgraph: LangGraph workflow patterns (for agentic RAG workflows)

  • caching: Cache RAG responses for repeated queries

  • ork:golden-dataset: Evaluate retrieval quality

  • ork:llm-integration: Local embeddings with nomic-embed-text

  • vision-language-models: Image analysis for multimodal RAG

  • ork:database-patterns: Schema design for vector search

Capability Details

retrieval-patterns

Keywords: retrieval, context, chunks, relevance, rag

Solves:

  • Retrieve relevant context for LLM

  • Implement RAG pipeline with citations

  • Optimize retrieval quality

hybrid-search

Keywords: hybrid, bm25, vector, fusion, rrf

Solves:

  • Combine keyword and semantic search

  • Implement reciprocal rank fusion

  • Balance precision and recall

embeddings

Keywords: embedding, text to vector, vectorize, chunk, similarity

Solves:

  • Convert text to vector embeddings

  • Choose embedding models and dimensions

  • Implement chunking strategies

contextual-retrieval

Keywords: contextual, anthropic, context-prepend, bm25

Solves:

  • Prepend context to chunks for better retrieval

  • Reduce retrieval failures by 67%

  • Implement hybrid BM25+vector search

hyde

Keywords: hyde, hypothetical, vocabulary mismatch

Solves:

  • Bridge vocabulary gaps in semantic search

  • Generate hypothetical documents for embedding

  • Handle abstract or conceptual queries

agentic-rag

Keywords: self-rag, crag, corrective, adaptive, grading

Solves:

  • Build self-correcting RAG workflows

  • Grade document relevance

  • Implement web search fallback

multimodal-rag

Keywords: multimodal, image, clip, vision, pdf

Solves:

  • Build RAG with images and text

  • Cross-modal search (text → image)

  • Process PDFs with mixed content

query-decomposition

Keywords: decompose, multi-concept, complex query

Solves:

  • Break complex queries into concepts

  • Parallel retrieval per concept

  • Improve coverage for compound questions

reranking

Keywords: rerank, cross-encoder, precision, scoring

Solves:

  • Improve search precision post-retrieval

  • Score relevance with cross-encoder or LLM

  • Combine multiple scoring signals

pgvector-search

Keywords: pgvector, postgresql, hnsw, tsvector, hybrid

Solves:

  • Production hybrid search with PostgreSQL

  • HNSW vs IVFFlat index selection

  • SQL-based RRF fusion

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
