# RAG Patterns

## Standard RAG Pipeline

Indexing: Documents → Chunk → Embed → Store (vector DB)
Querying: Query → Embed → Retrieve → Augment prompt → Generate answer
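A minimal sketch of the whole loop, assuming LangChain with OpenAI embeddings and a Chroma vector store (all provider choices here are illustrative, and `chunks` comes from the splitter in the next section):

```python
from langchain_chroma import Chroma
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# Indexing: embed chunks and store them in the vector DB
vectorstore = Chroma.from_documents(chunks, OpenAIEmbeddings())

# Querying: retrieve, augment the prompt, generate
question = "How do I rotate API keys?"  # example query
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
docs = retriever.invoke(question)
context = "\n\n".join(d.page_content for d in docs)
llm = ChatOpenAI(model="gpt-4o-mini")
answer = llm.invoke(f"Context:\n{context}\n\nQuestion: {question}").content
```

The sections below refine each piece of this loop.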
## Chunking Strategies

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Recommended defaults
splitter = RecursiveCharacterTextSplitter(
    chunk_size=800,        # chars (not tokens)
    chunk_overlap=200,
    separators=["\n\n", "\n", ". ", " ", ""],
)
chunks = splitter.split_documents(docs)
```
| Strategy | Best For | Chunk Size |
|---|---|---|
| Fixed-size with overlap | General text | 500-1000 chars |
| Recursive character | Structured docs | 500-1000 chars |
| Semantic (by meaning) | Long-form content | Variable |
| Document-aware (markdown headers; see sketch below) | Technical docs | Section-based |
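For the document-aware row, a minimal sketch using LangChain's `MarkdownHeaderTextSplitter` (`markdown_text` is a placeholder for your raw markdown string):

```python
from langchain_text_splitters import MarkdownHeaderTextSplitter

# Split on heading levels; each chunk records its headers as metadata
md_splitter = MarkdownHeaderTextSplitter(
    headers_to_split_on=[("#", "title"), ("##", "section")],
)
md_chunks = md_splitter.split_text(markdown_text)
```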
## Metadata Enrichment

```python
# Attach provenance to every chunk so answers can cite sources
for i, chunk in enumerate(chunks):   # enumerate supplies the chunk_index
    chunk.metadata.update({
        "source": doc.metadata["source"],
        "section": extract_section_title(chunk),
        "doc_id": doc.metadata["id"],
        "chunk_index": i,
    })
```
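The payoff is filtered retrieval. A sketch, assuming a Chroma-style `filter` argument (filter syntax varies by vector store, and the `doc_id` value is hypothetical):

```python
# Restrict retrieval to chunks from a specific document
retriever = vectorstore.as_retriever(
    search_kwargs={"k": 5, "filter": {"doc_id": "handbook-2024"}},
)
```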
## Retrieval Strategies

### Hybrid Search (keyword + semantic)

```python
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever

bm25 = BM25Retriever.from_documents(docs, k=5)
vector_retriever = vectorstore.as_retriever(search_kwargs={"k": 5})

hybrid = EnsembleRetriever(
    retrievers=[bm25, vector_retriever],
    weights=[0.3, 0.7],   # weight semantic matches above keyword matches
)
```
### Re-ranking

```python
from cohere import Client

cohere = Client(api_key=COHERE_API_KEY)

def rerank(query: str, documents: list[str], top_n: int = 5):
    response = cohere.rerank(
        model="rerank-english-v3.0",
        query=query,
        documents=documents,
        top_n=top_n,
    )
    return [documents[r.index] for r in response.results]
```
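Wiring it in: retrieve a wide candidate set, then rerank down to what actually goes in the prompt. A sketch reusing `hybrid` from above (the query is illustrative):

```python
query = "How do I rotate API keys?"
candidates = hybrid.invoke(query)              # broad first-pass retrieval
texts = [d.page_content for d in candidates]
top_texts = rerank(query, texts, top_n=5)      # keep only the best 5 for the prompt
```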
### Multi-query Retrieval

Generate multiple variations of the question to improve recall:

```python
prompt = """Generate 3 different versions of this question to retrieve relevant documents:
{question}"""

queries = llm.invoke(prompt.format(question=question)).content.split("\n")
all_docs = {}  # dedupe by content; Document objects aren't hashable
for q in queries:
    for doc in retriever.invoke(q):
        all_docs[doc.page_content] = doc
```
## Prompt Construction

```python
SYSTEM_PROMPT = """Answer based only on the provided context.
If the context doesn't contain the answer, say "I don't have enough information."
Cite sources using [Source: filename] format.

Context:
{context}"""

def format_context(docs, max_tokens=3000):
    context_parts = []
    budget = max_tokens * 4  # rough heuristic: ~4 characters per token
    used = 0
    for doc in docs:
        source = doc.metadata.get("source", "unknown")
        part = f"[Source: {source}]\n{doc.page_content}"
        if used + len(part) > budget:
            break  # stop before overflowing the context budget
        context_parts.append(part)
        used += len(part)
    return "\n\n---\n\n".join(context_parts)
```
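Assembling the final call, as a sketch assuming a LangChain chat model bound to `llm` (`docs` are the retrieved documents and `question` is the user's query):

```python
# Fill the system prompt with formatted context, then generate
messages = [
    {"role": "system", "content": SYSTEM_PROMPT.format(context=format_context(docs))},
    {"role": "user", "content": question},
]
answer = llm.invoke(messages).content
```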
## Evaluation

| Metric | Measures | Tool |
|---|---|---|
| Context Relevance | Are retrieved docs relevant? | RAGAS, manual |
| Faithfulness | Does answer match context? | RAGAS |
| Answer Relevance | Does answer address question? | RAGAS |
| Retrieval Recall | Are correct docs retrieved? | Custom eval set |
```python
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy, context_precision

result = evaluate(dataset, metrics=[faithfulness, answer_relevancy, context_precision])
```
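`evaluate` expects a dataset following RAGAS's column schema. A sketch building one with the `datasets` library (all values are placeholders, and exact column names depend on your RAGAS version):

```python
from datasets import Dataset

dataset = Dataset.from_dict({
    "question": ["How do I rotate API keys?"],
    "answer": [answer],                                 # generated answer
    "contexts": [[d.page_content for d in docs]],       # retrieved chunks
    "ground_truth": ["<reference answer goes here>"],   # human-written reference
})
```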
## Anti-Patterns

| Anti-Pattern | Fix |
|---|---|
| Chunks too large (>1500 chars) | Use 500-1000 char chunks with 200 overlap |
| No metadata on chunks | Store source, section, page number |
| No retrieval evaluation | Build eval set, measure recall and precision |
| Stuffing all chunks in prompt | Limit to top-K (3-5), use re-ranking |
| Ignoring hybrid search | Combine BM25 + vector for better recall |
| No citation/source tracking | Pass metadata through pipeline |
## Production Checklist

- [ ] Chunking strategy tuned with eval set
- [ ] Hybrid search (BM25 + vector) enabled
- [ ] Re-ranking on retrieval results
- [ ] Source attribution in answers
- [ ] Guardrails for out-of-scope questions (see sketch below)
- [ ] Monitoring: retrieval latency, answer quality scores
- [ ] Incremental indexing for new documents
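For the guardrail item, one cheap sketch: refuse when nothing retrieved is close enough. Score semantics are store-specific (Chroma returns distances, where lower means closer), and the threshold here is purely illustrative:

```python
# Refuse out-of-scope questions instead of hallucinating an answer
results = vectorstore.similarity_search_with_score(question, k=5)
if not results or results[0][1] > 0.6:  # 0.6 is an illustrative distance cutoff
    answer = "I don't have enough information."
```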