agent:rag

Guides the user through designing a Retrieval-Augmented Generation (RAG) pipeline. Based on "Principles of Building AI Agents" (Bhagwat & Gienow, 2025), Part V: RAG (Chapters 17-20).


RAG Pipeline Design


When to use

Use this skill when the user needs to:

  • Design a RAG pipeline for an agent

  • Choose a vector database

  • Configure chunking, embedding, and retrieval

  • Evaluate whether RAG is even needed (vs. alternatives)

  • Tune an existing RAG pipeline for better quality

Instructions

Step 1: Do You Actually Need RAG?

Before building a pipeline, apply the principle: Start simple, check quality, get complex.

Use AskUserQuestion to assess:

RAG Decision Tree

Question 1: How large is your corpus?

  • < 200 pages → Try full context loading first (Gemini 2M, Claude 200K)
  • 200-10,000 pages → Consider agentic RAG (tools that query data) OR traditional RAG
  • > 10,000 pages → Traditional RAG pipeline is likely needed

Question 2: What is the query pattern?

  • Factual lookup ("What is X?") → RAG works well
  • Analytical ("Compare X and Y across documents") → Agentic RAG may be better
  • Conversational ("Tell me about...") → Either works

Question 3: How structured is the data?

  • Highly structured (tables, databases) → Use tools/APIs, not RAG
  • Semi-structured (markdown, HTML) → RAG with format-specific chunking
  • Unstructured (PDFs, free text) → Traditional RAG
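The decision tree above can be condensed into a small helper. The thresholds come from this guide; the function name and return labels are illustrative, not part of any API:

```python
def recommend_approach(pages: int, structured: bool = False) -> str:
    """Map corpus size and data structure to a starting approach,
    following the thresholds in the decision tree above."""
    if structured:
        return "tools/APIs"       # highly structured data: query it directly
    if pages < 200:
        return "full context"     # fits in a large context window
    if pages <= 10_000:
        return "agentic RAG"      # tools that query the data, or traditional RAG
    return "traditional RAG"      # corpus too large for simpler options
```

For example, `recommend_approach(150)` returns `"full context"`, while a 50,000-page corpus lands on `"traditional RAG"`.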

Recommended progression:

  1. Load the entire corpus into a large context window.

  2. Write functions that query the dataset and give them to the agent as tools.

  3. Only if 1 and 2 fail on quality, build a RAG pipeline.

If the user decides RAG is needed, proceed. Otherwise, recommend the simpler alternative.

Step 2: Chunking Strategy

Design how documents are split into retrievable pieces:

Chunking Strategy

Method

| Strategy | Best For | Description |
|---|---|---|
| Recursive | General text | Splits by paragraph, then sentence, then character |
| Token-aware | LLM optimization | Splits by token count, respects model limits |
| Format-specific | Markdown/HTML/JSON | Uses document structure (headers, tags, keys) |
| Semantic | High quality needs | Uses an LLM to identify natural topic boundaries |

Selected: [Strategy]

Parameters

| Parameter | Value | Rationale |
|---|---|---|
| Chunk size | [256-1024 tokens] | Balance: smaller = more precise, larger = more context |
| Overlap | [50-200 tokens] | Prevents losing context at chunk boundaries |
| Metadata | [title, source, date, section, page] | Enables filtered retrieval |

Document-Specific Rules

| Document Type | Chunking Rule |
|---|---|
| [Markdown docs] | Split on ## headers, keep header as metadata |
| [PDFs] | Page-based with overlap, extract title/section |
| [Code files] | Function/class-level chunks |
| [Chat logs] | Message groups of [N] turns |
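As a rough sketch of the size and overlap parameters above, here is a greedy word-window chunker. Word counts stand in for tokens; a production chunker would split recursively (paragraph, then sentence, then token) and count tokens with the embedding model's tokenizer:

```python
def chunk_text(text: str, chunk_size: int = 256, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks of roughly chunk_size words.
    Consecutive chunks share `overlap` words so context at chunk
    boundaries is not lost."""
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap   # advance per chunk
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break                 # final chunk already covers the tail
    return chunks
```

With the defaults, a 600-word document yields three chunks, and the last 50 words of each chunk reappear at the start of the next.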

Step 3: Embedding Configuration

Choose how chunks become vectors:

Embedding

Model Selection

| Model | Dimensions | Quality | Cost | Speed |
|---|---|---|---|---|
| OpenAI text-embedding-3-large | 3072 | High | $0.13/M tokens | Fast |
| OpenAI text-embedding-3-small | 1536 | Good | $0.02/M tokens | Fast |
| Voyage voyage-3 | 1024 | High | $0.06/M tokens | Fast |
| Cohere embed-v3 | 1024 | High | $0.10/M tokens | Fast |
| Local (e5-large, BGE) | 1024 | Good | Free (compute) | Varies |

Selected: [Model]

Indexing

| Parameter | Value |
|---|---|
| Dimensions | [From model] |
| Similarity metric | Cosine (most common) |
| Index type | HNSW (default, good balance of speed/accuracy) |
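The cosine metric in the table can be computed with nothing but the standard library, as a sketch of what the vector store does internally. Note that if the embedding model returns L2-normalized vectors (OpenAI's embedding models do), cosine similarity reduces to a plain dot product:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors: the dot
    product divided by the product of their magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

Identical directions score 1.0, orthogonal vectors score 0.0; the similarityThreshold in Step 5 cuts off results below a chosen floor on this scale.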

Step 4: Vector Database Selection

Apply the principle: Prevent infra sprawl — vector DB choice is mostly commoditized.

Use AskUserQuestion:

Vector Database

Decision Matrix

| Option | When to Choose | Pros | Cons |
|---|---|---|---|
| pgvector (Postgres extension) | Already using Postgres | No new infra, familiar SQL, metadata filtering | May need tuning at scale |
| Pinecone (managed) | New project, want simplicity | Fully managed, fast, scalable | Additional service + cost |
| Chroma (open-source) | Local dev, small scale | Free, easy setup | Self-host in production |
| Cloud-native (Cloudflare, DataStax) | Already on that cloud | Integrated billing, low latency | Vendor lock-in |

Selected: [Database] Rationale: [Why]

Step 5: Retrieval Configuration

Design how the agent queries the vector store:

Retrieval

Query Strategy

| Parameter | Value | Rationale |
|---|---|---|
| topK | [3-10] | Number of chunks to retrieve |
| similarityThreshold | [0.7-0.9] | Min relevance to include |
| reranking | [Yes/No] | Post-retrieval quality boost |

Hybrid Queries

Combine vector similarity with metadata filters:

| Filter | Type | Example |
|---|---|---|
| Date range | Metadata | Only docs from last 30 days |
| Category | Metadata | Only "technical" documents |
| Source | Metadata | Only from "docs.example.com" |
| User access | Metadata | Only docs user has permission to see |
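A minimal in-memory sketch of a hybrid query: filter on metadata first, then rank the survivors by cosine similarity, keeping at most topK above the threshold. The chunk schema and function names are illustrative; real stores such as pgvector or Pinecone apply these filters inside the index rather than in application code:

```python
import math

def _cosine(a: list[float], b: list[float]) -> float:
    # Inline similarity helper so the sketch is self-contained.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def hybrid_search(query_vec, chunks, filters, top_k=5, threshold=0.7):
    """Metadata-filtered similarity search over in-memory chunks.
    Each chunk is a dict: {"text": ..., "vector": ..., "meta": {...}}."""
    candidates = [
        c for c in chunks
        if all(c["meta"].get(key) == value for key, value in filters.items())
    ]
    scored = [(_cosine(query_vec, c["vector"]), c) for c in candidates]
    scored = [(s, c) for s, c in scored if s >= threshold]
    scored.sort(key=lambda sc: sc[0], reverse=True)
    return [c for _, c in scored[:top_k]]
```

A query with `filters={"category": "technical"}` never sees blog chunks, no matter how similar their vectors are, which is exactly what the access-control row in the table relies on.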

Reranking (Optional)

  • When to use: Quality matters more than latency
  • How: Retrieve topK * 3 candidates, rerank with a cross-encoder, return topK
  • Models: Cohere Rerank, bge-reranker, cross-encoder/ms-marco
  • Cost: More expensive per query, but runs only on candidates (not full corpus)
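The retrieve-then-rerank pattern above can be sketched as follows. Here `search_fn` stands in for the cheap vector search and `score_fn` for an expensive cross-encoder (Cohere Rerank, bge-reranker); both are hypothetical callables, not real client APIs:

```python
def retrieve_and_rerank(query, search_fn, score_fn, top_k=5):
    """Pull a wider candidate pool (top_k * 3) from the vector search,
    rescore every candidate with the cross-encoder, and return the
    best top_k. The expensive scorer only ever sees the pool, never
    the full corpus."""
    candidates = search_fn(query, top_k * 3)
    rescored = sorted(candidates, key=lambda doc: score_fn(query, doc), reverse=True)
    return rescored[:top_k]
```

The 3x multiplier is a common starting point, not a rule; widen the pool if recall suffers, narrow it if latency does.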

Query Transformation (Optional)

  • HyDE: Generate a hypothetical answer, use it as the search query
  • Multi-query: Generate multiple query variations, merge results
  • Step-back: Abstract the query to a higher level, then search
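The multi-query option can be sketched as a merge that keeps each document's best score across all variations. This assumes `search_fn` returns `(doc_id, score)` pairs; in practice the variations themselves come from an LLM call:

```python
def multi_query_search(variations, search_fn, top_k=5):
    """Run every query variation through the same search function and
    merge the hit lists, deduplicating by doc_id and keeping each
    document's highest score."""
    best = {}
    for query in variations:
        for doc_id, score in search_fn(query):
            if score > best.get(doc_id, float("-inf")):
                best[doc_id] = score
    merged = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    return merged[:top_k]
```

Max-score merging is one reasonable choice; reciprocal rank fusion is a common alternative when scores from different queries are not directly comparable.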

Step 6: Pipeline Architecture

Bring it all together:

RAG Pipeline

Ingestion Pipeline

  1. Load documents from [source]
  2. Chunk using [strategy] with [size] tokens, [overlap] overlap
  3. Enrich metadata: source, date, category, section
  4. Embed using [model]
  5. Upsert into [vector DB]
  6. Schedule: [On change / Nightly / Manual]

Query Pipeline

  1. Receive user query
  2. Transform query (optional: HyDE, multi-query)
  3. Embed query using [same model as ingestion]
  4. Search vector DB: topK=[N], filters=[metadata filters]
  5. Rerank results (optional)
  6. Inject top chunks into LLM context as <retrieved_documents>
  7. Generate response with source attribution
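Step 6 of the query pipeline, injecting chunks into context, might look like the following. The `<retrieved_documents>` wrapper comes from this guide; the per-document attributes are an illustrative layout, chosen so the model can cite sources in step 7:

```python
def build_context(chunks: list[dict]) -> str:
    """Wrap retrieved chunks in a <retrieved_documents> block, tagging
    each one with an id and its source so the model can attribute its
    answer. Each chunk is a dict: {"text": ..., "source": ...}."""
    parts = ["<retrieved_documents>"]
    for i, chunk in enumerate(chunks, start=1):
        parts.append(f'<document id="{i}" source="{chunk["source"]}">')
        parts.append(chunk["text"])
        parts.append("</document>")
    parts.append("</retrieved_documents>")
    return "\n".join(parts)
```

Keeping the source on every document is what makes the "source attribution is accurate" item in the quality checklist testable.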

Architecture Diagram

```mermaid
graph LR
  subgraph Ingestion
    Docs[Documents] --> Chunk[Chunker]
    Chunk --> Embed[Embedder]
    Embed --> Store[(Vector DB)]
  end
  subgraph Query
    User[User Query] --> QEmbed[Query Embedder]
    QEmbed --> Search[Similarity Search]
    Store --> Search
    Search --> Rerank[Reranker]
    Rerank --> LLM[LLM + Context]
    LLM --> Response[Response]
  end
```

Step 7: Quality Checklist

RAG Quality Checklist

Retrieval Quality

  • Relevant documents consistently in top-K results
  • Metadata filters working correctly
  • No duplicate chunks in results
  • Chunk size balances precision vs. context

Generation Quality

  • Responses are grounded in retrieved documents
  • Source attribution is accurate
  • Agent says "I don't know" when no relevant chunks found
  • No hallucination beyond retrieved context

Operational

  • Ingestion pipeline runs on schedule
  • New documents are available within [SLA]
  • Vector DB latency < [target]ms
  • Embedding costs within budget

Step 8: Summarize and Offer Next Steps

Present all findings to the user as a structured summary in the conversation (including the pipeline diagram). Do NOT write to .specs/ — this skill works directly.

Use AskUserQuestion to offer:

  • Implement pipeline — scaffold ingestion and query code

  • Skip RAG — if the decision tree said RAG isn't needed, help with the alternative (full context or agentic tools)

  • Comprehensive design — run agent:design to cover all areas with a spec

Arguments

  • <args>: optional description of the knowledge domain, or a path to existing RAG code

Examples:

  • agent:rag documentation search — design RAG for a docs search agent

  • agent:rag src/rag/ — review and tune existing RAG pipeline

  • agent:rag — start fresh

