pgvector & RAG

Complete toolkit for building vector search and RAG (Retrieval-Augmented Generation) applications with PostgreSQL. Covers pgvector schema setup, embedding generation with Ollama, similarity search, full RAG pipelines, and agentic-kit integration.

When to Apply

Use this skill when:

Setting up pgvector: Creating tables, indexes, vector storage schema
Generating embeddings: Using Ollama to embed documents and chunks
Similarity search: Querying vectors with cosine/L2/inner product distance
Building RAG: Combining retrieval with LLM generation
Ollama integration: Local LLM inference, model management, streaming
Agentic-kit RAG: Wiring RAG into agentic-kit chat applications

Architecture

Document → Chunking → Embedding → pgvector Storage
                                        ↓
Query → Embedding → Similarity Search → Context Retrieval → LLM Response

Quick Start

# 1. Start PostgreSQL with pgvector
pgpm docker start
eval "$(pgpm env)"

# 2. Pull Ollama models
ollama pull nomic-embed-text
ollama pull llama3.2

# 3. Create vector storage module
pgpm init my-vectors
cd my-vectors
pgpm add schemas/intelligence
pgpm add schemas/intelligence/tables/documents --requires schemas/intelligence
pgpm add schemas/intelligence/tables/chunks --requires schemas/intelligence/tables/documents

Schema Design

Documents Table

CREATE TABLE intelligence.documents (
    id SERIAL PRIMARY KEY,
    title TEXT,
    content TEXT NOT NULL,
    metadata JSONB DEFAULT '{}'::jsonb,
    embedding VECTOR(768),
    created_at TIMESTAMPTZ DEFAULT NOW()
);

Chunks Table

CREATE TABLE intelligence.chunks (
    id SERIAL PRIMARY KEY,
    document_id INTEGER NOT NULL REFERENCES intelligence.documents(id) ON DELETE CASCADE,
    content TEXT NOT NULL,
    embedding VECTOR(768),
    chunk_index INTEGER NOT NULL,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX idx_chunks_document_id ON intelligence.chunks(document_id);

Similarity Search Function

CREATE FUNCTION intelligence.find_similar_chunks(
    p_embedding VECTOR(768),
    p_limit INTEGER DEFAULT 5,
    p_similarity_threshold FLOAT DEFAULT 0.7
)
RETURNS TABLE (
    id INTEGER,
    content TEXT,
    similarity FLOAT
) AS $$
BEGIN
    RETURN QUERY
    SELECT
        c.id,
        c.content,
        1 - (c.embedding <=> p_embedding) AS similarity
    FROM intelligence.chunks c
    WHERE c.embedding IS NOT NULL
      AND 1 - (c.embedding <=> p_embedding) > p_similarity_threshold
    ORDER BY c.embedding <=> p_embedding
    LIMIT p_limit;
END;
$$ LANGUAGE plpgsql;

Embedding Models

Model	Dimensions	Speed	Quality
`nomic-embed-text`	768	Fast	Good
`mxbai-embed-large`	1024	Medium	Better
`all-minilm`	384	Very Fast	Acceptable

Distance Operators

Operator	Type	Use Case
`<=>`	Cosine	Most common, normalized vectors
`<->`	Euclidean (L2)	When magnitude matters
`<#>`	Inner product	Dot product similarity

TypeScript: OllamaClient

import fetch from 'cross-fetch';

export class OllamaClient {
  private baseUrl: string;

  constructor(baseUrl?: string) {
    this.baseUrl = baseUrl || process.env.OLLAMA_HOST || 'http://localhost:11434';
  }

  async generateEmbedding(text: string, model = 'nomic-embed-text'): Promise<number[]> {
    const response = await fetch(`${this.baseUrl}/api/embeddings`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ model, prompt: text }),
    });
    if (!response.ok) throw new Error(`Embedding failed: ${response.statusText}`);
    const data = await response.json();
    return data.embedding;
  }

  async generateResponse(prompt: string, context?: string, model = 'mistral'): Promise<string> {
    const fullPrompt = context
      ? `Context: ${context}\n\nQuestion: ${prompt}\n\nAnswer:`
      : prompt;
    const response = await fetch(`${this.baseUrl}/api/generate`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ model, prompt: fullPrompt, stream: false }),
    });
    if (!response.ok) throw new Error(`Generation failed: ${response.statusText}`);
    const data = await response.json();
    return data.response;
  }
}

Vector Format

pgvector expects bracket notation:

const formatVector = (embedding: number[]): string => `[${embedding.join(',')}]`;

RAG Query Pattern

// 1. Embed the question
const queryEmbedding = await ollama.generateEmbedding(question);

// 2. Retrieve context
const result = await pool.query(
  `SELECT string_agg(content, E'\n\n') as context
   FROM intelligence.find_similar_chunks($1::vector, $2)`,
  [formatVector(queryEmbedding), 5]
);

// 3. Generate response with context
const response = await ollama.generateResponse(question, result.rows[0].context);

Indexes for Performance

-- IVFFlat (good balance, add after initial data load)
CREATE INDEX idx_chunks_embedding ON intelligence.chunks
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

-- HNSW (better recall, more memory)
CREATE INDEX idx_chunks_embedding_hnsw ON intelligence.chunks
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

Environment Variables

Variable	Default	Description
`OLLAMA_HOST`	`http://localhost:11434`	Ollama server URL
`RAG_DATABASE_URL`	-	PostgreSQL connection string
`RAG_EMBEDDING_MODEL`	`nomic-embed-text`	Embedding model
`RAG_CHAT_MODEL`	`llama3.2`	Chat model
`RAG_SIMILARITY_THRESHOLD`	`0.5`	Minimum similarity score
`RAG_CONTEXT_LIMIT`	`5`	Max chunks to retrieve

Troubleshooting Quick Reference

Issue	Quick Fix
"type vector does not exist"	pgvector extension not installed; use pgvector-enabled Docker image
"Connection refused" to Ollama	Start Ollama: `ollama serve`
"Model not found"	Pull model: `ollama pull <model>`
Dimension mismatch	Ensure VECTOR(n) matches model output dimensions
No results returned	Lower similarity threshold
Slow queries	Add IVFFlat or HNSW index

Reference Guide

Consult these reference files for detailed documentation on specific topics:

Reference	Topic	Consult When
references/setup.md	pgvector schema setup	Creating tables, indexes, vector dimensions, pgpm module structure
references/embeddings.md	Generating and storing embeddings	OllamaClient, document chunking, ingestion pipeline, batch processing
references/similarity-search.md	Similarity search queries	Distance operators, thresholds, metadata filtering, performance tuning
references/rag-pipeline.md	Complete RAG pipeline	RAGService implementation, streaming, chat history, prompt engineering
references/ollama.md	Ollama integration	Installation, API endpoints, model selection, chat API, CI/CD setup
references/agentic-kit.md	Agentic-kit RAG	RAGProvider, createRAGKit, useAgent hook, environment config, database schema

Cross-References

Related skills (separate from this skill):

graphile-pgvector — Integrate pgvector with PostGraphile v5 GraphQL
pgpm (references/docker.md) — PostgreSQL container management for pgvector
github-workflows-ollama — GitHub Actions for Ollama and pgvector testing

pgvector-rag

Safety Notice

Copy this and send it to your AI assistant to learn