Agentic Kit RAG Integration

Configure the agentic-kit for Retrieval-Augmented Generation (RAG) using pgvector for semantic search and PGPM for database management. This skill enables AI chat applications to provide contextual responses based on a knowledge base stored in PostgreSQL.

When to Apply

Use this skill when:

Adding RAG capabilities to agentic-kit applications
Configuring environment variables for RAG-enabled AI chat
Setting up a local database for document storage and retrieval
Building AI assistants that need access to a knowledge base
Integrating pgvector with the sf-rag-utils app

Architecture Overview

User Query → Agentic Kit → RAG Provider ↓ [RAG_ENABLED=true?] ↓ ↓ Yes No ↓ ↓ Query Embedding Direct Ollama ↓ pgvector Search ↓ Context Retrieval ↓ Ollama + Context → Response

Environment Variables

Configure RAG behavior through environment variables:

Variable Default Description

RAG_ENABLED

true

Enable RAG context retrieval (set to false for direct Ollama)

RAG_DATABASE_URL

PostgreSQL connection string (required when RAG_ENABLED=true)

OLLAMA_HOST

http://localhost:11434

Ollama server URL

RAG_EMBEDDING_MODEL

nomic-embed-text

Model for generating embeddings

RAG_CHAT_MODEL

llama3.2

Model for chat responses

RAG_SIMILARITY_THRESHOLD

0.5

Minimum similarity score for retrieval

RAG_CONTEXT_LIMIT

Maximum number of chunks to retrieve

RAG_SCHEMA

intelligence

PostgreSQL schema for RAG tables

Example .env File

RAG Configuration

RAG_ENABLED=true RAG_DATABASE_URL=postgres://postgres:postgres@localhost:5432/rag_dev

Ollama Configuration

OLLAMA_HOST=http://localhost:11434 RAG_EMBEDDING_MODEL=nomic-embed-text RAG_CHAT_MODEL=llama3.2

Retrieval Settings

RAG_SIMILARITY_THRESHOLD=0.5 RAG_CONTEXT_LIMIT=5 RAG_SCHEMA=intelligence

Disabling RAG (Direct Ollama Mode)

To use direct Ollama without RAG context:

RAG_ENABLED=false OLLAMA_HOST=http://localhost:11434 RAG_CHAT_MODEL=llama3.2

Quick Start

Set Up Local Database with PGPM

Ensure PostgreSQL is running with a pgvector-enabled image (see pgpm-docker skill) and PG env vars are loaded (see pgpm-env skill).

Run the setup script

bash /mnt/skills/user/agentic-kit-rag/scripts/setup-rag-database.sh

Configure Environment

Create a .env file in your application:

RAG_ENABLED=true RAG_DATABASE_URL=postgres://postgres:postgres@localhost:5432/rag_dev OLLAMA_HOST=http://localhost:11434

Pull Required Ollama Models

ollama pull nomic-embed-text ollama pull llama3.2

RAG Provider Implementation

Create a RAG-aware provider for agentic-kit:

// src/lib/ai/rag-provider.ts import { Pool } from 'pg'; import type { AgentProvider, GenerateInput, StreamCallbacks, Message } from '@sf-ai/agentic-kit';

interface RAGConfig { databaseUrl: string; ollamaHost?: string; embeddingModel?: string; chatModel?: string; similarityThreshold?: number; contextLimit?: number; schema?: string; }

const formatVector = (embedding: number[]): string => [${embedding.join(',')}];

export class RAGProvider implements AgentProvider { readonly name = 'rag'; private pool: Pool; private ollamaHost: string; private embeddingModel: string; private chatModel: string; private similarityThreshold: number; private contextLimit: number; private schema: string;

constructor(config: RAGConfig) { this.pool = new Pool({ connectionString: config.databaseUrl }); this.ollamaHost = config.ollamaHost || 'http://localhost:11434'; this.embeddingModel = config.embeddingModel || 'nomic-embed-text'; this.chatModel = config.chatModel || 'llama3.2'; this.similarityThreshold = config.similarityThreshold || 0.5; this.contextLimit = config.contextLimit || 5; this.schema = config.schema || 'intelligence'; }

private async generateEmbedding(text: string): Promise<number[]> { const response = await fetch(${this.ollamaHost}/api/embeddings, { method: 'POST', headers: { 'Content-Type': 'application/json' }, body: JSON.stringify({ model: this.embeddingModel, prompt: text }), });

if (!response.ok) {
  throw new Error(`Failed to generate embedding: ${response.statusText}`);
}

const data = await response.json();
return data.embedding;

}

private async retrieveContext(query: string): Promise<string> { const embedding = await this.generateEmbedding(query);

const result = await this.pool.query(
  `SELECT string_agg(content, E'\n\n') as context
   FROM ${this.schema}.find_similar_chunks($1::vector, $2, $3)`,
  [formatVector(embedding), this.contextLimit, this.similarityThreshold]
);

return result.rows[0]?.context || '';

}

async generate(input: GenerateInput): Promise<string> { const context = await this.retrieveContext(input.prompt);

const response = await fetch(`${this.ollamaHost}/api/chat`, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: input.model || this.chatModel,
    messages: this.buildMessages(input.messages || [], input.prompt, context),
    stream: false,
  }),
});

if (!response.ok) {
  throw new Error(`Chat failed: ${response.statusText}`);
}

const data = await response.json();
return data.message.content;

}

async generateStream(input: GenerateInput, callbacks: StreamCallbacks): Promise<void> { callbacks.onStateChange?.('thinking');

const context = await this.retrieveContext(input.prompt);

const response = await fetch(`${this.ollamaHost}/api/chat`, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: input.model || this.chatModel,
    messages: this.buildMessages(input.messages || [], input.prompt, context),
    stream: true,
  }),
});

if (!response.ok) {
  const error = new Error(`Chat stream failed: ${response.statusText}`);
  callbacks.onStateChange?.('error');
  callbacks.onError?.(error);
  throw error;
}

await this.processStream(response, callbacks);

}

private buildMessages( messages: Message[], prompt: string, context: string ): Array<{ role: string; content: string }> { const systemMessage = messages.find(m => m.role === 'system'); const conversationMessages = messages.filter(m => m.role !== 'system');

const systemContent = systemMessage?.content || 'You are a helpful assistant.';
const contextualSystem = context
  ? `${systemContent}\n\nUse the following context to answer questions. If the context doesn't contain relevant information, say so.\n\nContext:\n${context}`
  : systemContent;

return [
  { role: 'system', content: contextualSystem },
  ...conversationMessages.map(m => ({ role: m.role, content: m.content })),
  { role: 'user', content: prompt },
];

}

private async processStream(response: Response, callbacks: StreamCallbacks): Promise<void> { const reader = response.body?.getReader(); if (!reader) { throw new Error('Failed to get response reader'); }

callbacks.onStateChange?.('streaming');

const decoder = new TextDecoder();
let fullResponse = '';
let done = false;

while (!done) {
  const result = await reader.read();
  done = result.done;
  if (done) break;

  const chunk = decoder.decode(result.value);
  const lines = chunk.split('\n').filter(Boolean);

  for (const line of lines) {
    try {
      const data = JSON.parse(line);
      if (data.message?.content) {
        fullResponse += data.message.content;
        callbacks.onToken?.(data.message.content);
      }
    } catch {
      // Skip malformed JSON lines
    }
  }
}

callbacks.onStateChange?.('complete');
callbacks.onComplete?.(fullResponse);

}

async isAvailable(): Promise<boolean> { try { const [ollamaCheck, dbCheck] = await Promise.all([ fetch(${this.ollamaHost}/api/tags).then(r => r.ok), this.pool.query('SELECT 1').then(() => true).catch(() => false), ]); return ollamaCheck && dbCheck; } catch { return false; } }

async close(): Promise<void> { await this.pool.end(); } }

Creating RAG-Enabled Kit

// src/lib/ai/create-rag-kit.ts import { AgentKit, OllamaProvider } from '@sf-ai/agentic-kit'; import { RAGProvider } from './rag-provider';

interface RAGKitConfig { ragEnabled?: boolean; databaseUrl?: string; ollamaHost?: string; embeddingModel?: string; chatModel?: string; similarityThreshold?: number; contextLimit?: number; schema?: string; }

export function createRAGKit(config: RAGKitConfig = {}): AgentKit { const ragEnabled = config.ragEnabled ?? process.env.RAG_ENABLED !== 'false'; const databaseUrl = config.databaseUrl || process.env.RAG_DATABASE_URL; const ollamaHost = config.ollamaHost || process.env.OLLAMA_HOST || 'http://localhost:11434';

const kit = new AgentKit();

if (ragEnabled && databaseUrl) { kit.addProvider(new RAGProvider({ databaseUrl, ollamaHost, embeddingModel: config.embeddingModel || process.env.RAG_EMBEDDING_MODEL, chatModel: config.chatModel || process.env.RAG_CHAT_MODEL, similarityThreshold: config.similarityThreshold || parseFloat(process.env.RAG_SIMILARITY_THRESHOLD || '0.5'), contextLimit: config.contextLimit || parseInt(process.env.RAG_CONTEXT_LIMIT || '5', 10), schema: config.schema || process.env.RAG_SCHEMA, })); } else { kit.addProvider(new OllamaProvider({ baseUrl: ollamaHost, defaultModel: config.chatModel || process.env.RAG_CHAT_MODEL || 'llama3.2', })); }

return kit; }

Updating useAgent Hook

Modify the useAgent hook to support RAG:

// src/lib/ai/use-agent.ts 'use client';

import { useCallback, useRef, useState } from 'react'; import type { AgentState, Message, StreamCallbacks } from '@sf-ai/agentic-kit'; import { createRAGKit } from './create-rag-kit';

export interface UseAgentOptions { baseUrl?: string; model?: string; systemPrompt?: string; ragEnabled?: boolean; databaseUrl?: string; }

export function useAgent(options: UseAgentOptions = {}) { const { baseUrl, model, systemPrompt, ragEnabled, databaseUrl } = options;

const kitRef = useRef(createRAGKit({ ragEnabled, databaseUrl, ollamaHost: baseUrl, chatModel: model, }));

// ... rest of the hook implementation }

Database Schema

The RAG database requires the following schema (created by the setup script):

-- Schema CREATE SCHEMA IF NOT EXISTS intelligence;

-- Documents table CREATE TABLE intelligence.documents ( id SERIAL PRIMARY KEY, title TEXT, content TEXT NOT NULL, metadata JSONB DEFAULT '{}'::jsonb, embedding VECTOR(768), created_at TIMESTAMPTZ DEFAULT NOW() );

-- Chunks table CREATE TABLE intelligence.chunks ( id SERIAL PRIMARY KEY, document_id INTEGER NOT NULL REFERENCES intelligence.documents(id) ON DELETE CASCADE, content TEXT NOT NULL, embedding VECTOR(768), chunk_index INTEGER NOT NULL, created_at TIMESTAMPTZ DEFAULT NOW() );

CREATE INDEX idx_chunks_document_id ON intelligence.chunks(document_id); CREATE INDEX idx_chunks_embedding ON intelligence.chunks USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);

-- Similarity search function CREATE FUNCTION intelligence.find_similar_chunks( p_embedding VECTOR(768), p_limit INTEGER DEFAULT 5, p_similarity_threshold FLOAT DEFAULT 0.5 ) RETURNS TABLE ( id INTEGER, content TEXT, similarity FLOAT ) AS $$ BEGIN RETURN QUERY SELECT c.id, c.content, 1 - (c.embedding <=> p_embedding) AS similarity FROM intelligence.chunks c WHERE c.embedding IS NOT NULL AND 1 - (c.embedding <=> p_embedding) > p_similarity_threshold ORDER BY c.embedding <=> p_embedding LIMIT p_limit; END; $$ LANGUAGE plpgsql;

Adding Documents to the Knowledge Base

import { Pool } from 'pg';

const pool = new Pool({ connectionString: process.env.RAG_DATABASE_URL });

async function addDocument(title: string, content: string, metadata = {}) { const ollamaHost = process.env.OLLAMA_HOST || 'http://localhost:11434'; const embeddingModel = process.env.RAG_EMBEDDING_MODEL || 'nomic-embed-text';

// Generate embedding const response = await fetch(${ollamaHost}/api/embeddings, { method: 'POST', headers: { 'Content-Type': 'application/json' }, body: JSON.stringify({ model: embeddingModel, prompt: content }), }); const { embedding } = await response.json();

// Insert document const result = await pool.query( INSERT INTO intelligence.documents (title, content, metadata, embedding) VALUES ($1, $2, $3, $4::vector) RETURNING id, [title, content, metadata, [${embedding.join(',')}]] );

const documentId = result.rows[0].id;

// Create chunks await pool.query('SELECT intelligence.create_document_chunks($1)', [documentId]);

// Generate embeddings for chunks const chunks = await pool.query( 'SELECT id, content FROM intelligence.chunks WHERE document_id = $1', [documentId] );

for (const chunk of chunks.rows) { const chunkResponse = await fetch(${ollamaHost}/api/embeddings, { method: 'POST', headers: { 'Content-Type': 'application/json' }, body: JSON.stringify({ model: embeddingModel, prompt: chunk.content }), }); const { embedding: chunkEmbedding } = await chunkResponse.json();

await pool.query(
  'UPDATE intelligence.chunks SET embedding = $1::vector WHERE id = $2',
  [`[${chunkEmbedding.join(',')}]`, chunk.id]
);

}

return documentId; }

Troubleshooting

Issue Solution

"RAG_DATABASE_URL not set" Set the environment variable or pass databaseUrl to createRAGKit

"Connection refused" to database Ensure PostgreSQL is running (see pgpm-docker skill)

"Connection refused" to Ollama Ensure Ollama is running: ollama serve

"type vector does not exist" Run the setup script to install pgvector extension

No context retrieved Lower RAG_SIMILARITY_THRESHOLD or add more documents

Slow responses Reduce RAG_CONTEXT_LIMIT or optimize database indexes

References

Related skill: pgvector-setup for database schema details
Related skill: pgvector-embeddings for embedding generation
Related skill: pgvector-similarity-search for retrieval queries
Related skill: rag-pipeline for complete RAG implementation
Related skill: ollama-integration for Ollama client details
Related skill: pgpm-docker for PostgreSQL container management

agentic-kit-rag

Safety Notice

Copy this and send it to your AI assistant to learn

RAG Configuration

Ollama Configuration

Retrieval Settings

Run the setup script

Source Transparency

Related Skills

constructive-agent-e2e

planning-blueprinting

drizzle-orm