Vector Search - Embedding Queries & Similarity Search
Codifies the project's dual vector search systems (Memory Store for agent domain knowledge, RAG Pipeline for document retrieval), the multi-provider embedding abstraction, pgvector indexing, hybrid search scoring, and chunking strategies. All patterns are built on Supabase/PostgreSQL with pgvector.
Description
Codifies pgvector embedding queries, similarity search, hybrid search, and multi-provider embedding generation for NodeJS-Starter-V1's Supabase/PostgreSQL stack, covering the Memory Store and RAG Pipeline vector infrastructure, indexing strategies, and chunking patterns.
When to Apply
Positive Triggers
-
Adding semantic search to new data types
-
Creating or modifying embedding generation logic
-
Implementing similarity queries or nearest-neighbour lookups
-
Configuring chunking strategies for document ingestion
-
Tuning search relevance (thresholds, weights, reranking)
-
Adding new embedding providers
-
User mentions: "vector", "embedding", "semantic search", "similarity", "RAG", "pgvector", "cosine"
Negative Triggers
-
Building dashboard UI for search results (use dashboard-patterns instead)
-
Adding full-text keyword search only (use PostgreSQL tsvector directly)
-
Instrumenting search latency metrics (use metrics-collector instead)
-
Logging search queries (use structured-logging instead)
Core Directives
The Three Laws of Vector Search
-
Provider-agnostic: All embedding generation goes through EmbeddingProvider abstraction. Never call OpenAI/Ollama directly.
-
Hybrid by default: Combine vector similarity with keyword matching. Pure vector search misses exact terms; pure keyword misses semantics.
-
Server-side scoring: Similarity computation happens in PostgreSQL via RPC functions. Never download all vectors to Python for client-side comparison.
Existing Project Infrastructure
Two Vector Search Systems
System Location Purpose Table
Memory Store src/memory/store.py
Agent domain knowledge (patterns, preferences, debugging) domain_memories
RAG Pipeline src/rag/storage.py
Document retrieval (uploaded docs, chunked content) document_chunks
Both share the same EmbeddingProvider abstraction from src/memory/embeddings.py .
Embedding Providers
Provider Model Dimensions Use Case
OpenAI text-embedding-3-small
1536 Production (preferred)
Ollama nomic-embed-text
768 Local development (free)
Simple Hash-based 1536 Testing only (deterministic)
Selection via get_embedding_provider() — checks OPENAI_API_KEY , then ANTHROPIC_API_KEY , then falls back to SimpleEmbeddingProvider .
API Routes
Route Method Search Type
/rag/search
POST Vector, hybrid, or keyword
/rag/upload
POST Document ingestion + embedding
/api/search
POST Full-text search (tsvector only)
Database
Table Vector Column Index Type Distance Function
documents
VECTOR(1536)
IVFFlat vector_cosine_ops
domain_memories
embedding
— Cosine (via RPC)
document_chunks
embedding
— Cosine (via RPC)
Embedding Provider Pattern
The EmbeddingProvider abstract base class defines a single method:
class EmbeddingProvider(ABC): @abstractmethod async def get_embedding(self, text: str) -> list[float]: """Generate embedding vector for text.""" pass
Three implementations: OpenAIEmbeddingProvider (calls /v1/embeddings via httpx), OllamaEmbeddingProvider (local /api/embeddings ), SimpleEmbeddingProvider (hash-based, testing only).
Adding a New Provider
-
Subclass EmbeddingProvider
-
Implement get_embedding() returning a fixed-dimension vector
-
Add selection logic in get_embedding_provider()
-
Match the dimension to existing index (1536 for OpenAI compatibility, or create a separate index)
Dimension Consistency Rule
All vectors in a table MUST share the same dimension. If mixing providers with different dimensions (e.g., OpenAI 1536 vs Ollama 768), either:
-
Pad/truncate to a standard dimension, OR
-
Use separate columns per dimension, OR
-
Standardise on one dimension and re-embed when switching providers
The project currently standardises on 1536 dimensions (OpenAI).
Search Patterns
Similarity Search (Memory Store)
MemoryStore.find_similar() generates a query embedding and calls the find_similar_memories PostgreSQL RPC:
async def find_similar(self, query_text: str, domain: MemoryDomain | None = None, user_id: str | None = None, similarity_threshold: float = 0.7, limit: int = 10, ) -> list[dict[str, Any]]: query_embedding = await self.embedding_provider.get_embedding(query_text) result = self.client.rpc("find_similar_memories", { "query_embedding": json.dumps(query_embedding), "match_threshold": similarity_threshold, "match_count": limit, "filter_domain": domain.value if domain else None, "filter_user_id": user_id, }).execute() return result.data or []
Key parameters: match_threshold (0.0–1.0, cosine similarity minimum), match_count (max results). Domain and user filters are applied server-side in the RPC function.
Hybrid Search (RAG Pipeline)
RAGStore.hybrid_search() combines vector similarity with keyword matching using configurable weights:
async def hybrid_search(self, query: str, project_id: str, vector_weight: float = 0.6, keyword_weight: float = 0.4, limit: int = 10, threshold: float = 0.5, ) -> list[dict[str, Any]]: query_embedding = await self.embedding_provider.get_embedding(query) result = self.client.rpc("hybrid_search", { "query_text": query, "query_embedding": query_embedding, "project_id_filter": project_id, "vector_weight": vector_weight, "keyword_weight": keyword_weight, "match_threshold": threshold, "match_count": limit, }).execute() return result.data or []
Default weights: 60% vector + 40% keyword. Adjust for domain:
-
Technical docs: 70/30 (semantics matter more)
-
Exact match scenarios (IDs, codes): 30/70 (keywords matter more)
-
General content: 60/40 (balanced)
Full-Text Search (PostgreSQL tsvector)
The /api/search route uses native PostgreSQL full-text search with ts_rank :
func.ts_rank( func.to_tsvector("english", Document.title + " " + Document.content), func.plainto_tsquery("english", query_text), 32, # RANK_CD normalisation flag ).label("relevance")
This is independent of vector search and uses the documents table directly via SQLAlchemy.
Indexing Patterns
IVFFlat Index (Current)
The project uses IVFFlat for approximate nearest-neighbour search:
CREATE INDEX idx_documents_embedding ON documents USING ivfflat (embedding vector_cosine_ops);
IVFFlat partitions vectors into lists (clusters). Query searches only the nearest cluster(s), trading recall for speed.
Tuning parameters:
-
lists (build-time): Number of clusters. Rule of thumb: sqrt(row_count) for < 1M rows
-
probes (query-time): Number of clusters to search. Higher = better recall, slower. Default: 1
-- Set probes for a session (higher = more accurate, slower) SET ivfflat.probes = 10;
HNSW Index (Recommended for Production)
For datasets > 10K rows, prefer HNSW (Hierarchical Navigable Small World):
CREATE INDEX idx_documents_embedding_hnsw ON documents USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 64);
HNSW provides better recall than IVFFlat without manual tuning. Higher m and ef_construction improve quality at the cost of build time and memory.
Distance Functions
Function Operator Index Ops Use When
Cosine similarity <=>
vector_cosine_ops
Normalised embeddings (most common)
L2 distance <->
vector_l2_ops
Raw distance comparison
Inner product <#>
vector_ip_ops
Pre-normalised, performance-critical
The project uses cosine similarity (vector_cosine_ops ) throughout.
Chunking Strategies
The RAG pipeline supports five chunking strategies via ChunkingStrategy enum:
Strategy When to Use Config
FIXED_SIZE
Uniform chunks, simple content chunk_size=512, chunk_overlap=50
SEMANTIC
Respects paragraph/section boundaries Same + boundary detection
RECURSIVE
Nested structure (Markdown, HTML) Splits by headers, then paragraphs, then sentences
PARENT_CHILD
Best recall with context parent_chunk_size=2048 , child chunk_size=512
CODE_AWARE
Source code files Splits by functions/classes
Default: PARENT_CHILD with 512-token children and 2048-token parents. Search matches children; context retrieval includes the parent chunk.
Pipeline Config
PipelineConfig( chunking_strategy=ChunkingStrategy.PARENT_CHILD, chunk_size=512, chunk_overlap=50, parent_chunk_size=2048, generate_embeddings=True, generate_keywords=True, )
Relevance & Scoring
Threshold Guidelines
Threshold Meaning Use Case
0.9+ Near-exact semantic match Deduplication
0.7–0.9 Strong relevance Default search
0.5–0.7 Moderate relevance Exploratory search
< 0.5 Weak match Usually noise
The Memory Store defaults to similarity_threshold=0.7 . The RAG Pipeline defaults to min_score=0.5 .
Relevance Decay
MemoryStore.update_relevance() adjusts memory relevance based on feedback:
-
Positive feedback (+0.1 per point, capped at 1.0)
-
Negative feedback (configurable decay_rate , default 0.1, floored at 0.0)
Stale Memory Pruning
MemoryStore.prune_stale() removes memories below min_relevance=0.3 or older than max_age_days=90 via the prune_stale_memories RPC.
Pydantic Models
Memory System
Model Fields Purpose
MemoryEntry
domain, category, key, value, embedding, relevance_score, access_count Core memory unit
MemoryQuery
domain, category, query_text, similarity_threshold, tags, limit, offset Query specification
MemoryResult
entries, total_count, query Paginated result
MemoryDomain
KNOWLEDGE, PREFERENCE, TESTING, DEBUGGING Domain enum
RAG System
Model Fields Purpose
DocumentChunk
source_id, content, embedding, chunk_level, heading_hierarchy, keywords Chunk record
DocumentSource
source_type, source_uri, status, metadata Source tracking
SearchRequest
query, project_id, search_type, vector_weight, keyword_weight, min_score Search input
SearchResult
chunk_id, content, vector_score, keyword_score, combined_score Result item
SearchResponse
results, total_count, search_type, execution_time_ms Search output
Database Schema
documents Table (Legacy)
CREATE TABLE documents ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), title VARCHAR(500) NOT NULL, content TEXT NOT NULL, embedding VECTOR(1536), -- ... other columns ); CREATE INDEX idx_documents_embedding ON documents USING ivfflat (embedding vector_cosine_ops);
domain_memories Table
Stores agent memories with embeddings for semantic retrieval. Accessed via MemoryStore class.
document_chunks Table
Stores RAG pipeline chunks with embeddings. Accessed via RAGStore class. Includes heading_hierarchy , summary , entities , keywords , and classification_tags for enriched retrieval.
RPC Functions
Function Purpose
find_similar_memories
Cosine similarity search on domain_memories with domain/user filters
hybrid_search
Combined vector + keyword search on document_chunks
prune_stale_memories
Delete low-relevance or expired memories
increment_memory_access
Increment access count on retrieval
Anti-Patterns
Anti-Pattern Why It Fails Correct Approach
Client-side similarity computation Downloads all vectors, O(n) per query, no index usage PostgreSQL RPC with pgvector index
Mixing embedding dimensions in one column VECTOR(1536) rejects 768-dim vectors Standardise dimension or use separate columns
No similarity threshold Returns noise matches below 0.3 Always set match_threshold (0.5–0.7)
Embedding at query time without caching Re-embeds identical queries Cache query embeddings for repeated searches
IVFFlat with probes=1 on large datasets Poor recall (misses relevant results) Increase probes or migrate to HNSW
Storing embeddings without indexing Sequential scan on every query Create IVFFlat or HNSW index
Hardcoding OpenAI API calls Breaks local development, vendor lock-in Use EmbeddingProvider abstraction
Chunking without overlap Loses context at chunk boundaries Set chunk_overlap=50 minimum
Checklist for New Vector Search Features
Embedding
-
Uses EmbeddingProvider abstraction (never direct API calls)
-
Dimension matches existing index (1536 default)
-
Handles provider unavailability (fallback or graceful error)
Search
-
Hybrid search by default (vector + keyword)
-
Similarity threshold configured (not unbounded)
-
Server-side computation via PostgreSQL RPC
-
Results include similarity scores for transparency
Indexing
-
pgvector index created on embedding column
-
Distance function matches query pattern (cosine for normalised)
-
Index type appropriate for dataset size (IVFFlat < 10K, HNSW >= 10K)
Data Quality
-
Chunking strategy matches content type
-
Chunk overlap prevents boundary information loss
-
Stale/expired entries have pruning mechanism
Integration
-
Search latency instrumented via metrics-collector
-
Errors use error-taxonomy codes
-
Queries logged via structured-logging
Response Format
[AGENT_ACTIVATED]: Vector Search [PHASE]: {Design | Implementation | Review} [STATUS]: {in_progress | complete}
{vector search analysis or implementation guidance}
[NEXT_ACTION]: {what to do next}
Integration Points
Council of Logic
-
Turing: Verify search is O(log n) via index, not O(n) sequential scan
-
Shannon: Embedding dimension and chunk size tuned for information density
Metrics Collector
-
search_query_duration_ms histogram for search latency
-
search_result_count gauge for average results per query
-
embedding_generation_duration_ms histogram for provider latency
Structured Logging
-
Debug-level embedding generation logs (model, dimensions, text length)
-
Info-level search execution logs (query, domain, result count)
Error Taxonomy
-
DATA_VECTOR_PROVIDER_UNAVAILABLE (503) — embedding provider down
-
DATA_VECTOR_DIMENSION_MISMATCH (422) — wrong embedding dimension
-
DATA_VECTOR_THRESHOLD_INVALID (422) — threshold out of [0, 1] range
Data Validation
-
SearchRequest validated via Pydantic (query non-empty, threshold in range, limit bounded)
-
PipelineConfig validates chunk sizes and strategy enum
Dashboard Patterns
-
Search results displayed via DataStrip for aggregate metrics
-
Real-time search activity via Supabase Realtime on document_chunks table
Australian Localisation (en-AU)
-
Spelling: neighbour, optimise, normalise, analyse, behaviour, colour
-
Date: ISO 8601 in storage; DD/MM/YYYY in UI display
-
Timezone: AEST/AEDT — timestamps stored as UTC, converted for display