using-vector-databases

Vector database implementation for AI/ML applications, semantic search, and RAG systems. Use when building chatbots, search engines, recommendation systems, or similarity-based retrieval. Covers Qdrant (primary), Pinecone, Milvus, pgvector, Chroma, embedding generation (OpenAI, Voyage, Cohere), chunking strategies, and hybrid search patterns.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "using-vector-databases" with this command: npx skills add ancoleman/ai-design-components/ancoleman-ai-design-components-using-vector-databases

Vector Databases for AI Applications

When to Use This Skill

Use this skill when implementing:

  • RAG (Retrieval-Augmented Generation) systems for AI chatbots
  • Semantic search capabilities (meaning-based, not just keyword)
  • Recommendation systems based on similarity
  • Multi-modal AI (unified search across text, images, audio)
  • Document similarity and deduplication
  • Question answering over private knowledge bases

Quick Decision Framework

1. Vector Database Selection

START: Choosing a Vector Database

EXISTING INFRASTRUCTURE?
├─ Using PostgreSQL already?
│  └─ pgvector (<10M vectors, tight budget)
│      See: references/pgvector.md
│
└─ No existing vector database?
   │
   ├─ OPERATIONAL PREFERENCE?
   │  │
   │  ├─ Zero-ops managed only
   │  │  └─ Pinecone (fully managed, excellent DX)
   │  │      See: references/pinecone.md
   │  │
   │  └─ Flexible (self-hosted or managed)
   │     │
   │     ├─ SCALE: <100M vectors + complex filtering ⭐
   │     │  └─ Qdrant (RECOMMENDED)
   │     │      • Best metadata filtering
   │     │      • Built-in hybrid search (BM25 + Vector)
   │     │      • Self-host: Docker/K8s
   │     │      • Managed: Qdrant Cloud
   │     │      See: references/qdrant.md
   │     │
   │     ├─ SCALE: >100M vectors + GPU acceleration
   │     │  └─ Milvus / Zilliz Cloud
   │     │      See: references/milvus.md
   │     │
   │     ├─ Embedded / No server
   │     │  └─ LanceDB (serverless, edge deployment)
   │     │
   │     └─ Local prototyping
   │        └─ Chroma (simple API, in-memory)

2. Embedding Model Selection

REQUIREMENTS?

├─ Best quality (cost no object)
│  └─ Voyage AI voyage-3 (1024d)
│      • 9.74% better than OpenAI on MTEB
│      • ~$0.12/1M tokens
│      See: references/embedding-strategies.md
│
├─ Enterprise reliability
│  └─ OpenAI text-embedding-3-large (3072d)
│      • Industry standard
│      • ~$0.13/1M tokens
│      • Matryoshka shortening: reduce to 256/512/1024d (sketch below)
│
├─ Cost-optimized
│  └─ OpenAI text-embedding-3-small (1536d)
│      • ~$0.02/1M tokens (6x cheaper)
│      • 90-95% of large model performance
│
├─ Multilingual (100+ languages)
│  └─ Cohere embed-v3 (1024d)
│      • ~$0.10/1M tokens
│
└─ Self-hosted / Privacy-critical
   ├─ English: nomic-embed-text-v1.5 (768d, Apache 2.0)
   ├─ Multilingual: BAAI/bge-m3 (1024d, MIT)
   └─ Long docs: jina-embeddings-v2 (768d, 8K context)
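
OpenAI's text-embedding-3 models expose Matryoshka shortening directly through the embeddings API's dimensions parameter, which truncates and renormalizes the vector server-side. A minimal sketch (query text and target dimension are illustrative):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.embeddings.create(
    model="text-embedding-3-large",
    input=["OAuth refresh token implementation"],
    dimensions=1024  # shorten from the native 3072d; 256/512 also work
)
embedding = response.data[0].embedding
assert len(embedding) == 1024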

Core Concepts

Document Chunking Strategy

Recommended defaults for most RAG systems:

  • Chunk size: 512 tokens (not characters)
  • Overlap: 50 tokens (10% overlap)

Why these numbers?

  • 512 tokens balances context vs. precision
    • Too small (128-256): Fragments concepts, loses context
    • Too large (1024-2048): Dilutes relevance, wastes LLM tokens
  • A 50-token overlap preserves context across chunk boundaries, so sentences aren't cut off between chunks

See references/chunking-patterns.md for advanced strategies by content type.
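
A minimal token-based chunker implementing these defaults. It assumes tiktoken with the cl100k_base encoding; substitute whichever tokenizer matches your embedding model:

import tiktoken

def chunk_text(text: str, chunk_size: int = 512, overlap: int = 50) -> list[str]:
    # Count tokens, not characters, so chunks align with model limits
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)
    chunks = []
    step = chunk_size - overlap  # slide the window, keeping 50 tokens of overlap
    for start in range(0, len(tokens), step):
        chunks.append(enc.decode(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break
    return chunks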

Hybrid Search (Vector + Keyword)

Hybrid Search = Vector Similarity + BM25 Keyword Matching

User Query: "OAuth refresh token implementation"
           │
    ┌──────┴──────┐
    │             │
Vector Search   Keyword Search
(Semantic)      (BM25)
    │             │
Top 20 docs   Top 20 docs
    │             │
    └──────┬──────┘
           │
   Reciprocal Rank Fusion
   (Merge + Re-rank)
           │
    Final Top 5 Results

Why hybrid matters:

  • Vector captures semantic meaning ("OAuth refresh" ≈ "token renewal")
  • Keyword ensures exact matches ("refresh_token" literal)
  • Combined provides best retrieval quality

See references/hybrid-search.md for implementation details.
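
A minimal sketch of the fusion step. Reciprocal Rank Fusion scores each document as the sum of 1/(k + rank) across the ranked lists, with k = 60 as the conventional constant:

from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]],
                           k: int = 60, top_n: int = 5) -> list[str]:
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)  # top ranks dominate, tail still counts
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# merged = reciprocal_rank_fusion([vector_top20_ids, bm25_top20_ids])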

Getting Started

Python + Qdrant Example

from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance, VectorParams, PointStruct,
    Filter, FieldCondition, MatchValue
)

# 1. Initialize client
client = QdrantClient("localhost", port=6333)

# 2. Create collection
client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(size=1024, distance=Distance.COSINE)
)

# 3. Insert documents with embeddings
points = [
    PointStruct(
        id=idx,
        vector=embedding,  # From OpenAI/Voyage/etc
        payload={
            "text": chunk_text,
            "source": "docs/api.md",
            "section": "Authentication"
        }
    )
    for idx, (embedding, chunk_text) in enumerate(chunks)
]
client.upsert(collection_name="documents", points=points)

# 4. Search with metadata filtering (pre-filter, then vector search)
results = client.search(
    collection_name="documents",
    query_vector=query_embedding,
    limit=5,
    query_filter=Filter(
        must=[
            FieldCondition(key="section", match=MatchValue(value="Authentication"))
        ]
    )
)

For complete examples, see examples/qdrant-python/.

TypeScript + Qdrant Example

import { QdrantClient } from '@qdrant/js-client-rest';

const client = new QdrantClient({ url: 'http://localhost:6333' });

// Create collection
await client.createCollection('documents', {
  vectors: { size: 1024, distance: 'Cosine' }
});

// Insert documents
await client.upsert('documents', {
  points: chunks.map((chunk, idx) => ({
    id: idx,
    vector: chunk.embedding,
    payload: {
      text: chunk.text,
      source: chunk.source
    }
  }))
});

// Search
const results = await client.search('documents', {
  vector: queryEmbedding,
  limit: 5,
  filter: {
    must: [
      { key: 'source', match: { value: 'docs/api.md' } }
    ]
  }
});

For complete examples, see examples/typescript-rag/.

RAG Pipeline Architecture

Complete Pipeline Components

1. INGESTION
   ├─ Document Loading (PDF, web, code, Office)
   ├─ Text Extraction & Cleaning
   ├─ Chunking (semantic, recursive, code-aware)
   └─ Embedding Generation (batch, rate-limited)

2. INDEXING
   ├─ Vector Store Insertion (batch upsert)
   ├─ Index Configuration (HNSW, distance metric)
   └─ Keyword Index (BM25 for hybrid search)

3. RETRIEVAL (Query Time)
   ├─ Query Processing (expansion, embedding)
   ├─ Hybrid Search (vector + keyword)
   ├─ Filtering & Post-Processing (metadata, MMR)
   └─ Re-Ranking (cross-encoder, LLM-based)

4. GENERATION
   ├─ Context Construction (format chunks, citations)
   ├─ Prompt Engineering (system + context + query)
   ├─ LLM Inference (streaming, temperature tuning)
   └─ Response Post-Processing (citations, validation)

5. EVALUATION (Production Critical)
   ├─ Retrieval Metrics (precision, recall, relevancy)
   ├─ Generation Metrics (faithfulness, correctness)
   └─ System Metrics (latency, cost, satisfaction)
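
As a sketch of step 4's context construction, retrieved chunks can be formatted with numbered citations before prompting the LLM. The text and source fields match the payloads used in the Qdrant examples above; the prompt wording is illustrative:

def build_prompt(query: str, chunks: list[dict]) -> str:
    # Number each chunk so the model can cite sources as [1], [2], ...
    context = "\n\n".join(
        f"[{i}] ({c['source']}) {c['text']}"
        for i, c in enumerate(chunks, start=1)
    )
    return (
        "Answer using only the context below. Cite sources as [n].\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )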

Essential Metadata for Production RAG

Critical for filtering and relevance:

metadata = {
    # SOURCE TRACKING
    "source": "docs/api-reference.md",
    "source_type": "documentation",  # code, docs, logs, chat
    "last_updated": "2025-12-01T12:00:00Z",

    # HIERARCHICAL CONTEXT
    "section": "Authentication",
    "subsection": "OAuth 2.1",
    "heading_hierarchy": ["API Reference", "Authentication", "OAuth 2.1"],

    # CONTENT CLASSIFICATION
    "content_type": "code_example",  # prose, code, table, list
    "programming_language": "python",

    # FILTERING DIMENSIONS
    "product_version": "v2.0",
    "audience": "enterprise",  # free, pro, enterprise

    # RETRIEVAL HINTS
    "chunk_index": 3,
    "total_chunks": 12,
    "has_code": True
}

Why metadata matters:

  • Enables filtering BEFORE vector search (reduces search space)
  • Improves relevance through targeted retrieval
  • Supports multi-tenant systems (filter by user/org)
  • Enables versioned documentation (filter by product version)
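
A sketch of such a pre-filtered query, reusing the client and collection from the getting-started example and the metadata fields above:

from qdrant_client.models import Filter, FieldCondition, MatchValue

results = client.search(
    collection_name="documents",
    query_vector=query_embedding,
    limit=5,
    query_filter=Filter(must=[
        # Only v2.0 enterprise docs are considered by the vector search
        FieldCondition(key="product_version", match=MatchValue(value="v2.0")),
        FieldCondition(key="audience", match=MatchValue(value="enterprise"))
    ])
)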

Evaluation with RAGAS

Use scripts/evaluate_rag.py for automated evaluation:

from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    faithfulness,       # Answer grounded in context
    answer_relevancy,   # Answer addresses query
    context_recall,     # Retrieved docs cover ground truth
    context_precision   # Retrieved docs are relevant
)

# Test dataset (ragas expects a Hugging Face Dataset)
test_data = Dataset.from_dict({
    "question": ["How do I refresh OAuth tokens?"],
    "answer": ["Use /token with refresh_token grant..."],
    "contexts": [["OAuth refresh documentation..."]],
    "ground_truth": ["POST to /token with grant_type=refresh_token"]
})

# Evaluate
results = evaluate(test_data, metrics=[
    faithfulness,
    answer_relevancy,
    context_recall,
    context_precision
])

# Production targets:
# faithfulness: >0.90 (minimal hallucination)
# answer_relevancy: >0.85 (addresses user query)
# context_recall: >0.80 (sufficient context retrieved)
# context_precision: >0.75 (minimal noise)

Performance Optimization

Embedding Generation

  • Batch processing: 100-500 chunks per batch
  • Caching: Cache embeddings by content hash
  • Rate limiting: Respect API provider limits (exponential backoff)
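
A sketch combining the three tactics above. The OpenAI client stands in for any provider; batch size, retry count, and the in-memory cache are illustrative:

import hashlib
import time

from openai import OpenAI

client = OpenAI()
_cache: dict[str, list[float]] = {}  # keyed by content hash; use Redis/disk in production

def _key(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()

def embed_chunks(texts: list[str], model: str = "text-embedding-3-small",
                 batch_size: int = 100) -> list[list[float]]:
    missing = [t for t in texts if _key(t) not in _cache]
    for i in range(0, len(missing), batch_size):
        batch = missing[i:i + batch_size]
        for attempt in range(5):
            try:
                resp = client.embeddings.create(model=model, input=batch)
                break
            except Exception:
                if attempt == 4:
                    raise
                time.sleep(2 ** attempt)  # exponential backoff on rate limits
        for text, item in zip(batch, resp.data):
            _cache[_key(text)] = item.embedding
    return [_cache[_key(t)] for t in texts]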

Vector Search

  • Index type: HNSW (Hierarchical Navigable Small World) for most cases
  • Distance metric: Cosine for normalized embeddings
  • Pre-filtering: Apply metadata filters before vector search
  • Result diversity: Use MMR (Maximal Marginal Relevance) to reduce redundancy (sketch below)
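
A minimal MMR sketch over candidate vectors (numpy assumed; lambda_mult trades query relevance against redundancy among the picks):

import numpy as np

def mmr(query_vec: np.ndarray, cand_vecs: np.ndarray,
        lambda_mult: float = 0.5, top_n: int = 5) -> list[int]:
    cand = cand_vecs / np.linalg.norm(cand_vecs, axis=1, keepdims=True)
    q = query_vec / np.linalg.norm(query_vec)
    relevance = cand @ q                    # cosine similarity to the query
    selected = [int(np.argmax(relevance))]  # seed with the most relevant doc
    while len(selected) < min(top_n, len(cand)):
        redundancy = (cand @ cand[selected].T).max(axis=1)  # overlap with picks
        score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
        score[selected] = -np.inf           # never re-pick a selected doc
        selected.append(int(np.argmax(score)))
    return selected  # indices into cand_vecs, in pick order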

Cost Optimization

  • Embedding model: Consider text-embedding-3-small for budget constraints
  • Dimension reduction: Use Matryoshka shortening (3072d → 1024d)
  • Caching: Implement semantic caching for repeated queries (sketch below)
  • Batch operations: Group insertions/updates for efficiency
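
A sketch of the semantic-caching idea: reuse a stored answer when a new query embeds close enough to a previously answered one, skipping retrieval and generation entirely. The 0.95 cosine threshold is illustrative:

import numpy as np

_semantic_cache: list[tuple[np.ndarray, str]] = []  # (query embedding, answer)

def lookup(query_vec: np.ndarray, threshold: float = 0.95) -> str | None:
    q = query_vec / np.linalg.norm(query_vec)
    for vec, answer in _semantic_cache:
        if float(q @ (vec / np.linalg.norm(vec))) >= threshold:
            return answer  # close enough: reuse the cached answer
    return None

def store(query_vec: np.ndarray, answer: str) -> None:
    _semantic_cache.append((query_vec, answer))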

Common Workflows

1. Building a RAG Chatbot

  • Vector database: Qdrant (self-hosted or cloud)
  • Embeddings: OpenAI text-embedding-3-large
  • Chunking: 512 tokens, 50 overlap, semantic splitter
  • Search: Hybrid (vector + BM25)
  • Integration: Frontend with ai-chat skill

See examples/qdrant-python/ for complete implementation.

2. Semantic Search Engine

  • Vector database: Qdrant or Pinecone
  • Embeddings: Voyage AI voyage-3 (best quality)
  • Chunking: Content-type specific (see chunking-patterns.md)
  • Search: Hybrid with re-ranking
  • Filtering: Pre-filter by metadata (date, category, etc.)

3. Code Search

  • Vector database: Qdrant
  • Embeddings: OpenAI text-embedding-3-large
  • Chunking: AST-based (function/class boundaries)
  • Metadata: Language, file path, imports
  • Search: Hybrid with language filtering

See examples/qdrant-python/ for code-specific implementation.
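
A sketch of the AST-based chunking step for Python sources, using only the standard library's ast module (one chunk per top-level function or class; payload fields mirror the metadata schema above):

import ast

def chunk_python_source(source: str, path: str) -> list[dict]:
    chunks = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append({
                "text": ast.get_source_segment(source, node),
                "source": path,
                "name": node.name,
                "programming_language": "python"  # enables language filtering
            })
    return chunks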

Integration with Other Skills

Frontend Skills

  • ai-chat: Vector DB powers RAG pipeline behind chat interface
  • search-filter: Replace keyword search with semantic search
  • data-viz: Visualize embedding spaces, similarity scores

Backend Skills

  • databases-relational: Hybrid approach using pgvector extension
  • api-patterns: Expose semantic search via REST/GraphQL
  • observability: Monitor embedding quality and retrieval metrics

Multi-Language Support

Python (Primary)

  • Client: qdrant-client
  • Framework: LangChain, LlamaIndex
  • See: examples/qdrant-python/

Rust

  • Client: qdrant-client (1,549 code snippets in Context7)
  • Framework: Raw Rust for performance-critical systems
  • See: examples/rust-axum-vector/

TypeScript

  • Client: @qdrant/js-client-rest
  • Framework: LangChain.js, integration with Next.js
  • See: examples/typescript-rag/

Go

  • Client: qdrant-go
  • Use case: High-performance microservices

Troubleshooting

Poor Retrieval Quality

  1. Check chunking strategy (too large/small?)
  2. Verify metadata filtering (too restrictive?)
  3. Try hybrid search instead of vector-only
  4. Implement re-ranking stage
  5. Evaluate with RAGAS metrics

Slow Performance

  1. Use HNSW index (not Flat)
  2. Pre-filter with metadata before vector search
  3. Reduce vector dimensions (Matryoshka shortening)
  4. Batch operations (insertions, searches)
  5. Consider GPU acceleration (Milvus)

High Costs

  1. Switch to text-embedding-3-small
  2. Implement semantic caching
  3. Reduce chunk overlap
  4. Use self-hosted embeddings (nomic, bge-m3)
  5. Batch embedding generation

Qdrant Context7 Documentation

Primary resource: /llmstxt/qdrant_tech_llms-full_txt

  • Trust score: High
  • Code snippets: 10,154
  • Quality score: 83.1

Access via Context7:

resolve-library-id({ libraryName: "Qdrant" })
get-library-docs({
  context7CompatibleLibraryID: "/llmstxt/qdrant_tech_llms-full_txt",
  topic: "hybrid search collections python",
  mode: "code"
})

Additional Resources

Reference Documentation

  • references/qdrant.md - Comprehensive Qdrant guide
  • references/pgvector.md - PostgreSQL pgvector extension
  • references/milvus.md - Milvus/Zilliz for billion-scale
  • references/embedding-strategies.md - Embedding model comparison
  • references/chunking-patterns.md - Advanced chunking techniques

Code Examples

  • examples/qdrant-python/ - FastAPI + Qdrant RAG pipeline
  • examples/pgvector-prisma/ - PostgreSQL + Prisma integration
  • examples/typescript-rag/ - TypeScript RAG with Hono

Automation Scripts

  • scripts/generate_embeddings.py - Batch embedding generation
  • scripts/benchmark_similarity.py - Performance benchmarking
  • scripts/evaluate_rag.py - RAGAS-based evaluation

Next Steps:

  1. Choose vector database based on scale and infrastructure
  2. Select embedding model based on quality vs. cost trade-off
  3. Implement chunking strategy for the content type
  4. Set up hybrid search for production quality
  5. Evaluate with RAGAS metrics
  6. Optimize for performance and cost
