# Embeddings
Convert text to dense vector representations for semantic search and similarity.
## Quick Reference
```python
from openai import OpenAI

client = OpenAI()

# Single text embedding
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Your text here",
)
vector = response.data[0].embedding  # 1536 dimensions

# Batch embedding (efficient)
texts = ["text1", "text2", "text3"]
response = client.embeddings.create(
    model="text-embedding-3-small",
    input=texts,
)
vectors = [item.embedding for item in response.data]
```
## Model Selection

| Model | Dims | Cost (per 1M tokens) | Use Case |
|---|---|---|---|
| text-embedding-3-small | 1536 | $0.02 | General purpose |
| text-embedding-3-large | 3072 | $0.13 | High accuracy |
| nomic-embed-text (Ollama) | 768 | Free | Local/CI |
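For the Ollama row, a minimal local-embedding sketch, assuming an Ollama server running on its default port with the model already pulled (`ollama pull nomic-embed-text`); it uses Ollama's documented `/api/embeddings` REST endpoint:

```python
import requests

def embed_local(text: str, model: str = "nomic-embed-text") -> list[float]:
    """Embed text via a local Ollama server (default port 11434)."""
    resp = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": model, "prompt": text},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["embedding"]  # 768 dimensions for nomic-embed-text

vector = embed_local("Your text here")
```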
## Chunking Strategy

```python
def chunk_text(text: str, chunk_size: int = 512, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks for embedding."""
    # Splits on whitespace, so sizes are in words -- a rough proxy for tokens.
    words = text.split()
    chunks = []
    for i in range(0, len(words), chunk_size - overlap):
        chunk = " ".join(words[i:i + chunk_size])
        if chunk:
            chunks.append(chunk)
    return chunks
```
Guidelines:

- Chunk size: 256-1024 tokens (512 typical)
- Overlap: 10-20% for context continuity
- Include metadata (title, source) with chunks (see the sketch below)
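A sketch of attaching metadata to each chunk for indexing; `chunk_with_metadata` and the record shape are illustrative, not a fixed API:

```python
def chunk_with_metadata(text: str, title: str, source: str) -> list[dict]:
    """Wrap each chunk with its document metadata for indexing."""
    return [
        {"text": chunk, "title": title, "source": source, "chunk_index": i}
        for i, chunk in enumerate(chunk_text(text))
    ]

records = chunk_with_metadata(
    document_text,  # the full document string
    title="Embeddings Guide",
    source="docs/embeddings.md",
)
```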
## Similarity Calculation

```python
import numpy as np

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Calculate cosine similarity between two vectors."""
    a, b = np.array(a), np.array(b)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Usage
similarity = cosine_similarity(vector1, vector2)
# 1.0 = identical, 0.0 = orthogonal, -1.0 = opposite
```
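Building on `cosine_similarity`, a minimal top-k semantic search sketch; `query_vector` and `doc_vectors` are assumed to come from the embedding calls above:

```python
def top_k(
    query_vector: list[float],
    doc_vectors: list[list[float]],
    k: int = 5,
) -> list[tuple[int, float]]:
    """Return (index, score) pairs for the k most similar documents."""
    scores = [
        (i, cosine_similarity(query_vector, v))
        for i, v in enumerate(doc_vectors)
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]

# Usage: indices map back into the original chunk list
results = top_k(query_vector, doc_vectors, k=3)
```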
## Key Decisions

- Dimension reduction: text-embedding-3-large can be truncated to 1536 dims (see the sketch below)
- Normalization: most embedding models (including OpenAI's) return unit-length vectors, so dot product equals cosine similarity
- Batch size: 100-500 texts per API call for efficiency
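For the dimension-reduction decision, the text-embedding-3 models accept a `dimensions` parameter that truncates (and re-normalizes) the output server-side; a minimal sketch reusing the `client` from Quick Reference:

```python
# text-embedding-3-large truncated to 1536 dims via the dimensions parameter
response = client.embeddings.create(
    model="text-embedding-3-large",
    input="Your text here",
    dimensions=1536,
)
vector = response.data[0].embedding  # len(vector) == 1536
```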
## Common Mistakes

- Embedding queries differently than documents
- Not chunking long documents (context gets lost)
- Using the wrong similarity metric (cosine vs. euclidean)
- Re-embedding unchanged content (cache embeddings; see the sketch below)
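To avoid the last mistake, a minimal in-process cache sketch keyed on a content hash, reusing the `client` from Quick Reference (the Redis-based variant lives in references/advanced-patterns.md):

```python
import hashlib

_cache: dict[str, list[float]] = {}

def embed_cached(text: str, model: str = "text-embedding-3-small") -> list[float]:
    """Embed text, reusing the cached vector when content is unchanged."""
    key = hashlib.sha256(f"{model}:{text}".encode()).hexdigest()
    if key not in _cache:
        response = client.embeddings.create(model=model, input=text)
        _cache[key] = response.data[0].embedding
    return _cache[key]
```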
## Advanced Patterns

See references/advanced-patterns.md for:

- Late Chunking: Embed the full document, extract chunk vectors from contextualized tokens
- Batch API: Production batching with rate limiting and retry
- Embedding Cache: Redis-based caching to avoid re-embedding
- Matryoshka Embeddings: Dimension reduction with text-embedding-3
## Related Skills

- rag-retrieval: Using embeddings for RAG pipelines
- hyde-retrieval: Hypothetical document embeddings for vocabulary mismatch
- contextual-retrieval: Anthropic's context-prepending technique
- reranking-patterns: Cross-encoder reranking for precision
- ollama-local: Local embeddings with nomic-embed-text
## Capability Details

### text-to-vector

Keywords: embedding, text to vector, vectorize, embed text

Solves:

- Convert text to vector embeddings
- Choose appropriate embedding models
- Handle embedding API integration

### semantic-search

Keywords: semantic search, vector search, similarity search, find similar

Solves:

- Implement semantic search over documents
- Configure similarity thresholds
- Rank results by relevance

### chunking-strategies

Keywords: chunk, chunking, split, text splitting, overlap

Solves:

- Split documents into optimal chunks
- Configure chunk size and overlap
- Preserve semantic boundaries

### batch-embedding

Keywords: batch, bulk embed, parallel embedding, batch processing

Solves:

- Embed large document collections efficiently
- Handle rate limits and retries
- Optimize embedding costs

### local-embeddings

Keywords: local, ollama, self-hosted, on-premise, offline

Solves:

- Run embeddings locally with Ollama
- Deploy self-hosted embedding models
- Reduce API costs with local models