vector-db
Purpose
This skill enables the management of vector databases for storing, indexing, and querying high-dimensional vectors, optimizing AI/ML workflows for tasks like similarity searches and embeddings.
When to Use
Use this skill for AI/ML applications requiring fast vector similarity queries, such as building recommendation engines, semantic search in NLP, or image retrieval systems. Apply it when dealing with large-scale vector data (e.g., embeddings from models like BERT) to avoid brute-force comparisons.
Key Capabilities
- Store vectors with metadata and perform efficient nearest-neighbor searches using indexes.
- Support distance metrics like cosine, Euclidean, and dot product for similarity calculations.
- Handle vector dimensions up to 2048 and scale to millions of entries.
- Integrate with embedding models for real-time vector generation and querying.
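To make the three distance metrics concrete, here is a minimal pure-Python sketch of how each one is computed. It illustrates the math only; the function names are illustrative and it does not show how the service implements or indexes these metrics.

```python
import math


def dot(a, b):
    """Dot product: larger values mean more similar (for comparable magnitudes)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def euclidean(a, b):
    """Euclidean (L2) distance: smaller values mean more similar."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between a and b: 1.0 means same direction."""
    norm_a = math.sqrt(dot(a, a))
    norm_b = math.sqrt(dot(b, b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot(a, b) / (norm_a * norm_b)
```

Cosine ignores vector length and compares direction only, which is why it is a common choice for text embeddings; Euclidean and dot product are both sensitive to magnitude.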
Usage Patterns
Invoke this skill from the CLI for quick operations or through API calls in code. Set the environment variable $VECTOR_DB_API_KEY before use; both paths need it for authentication. On the CLI, prefix commands with vector-db and pass a JSON config file for complex setups (e.g., config.json containing { "dimension": 768, "metric": "cosine" } ). In code, send HTTP requests to the API endpoint and check every response for errors. The typical sequence is: create an index, insert vectors, then query them.
Common Commands/API
Use the CLI tool vector-db or the API at https://api.openclaw.com/vector-db/v1 . API requests authenticate by sending $VECTOR_DB_API_KEY in an Authorization: Bearer header.
CLI Command: Create an index
vector-db create index --name myindex --dimension 768 --metric cosine --file config.json
This initializes a new index; ensure config.json specifies additional options like shards.
CLI Command: Insert vectors
vector-db insert --index myindex --vectors "[0.1, 0.2, 0.3]" --id vec1
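Vectors must be in JSON array format and match the index dimension; the three-value vector above is shortened for readability, and an index created with --dimension 768 expects 768 values. Use the --batch flag for multiple inserts.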
API Endpoint: Query vectors
POST https://api.openclaw.com/vector-db/v1/indexes/myindex/query
Body: { "vector": [0.1, 0.2, 0.3], "top_k": 5 }
Response: JSON array of nearest neighbors.
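As a sketch of the query call from Python, the helper below builds the POST request with the standard library only. The helper name build_query_request is hypothetical. Using the Bearer header on this endpoint and sending a Content-Type header are assumptions, carried over from the delete endpoint and common JSON API practice.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.openclaw.com/vector-db/v1"


def build_query_request(index, vector, top_k=5, api_key=None):
    """Build (but do not send) a query request for the given index."""
    api_key = api_key or os.environ.get("VECTOR_DB_API_KEY")
    if not api_key:
        raise RuntimeError("VECTOR_DB_API_KEY is not set")
    body = json.dumps({"vector": vector, "top_k": top_k}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/indexes/{index}/query",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",    # assumed to be required
        },
        method="POST",
    )
```

To send it, pass the request to urllib.request.urlopen and decode the response with json.load; an HTTP error status raises urllib.error.HTTPError.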
API Endpoint: Delete index
DELETE https://api.openclaw.com/vector-db/v1/indexes/myindex
Include header: Authorization: Bearer $VECTOR_DB_API_KEY
Config format: Use JSON files like { "index_name": "myindex", "vector_size": 768, "distance": "cosine" } for CLI operations.
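A config file in this shape can be generated from code rather than written by hand. The sketch below assumes the three keys shown above are the ones the CLI reads. The helper name write_index_config is hypothetical, and the 2048 upper bound comes from the dimension limit listed under Key Capabilities.

```python
import json
from pathlib import Path


def write_index_config(path, index_name, vector_size, distance="cosine"):
    """Write a CLI config file and return the dict that was written."""
    is_int = isinstance(vector_size, int) and not isinstance(vector_size, bool)
    if not is_int or not 1 <= vector_size <= 2048:
        raise ValueError("vector_size must be an integer between 1 and 2048")
    config = {
        "index_name": index_name,
        "vector_size": vector_size,
        "distance": distance,
    }
    Path(path).write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config
```

For example, write_index_config("config.json", "myindex", 768) produces a file that can be passed to the CLI with --file config.json.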
Integration Notes
Integrate with AI/ML tools by exporting vectors from models and using this skill for storage. Set $VECTOR_DB_API_KEY in your environment or .env file. For Python integration, use the requests library:
import os
import requests
# Read the API key from the environment; this raises KeyError if the variable is unset.
headers = {'Authorization': f'Bearer {os.environ["VECTOR_DB_API_KEY"]}'}
response = requests.post('https://api.openclaw.com/vector-db/v1/indexes/myindex/insert', json={'vectors': [[0.1, 0.2]]}, headers=headers)
response.raise_for_status()  # raise on 4xx/5xx responses instead of ignoring them
Ensure the API base URL matches your deployment; handle rate limits by adding retries. For clustering with aimlops, link via shared IDs (e.g., use skill ID "vector-db" in workflows).
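One way to add retries for rate limits is exponential backoff around the request call. The sketch below assumes the API signals rate limiting with HTTP 429 (and possibly 503), which this document does not confirm. The helper name with_retries is hypothetical, and call can be any zero-argument function returning an object with a status_code attribute, such as a requests response.

```python
import time


def with_retries(call, max_attempts=4, base_delay=0.5,
                 retry_on=(429, 503), sleep=time.sleep):
    """Call `call()` until it returns a non-retryable status or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        response = call()
        # Return on success, on a non-retryable error, or on the final attempt.
        if response.status_code not in retry_on or attempt == max_attempts:
            return response
        # Wait 0.5s, 1s, 2s, ... before the next attempt.
        sleep(base_delay * 2 ** (attempt - 1))
```

Usage with requests might look like with_retries(lambda: requests.post(url, json=data, headers=headers)). The sleep parameter exists so the delay can be replaced in tests.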
Error Handling
Common errors include authentication failures (HTTP 401) from missing $VECTOR_DB_API_KEY , invalid vector dimensions (e.g., mismatch with index), or network issues. To handle:
- Check for 401 errors and prompt the user to set $VECTOR_DB_API_KEY .
- For invalid inputs, use try-except in code:
try:
    response = requests.post(url, json=data, headers=headers)
    response.raise_for_status()
except requests.exceptions.HTTPError as e:
    # Raised for 4xx/5xx responses, including 401 and dimension mismatches.
    print(f"Error: {e} - Check the API key and vector dimensions.")
except requests.exceptions.RequestException as e:
    print(f"Network error: {e}")
- CLI errors show as "Error: Invalid metric specified"; fix by verifying command flags. Always validate inputs before sending requests.
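Input validation can catch a dimension mismatch before any request is sent. The following is a minimal sketch; the helper name validate_vector is hypothetical, and the expected dimension has to come from however the index was created.

```python
import math


def validate_vector(vector, expected_dim):
    """Return the vector as a list, or raise if it cannot match the index."""
    if not isinstance(vector, (list, tuple)):
        raise TypeError("vector must be a list or tuple of numbers")
    if len(vector) != expected_dim:
        raise ValueError(f"expected {expected_dim} dimensions, got {len(vector)}")
    for i, value in enumerate(vector):
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"element {i} is not a number: {value!r}")
        if not math.isfinite(value):
            raise ValueError(f"element {i} is not finite: {value!r}")
    return list(vector)
```

Rejecting NaN and infinity up front matters because neither survives a round trip through strict JSON.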
Concrete Usage Examples
Example: Building a simple search engine
First, create an index: vector-db create index --name searchindex --dimension 512 .
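Insert embeddings: vector-db insert --index searchindex --vectors '[[0.5, 0.6], [0.7, 0.8]]' --ids 'doc1,doc2' . The vectors here are shortened for readability; each must have 512 values to match the index.
Query for similarities: Use API POST to /indexes/searchindex/query with body { "vector": [0.5, 0.6], "top_k": 3 } . The query vector must also have 512 values.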
This pattern is ideal for NLP, e.g., searching similar documents based on embeddings.
Example: Image similarity in ML pipeline
Generate image embeddings with a model, then store: vector-db insert --index imageindex --vectors '[[0.1, 0.2, 0.3]]' --metadata '{"url": "image1.jpg"}' .
Query for similar images: CLI vector-db query --index imageindex --vector '[0.1, 0.2, 0.3]' --top_k 5 . Quote the vector so the shell passes it as a single argument.
Integrate in code by fetching results and filtering by metadata, useful for recommendation systems.
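Filtering by metadata might look like the sketch below. The response schema is not specified in this document, so the shape used here (a list of dicts with a "metadata" dict on each neighbor) is an assumption, and filter_by_metadata is a hypothetical helper name.

```python
def filter_by_metadata(neighbors, **required):
    """Keep neighbors whose metadata matches every key=value in `required`."""
    return [
        n for n in neighbors
        if all(n.get("metadata", {}).get(k) == v for k, v in required.items())
    ]


# Made-up results in the assumed shape, for illustration only.
results = [
    {"id": "img1", "metadata": {"url": "image1.jpg", "category": "shoes"}},
    {"id": "img2", "metadata": {"url": "image2.jpg", "category": "hats"}},
    {"id": "img3"},  # a neighbor with no metadata is filtered out
]
shoes = filter_by_metadata(results, category="shoes")
```

Because the filter preserves input order, the nearest-neighbor ranking returned by the query is kept for the remaining items.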
Graph Relationships
- Connected to cluster: aimlops (e.g., shares data pipelines with data-processing skills).
- Relates to: embedding-generation skills (for vector creation) and query-optimization tools (for enhancing searches).
- Links with: ai skills for ML model integration and ml skills for training data storage.