Weaviate Database Operations
This skill provides comprehensive access to Weaviate vector databases including search operations, natural language queries, schema inspection, data exploration, filtered fetching, collection creation, and data imports.
Weaviate Cloud Instance
If the user does not have an instance yet, direct them to the Weaviate Cloud console to register and create a free sandbox instance.
Environment Variables
Required:
- WEAVIATE_URL - Your Weaviate Cloud cluster URL
- WEAVIATE_API_KEY - Your Weaviate API key
External Provider Keys (auto-detected): Set only the keys your collections use. Refer to Environment Requirements for more information.
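A quick pre-flight check can catch a missing variable before any script runs. The following is a minimal sketch, not part of the skill's scripts: the helper name `missing_required_vars` is hypothetical, and it checks only the two required variables listed above, not any provider keys.

```python
import os

# The two variables this document lists as required.
REQUIRED_VARS = ("WEAVIATE_URL", "WEAVIATE_API_KEY")


def missing_required_vars(env=None):
    """Return the required variable names that are unset or empty.

    `env` defaults to the process environment; passing a dict makes
    the check easy to exercise without touching real settings.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_required_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```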
Script Index
Search & Query
- Query Agent - Ask Mode: Use when the user wants a direct answer to a question based on collection data. The Query Agent synthesizes information from one or more collections and returns a structured response with source citations (collection name and object ID).
- Query Agent - Search Mode: Use when the user wants to explore or browse raw objects across one or more collections. Unlike ask mode, this returns the actual data objects rather than a synthesized answer.
- Hybrid Search: Default choice for most searches. Provides a good balance of semantic understanding and exact keyword matching. Use this when you are unsure which search type to pick.
- Semantic Search: Use for finding conceptually similar content regardless of exact wording. Best when the intent matters more than specific keywords.
- Keyword Search: Use for finding exact terms, IDs, SKUs, or specific text patterns. Best when precise keyword matching is needed rather than semantic similarity.
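The guidance above amounts to a small decision table. As an illustrative sketch, it can be written as a lookup. The intent labels and the function name `choose_search_script` are hypothetical; the script names are the ones this skill's recommendations refer to, and the fallback to hybrid search follows the "use this when you are unsure" advice.

```python
# Hypothetical intent labels mapped to the search scripts named in this skill.
SEARCH_SCRIPTS = {
    "answer": "ask.py",                  # synthesized answer with citations
    "browse": "query_search.py",         # raw objects across collections
    "general": "hybrid_search.py",       # default: semantic + keyword
    "conceptual": "semantic_search.py",  # meaning matters more than wording
    "exact": "keyword_search.py",        # exact terms, IDs, SKUs
}


def choose_search_script(intent):
    """Return the script for an intent, defaulting to hybrid search."""
    return SEARCH_SCRIPTS.get(intent, "hybrid_search.py")


if __name__ == "__main__":
    for intent in ("answer", "exact", "not-sure"):
        print(intent, "->", choose_search_script(intent))
```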
Collection Management
- List Collections: Use to discover what collections exist in the Weaviate instance. This should typically be the first step before performing any search or data operation.
- Get Collection Details: Use to understand a collection's schema — its properties, data types, vectorizer configuration, replication factor, and multi-tenancy status. Helpful before running searches or imports.
- Explore Collection: Use to analyze data distribution, top values, and inspect actual content in a collection. Helpful for understanding what data looks like before querying.
- Create Collection: Use to create new collections with custom schemas before importing data. Do not specify a vectorizer unless the user explicitly requests one (the default text2vec_weaviate is used).
Data Operations
- Fetch and Filter: Use to retrieve specific objects by ID or strictly filtered subsets of data. Best for precise data retrieval rather than search.
- Import Data: Use to bulk import data into an existing collection from CSV, JSON, or JSONL files.
- Create Example Data: Use to create example data so that other skills can be used immediately when no data is available or the user requests toy data.
Recommendations
- Start by listing collections if you don't know what's available:
  uv run scripts/list_collections.py
- If no data is available, ask the user whether they want example data created, and create it only if they request it. Otherwise continue.
  uv run scripts/example_data.py
- Get collection details to understand the schema:
  uv run scripts/get_collection.py --name "COLLECTION_NAME"
- Explore collection data to see values and statistics:
  uv run scripts/explore_collection.py "COLLECTION_NAME"
- Import data to populate a new collection (if needed):
  uv run scripts/import.py "data.csv" --collection "CollectionName"
- Do not specify a vectorizer when creating collections unless requested:
  uv run scripts/create_collection.py Article \
    --properties '[{"name": "title", "data_type": "text"}, {"name": "body", "data_type": "text"}]'
- Choose the right search type:
  - Get AI-powered answers with source citations across multiple collections → ask.py
  - Get raw objects from multiple collections → query_search.py
  - General search → hybrid_search.py (default)
  - Conceptual similarity → semantic_search.py
  - Exact terms/IDs → keyword_search.py
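For the collection-creation step in the list above, hand-writing the JSON passed to --properties is easy to get wrong because of shell quoting. The sketch below builds the same argument list in Python. It assumes create_collection.py takes the collection name as a positional argument and a --properties JSON string, exactly as in the example command; the helper name `create_collection_command` is hypothetical.

```python
import json
import shlex


def create_collection_command(name, properties):
    """Build the argv for create_collection.py from (name, data_type) pairs.

    No vectorizer option is added, in line with the recommendation to
    leave the default in place unless the user asks for one.
    """
    props = [{"name": prop, "data_type": dtype} for prop, dtype in properties]
    return [
        "uv", "run", "scripts/create_collection.py", name,
        "--properties", json.dumps(props),
    ]


if __name__ == "__main__":
    cmd = create_collection_command("Article", [("title", "text"), ("body", "text")])
    # shlex.join quotes the JSON so the line can be pasted into a shell.
    print(shlex.join(cmd))
```

Passing the list to `subprocess.run` directly avoids shell quoting altogether; `shlex.join` is only needed when the command has to be shown or pasted as text.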
Output Formats
All scripts support:
- Markdown tables (default and recommended)
- JSON (--json flag)
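When another program consumes a script's output, the JSON format is usually easier to handle than a Markdown table. The sketch below shows one way to do that, under a stated assumption: that a script run with --json prints a single JSON document to stdout and nothing else. The helper names `with_json_flag` and `run_script_json` are hypothetical.

```python
import json
import subprocess


def with_json_flag(argv):
    """Return a copy of argv with --json appended if it is not already there."""
    argv = list(argv)
    if "--json" not in argv:
        argv.append("--json")
    return argv


def run_script_json(argv):
    """Run a script with --json and parse its stdout.

    Assumes stdout holds exactly one JSON document; raises
    CalledProcessError if the script exits with a non-zero status.
    """
    result = subprocess.run(
        with_json_flag(argv), capture_output=True, text=True, check=True
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    print(with_json_flag(["uv", "run", "scripts/list_collections.py"]))
```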
Error Handling
Common errors:
- WEAVIATE_URL not set → Set the environment variable
- Collection not found → Use list_collections.py to see available collections
- Authentication error → Check API keys for both Weaviate and vectorizer providers
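The error-to-remedy pairs above can be turned into a small lookup for automated handling. This is a sketch only: it assumes the scripts' error output contains these phrases as written, which this document does not guarantee, and the function name `suggest_remedy` is hypothetical.

```python
# Error fragments and remedies, taken from the list of common errors.
REMEDIES = (
    ("WEAVIATE_URL not set", "Set the environment variable."),
    ("Collection not found", "Use list_collections.py to see available collections."),
    ("Authentication error", "Check API keys for both Weaviate and vectorizer providers."),
)


def suggest_remedy(message):
    """Return the remedy for the first known fragment in message, else None."""
    lowered = message.lower()
    for fragment, remedy in REMEDIES:
        if fragment.lower() in lowered:  # case-insensitive substring match
            return remedy
    return None


if __name__ == "__main__":
    print(suggest_remedy("Error: Collection not found: Article"))
```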