ToolUniverse Python SDK

ToolUniverse provides programmatic access to 1,000+ scientific tools through a unified interface. It implements the AI-Tool Interaction Protocol for building AI scientist systems that integrate ML models, databases, APIs, and scientific packages.

IMPORTANT - Language Handling: Most tools accept English terms only. When building workflows, always translate non-English input to English before passing it to tool parameters. Try the original-language term only as a fallback when the English query returns no results.
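
The English-first rule above can be sketched as a small wrapper. This is a minimal illustration, not part of the SDK: `search_with_fallback` and its `search_fn` parameter are hypothetical names, and the translation itself is assumed to happen before the call.

```python
from typing import Any, Callable, Optional


def search_with_fallback(
    search_fn: Callable[[str], Any],
    english_term: str,
    original_term: Optional[str] = None,
) -> Any:
    """Query with the English term first; retry with the original-language
    term only when the English query returns nothing."""
    result = search_fn(english_term)
    if result:  # any non-empty result counts as a hit
        return result
    if original_term and original_term != english_term:
        return search_fn(original_term)
    return result
```

`search_fn` is expected to return an empty value (for example `[]` or `None`) when nothing matches; a wrapper around a real tool call would need to normalize its result to that convention first.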

Installation

# Standard installation
pip install tooluniverse

# With optional features
# Quote the extras so shells such as zsh do not expand the brackets
pip install "tooluniverse[embedding]"  # Embedding search (GPU)
pip install "tooluniverse[ml]"         # ML model tools
pip install "tooluniverse[all]"        # All features

Environment Setup

# Required for LLM-based tool search and hooks
export OPENAI_API_KEY="sk-..."

# Optional for higher rate limits
export NCBI_API_KEY="..."

Or use a .env file (requires the python-dotenv package):

from dotenv import load_dotenv
load_dotenv()

Quick Start

from tooluniverse import ToolUniverse

# 1. Initialize and load tools
tu = ToolUniverse()
tu.load_tools()  # Loads 1,000+ tools (~5-10 seconds first time)

# 2. Find tools (three methods)
# Method A: Keyword (fast, no API key)
tools = tu.run({
    "name": "Tool_Finder_Keyword",
    "arguments": {"description": "protein structure", "limit": 10}
})

# Method B: LLM (intelligent, requires OPENAI_API_KEY)
tools = tu.run({
    "name": "Tool_Finder_LLM",
    "arguments": {"description": "predict drug toxicity", "limit": 5}
})

# Method C: Embedding (semantic, requires GPU)
tools = tu.run({
    "name": "Tool_Finder",
    "arguments": {"description": "protein interactions", "limit": 10}
})

# 3. Execute tools (two ways)
# Dictionary API
result = tu.run({
    "name": "UniProt_get_entry_by_accession",
    "arguments": {"accession": "P05067"}
})

# Function API (recommended)
result = tu.tools.UniProt_get_entry_by_accession(accession="P05067")

Core Patterns

Pattern 1: Discovery → Execute

# Find tools
tools = tu.run({
    "name": "Tool_Finder_Keyword",
    "arguments": {"description": "ADMET prediction", "limit": 3}
})

# Check results structure
if isinstance(tools, dict) and 'tools' in tools:
    for tool in tools['tools']:
        print(f"{tool['name']}: {tool['description']}")

# Execute tool
result = tu.tools.ADMETAI_predict_admet(
    smiles="CC(C)Cc1ccc(cc1)C(C)C(O)=O"
)

Pattern 2: Batch Execution

# Define calls
calls = [
    {"name": "UniProt_get_entry_by_accession", "arguments": {"accession": "P05067"}},
    {"name": "UniProt_get_entry_by_accession", "arguments": {"accession": "P12345"}},
    {"name": "RCSB_PDB_get_structure_by_id", "arguments": {"pdb_id": "1ABC"}}
]

# Execute in parallel
results = tu.run_batch(calls)
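
When a batch repeats one tool over many inputs, the call list can be generated rather than written by hand. `build_calls` below is a hypothetical helper, not an SDK function; it only produces dictionaries in the format shown above.

```python
from typing import Any, Dict, Iterable, List


def build_calls(tool_name: str, arg_name: str, values: Iterable[Any]) -> List[Dict[str, Any]]:
    """Build one call dictionary per value, in the {"name", "arguments"} shape."""
    return [{"name": tool_name, "arguments": {arg_name: value}} for value in values]


# Two UniProt lookups expressed as a batch
calls = build_calls("UniProt_get_entry_by_accession", "accession", ["P05067", "P12345"])
```

The resulting list can be passed to `tu.run_batch(calls)` as in the pattern above.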

Pattern 3: Scientific Workflow

def drug_discovery_pipeline(disease_id):
    tu = ToolUniverse(use_cache=True)
    tu.load_tools()
    
    try:
        # Get targets
        targets = tu.tools.OpenTargets_get_associated_targets_by_disease_efoId(
            efoId=disease_id
        )
        
        # Get compounds (batch)
        compound_calls = [
            {"name": "ChEMBL_search_molecule_by_target", 
             "arguments": {"target_id": t['id'], "limit": 10}}
            for t in targets['data'][:5]
        ]
        compounds = tu.run_batch(compound_calls)
        
        # Predict ADMET
        admet_results = []
        for comp_list in compounds:
            if comp_list and 'molecules' in comp_list:
                for mol in comp_list['molecules'][:3]:
                    admet = tu.tools.ADMETAI_predict_admet(
                        smiles=mol['smiles'],
                        use_cache=True
                    )
                    admet_results.append(admet)
        
        return {"targets": targets, "compounds": compounds, "admet": admet_results}
    finally:
        tu.close()

Configuration

Caching

# Enable globally
tu = ToolUniverse(use_cache=True)
tu.load_tools()

# Or per-call
result = tu.tools.ADMETAI_predict_admet(
    smiles="...",
    use_cache=True  # Cache expensive predictions
)

# Manage cache
stats = tu.get_cache_stats()
tu.clear_cache()

Hooks (Auto-summarization)

# Enable hooks for large outputs
tu = ToolUniverse(hooks_enabled=True)
tu.load_tools()

result = tu.tools.OpenTargets_get_target_gene_ontology_by_ensemblID(
    ensemblId="ENSG00000012048"
)

# Check if summarized
if isinstance(result, dict) and "summary" in result:
    print(f"Summarized: {result['summary']}")

Load Specific Categories

# Faster loading
tu = ToolUniverse()
tu.load_tools(categories=["proteins", "drugs"])

Critical Things to Know

⚠️ Always Call load_tools()

# ❌ Wrong - will fail
tu = ToolUniverse()
result = tu.tools.some_tool()  # Error!

# ✅ Correct
tu = ToolUniverse()
tu.load_tools()
result = tu.tools.some_tool()

⚠️ Tool Finder Returns Nested Structure

# ❌ Wrong
tools = tu.run({"name": "Tool_Finder_Keyword", "arguments": {"description": "protein"}})
for tool in tools:  # Iterating a dict yields its keys (strings), not tool entries
    print(tool['name'])  # TypeError: a string key cannot be indexed by 'name'

# ✅ Correct
if isinstance(tools, dict) and 'tools' in tools:
    for tool in tools['tools']:
        print(tool['name'])
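
The nested-structure check can be factored into one place so the rest of a workflow always receives a list. A minimal sketch, assuming the result shape shown above (a dict with a 'tools' list); `extract_tools` is a hypothetical helper name, not an SDK function.

```python
from typing import Any, Dict, List


def extract_tools(result: Any) -> List[Dict[str, Any]]:
    """Return the list under 'tools', or an empty list for any other shape."""
    if isinstance(result, dict):
        tools = result.get("tools")
        if isinstance(tools, list):
            return tools
    return []
```

With this in place, `for tool in extract_tools(tools): print(tool['name'])` is safe even when the finder returns an unexpected value.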

⚠️ Check Required Parameters

# Check tool schema first
tool_info = tu.all_tool_dict["UniProt_get_entry_by_accession"]
required = tool_info['parameter'].get('required', [])
print(f"Required: {required}")

# Then call
result = tu.tools.UniProt_get_entry_by_accession(accession="P05067")
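
The schema check above can be turned into a pre-flight test that reports which required arguments are missing before a call is made. This sketch assumes the parameter schema has the 'required' and 'properties' keys used in this document; `missing_required` is a hypothetical helper and the schema below is illustrative, not copied from a real tool.

```python
from typing import Any, Dict, List


def missing_required(parameter_schema: Dict[str, Any], arguments: Dict[str, Any]) -> List[str]:
    """List required parameter names that are absent from `arguments`."""
    required = parameter_schema.get("required", [])
    return [name for name in required if name not in arguments]


# Illustrative schema in the assumed shape
schema = {
    "properties": {"accession": {"type": "string"}},
    "required": ["accession"],
}
```

In practice the first argument would be `tool_info['parameter']` from the lookup above.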

⚠️ Cache Strategy

# ✅ Cache: ML predictions, database queries (deterministic)
result = tu.tools.ADMETAI_predict_admet(smiles="...", use_cache=True)

# ❌ Don't cache: real-time data, time-sensitive results
result = tu.tools.get_latest_publications()  # No cache

⚠️ Error Handling

from tooluniverse.exceptions import ToolError, ToolUnavailableError

try:
    result = tu.tools.UniProt_get_entry_by_accession(accession="P05067")
except ToolUnavailableError as e:
    print(f"Tool unavailable: {e}")
except ToolError as e:
    print(f"Execution failed: {e}")
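
For transient failures, one option is to retry with exponential backoff instead of only printing the error. The wrapper below is a generic sketch, not an SDK feature: `call_with_retry` and its parameters are hypothetical, and whether a given exception is worth retrying is an assumption to verify per tool.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    retry_on: Tuple[Type[Exception], ...],
    attempts: int = 3,
    base_delay: float = 1.0,
) -> T:
    """Call fn, retrying on the given exception types with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * 2 ** attempt)  # waits 1x, 2x, 4x, ... base_delay
    raise AssertionError("unreachable")
```

A call from the example above could then be wrapped as `call_with_retry(lambda: tu.tools.UniProt_get_entry_by_accession(accession="P05067"), (ToolUnavailableError,))`.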

⚠️ Tool Names Are Case-Sensitive

# ❌ Wrong
result = tu.tools.uniprot_get_entry_by_accession(accession="P05067")

# ✅ Correct
result = tu.tools.UniProt_get_entry_by_accession(accession="P05067")
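
When a name arrives with the wrong casing (for example from user input), it can be mapped back to the registered spelling before the call. A minimal sketch: `resolve_tool_name` is a hypothetical helper, and `known_names` is assumed to be any iterable of registered tool names, such as the keys of `tu.all_tool_dict`.

```python
from typing import Iterable, Optional


def resolve_tool_name(name: str, known_names: Iterable[str]) -> Optional[str]:
    """Return the registered name that matches `name` ignoring case, else None."""
    lowered = name.lower()
    for candidate in known_names:
        if candidate.lower() == lowered:
            return candidate
    return None
```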

Execution Options

result = tu.tools.tool_name(
    param="value",
    use_cache=True,      # Cache this call
    validate=True,       # Validate parameters (default)
    stream_callback=None # Streaming output
)

Performance Tips

# 1. Load specific categories
tu.load_tools(categories=["proteins"])

# 2. Use batch execution
results = tu.run_batch(calls)

# 3. Enable caching
tu = ToolUniverse(use_cache=True)

# 4. Disable validation (after testing)
result = tu.tools.tool_name(param="value", validate=False)

Troubleshooting

Tool Not Found

# Search for tool
tools = tu.run({
    "name": "Tool_Finder_Keyword",
    "arguments": {"description": "partial_name", "limit": 10}
})

# Check if exists
if "Tool_Name" in tu.all_tool_dict:
    print("Found!")
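
If the exact-name check fails, near-miss spellings can be suggested with the standard library's `difflib`. This is a sketch rather than an SDK feature: `suggest_tool_names` is a hypothetical helper, and the 0.6 similarity cutoff is simply difflib's default, not a tuned value.

```python
import difflib
from typing import Iterable, List


def suggest_tool_names(name: str, known_names: Iterable[str], limit: int = 3) -> List[str]:
    """Return up to `limit` registered names that closely resemble `name`."""
    return difflib.get_close_matches(name, list(known_names), n=limit, cutoff=0.6)
```

The second argument would typically be the keys of `tu.all_tool_dict`. Matching is case-sensitive, so it complements rather than replaces an exact lookup.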

API Key Issues

import os
if not os.environ.get("OPENAI_API_KEY"):
    print("⚠️ OPENAI_API_KEY not set")
    print("Set: export OPENAI_API_KEY='sk-...'")
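
The same check extends to several keys at once, which helps when a workflow depends on more than one service. A minimal sketch: `missing_env_keys` is a hypothetical helper, and the optional `env` mapping exists only so the function can be exercised without touching the real environment.

```python
import os
from typing import Iterable, List, Mapping, Optional


def missing_env_keys(names: Iterable[str], env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names that are unset or empty in the environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]
```

For example, `missing_env_keys(["OPENAI_API_KEY", "NCBI_API_KEY"])` lists whichever of the two keys from Environment Setup are not set.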

Validation Errors

from tooluniverse.exceptions import ToolValidationError

try:
    result = tu.tools.some_tool(param="value")
except ToolValidationError as e:
    # Check schema
    tool_info = tu.all_tool_dict["some_tool"]
    print(f"Required: {tool_info['parameter'].get('required', [])}")
    print(f"Properties: {list(tool_info['parameter']['properties'])}")

Enable Debug Logging

from tooluniverse.logging_config import set_log_level
set_log_level("DEBUG")

Tool Categories

Category	Tools	Use Cases
Proteins	UniProt, RCSB PDB, AlphaFold	Protein analysis, structure
Drugs	DrugBank, ChEMBL, PubChem	Drug discovery, compounds
Genomics	Ensembl, NCBI Gene, gnomAD	Gene analysis, variants
Diseases	OpenTargets, ClinVar	Disease-target associations
Literature	PubMed, Europe PMC	Literature search
ML Models	ADMET-AI, AlphaFold	Predictions, modeling
Pathways	KEGG, Reactome	Pathway analysis

Resources

Documentation: https://zitniklab.hms.harvard.edu/ToolUniverse/
Tool List: https://zitniklab.hms.harvard.edu/ToolUniverse/tools/tools_config_index.html
GitHub: https://github.com/mims-harvard/ToolUniverse
Examples: See examples/ directory in repository
Slack: https://join.slack.com/t/tooluniversehq/shared_invite/zt-3dic3eoio-5xxoJch7TLNibNQn5_AREQ

For detailed guides, see REFERENCE.md.
