Galileo Python SDK

The Galileo Python SDK (galileo) provides a unified interface for the Galileo AI platform — enabling evaluation, observability, and runtime guardrails for GenAI applications. It supports automatic tracing of LLM calls, custom span logging, evaluation experiments, and production-grade guardrails.

SDK Version Detection

Check installed versions before writing any code to pick the right reference:

import importlib.metadata, importlib.util

galileo_ver = importlib.metadata.version("galileo")        # e.g. "2.1.1"
pq_installed = importlib.util.find_spec("promptquality") is not None
pq_ver = importlib.metadata.version("promptquality") if pq_installed else None
print(f"galileo={galileo_ver}, promptquality={pq_ver}")

Installed stack	Use
`galileo >= 2.0` (with or without `promptquality 0.x`)	This skill — `GalileoLogger`, `@log`, `galileo_context`
`galileo < 2.0` + `promptquality >= 1.0`	Promptquality 1.x Reference

Note: promptquality >= 1.0 and galileo >= 2.0 are mutually incompatible — they require different major versions of galileo-core. Installing both will cause dependency conflicts.

Additional references:

Framework Integrations — OpenAI, Anthropic, LangChain, LangGraph, CrewAI, PydanticAI, and more
Guardrail Metrics Reference — Hallucination Index, Context Adherence, Toxicity, PII, and all available metrics
Advanced Evaluation Patterns — Experiments, eval sets, prompt optimization, and scoring
Promptquality 1.x Reference — EvaluateRun, Scorers, ScorersConfiguration for the galileo 1.x stack

Installation

pip install galileo

For evaluation features with the legacy prompt engineering interface:

pip install promptquality

For runtime guardrails:

pip install galileo-protect

Quick Start

import os
from galileo import galileo_context
from galileo.openai import openai

galileo_context.init(project="my-project", log_stream="my-log-stream")

client = openai.OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

response = client.chat.completions.create(
    messages=[{"role": "user", "content": "Explain quantum computing in one sentence."}],
    model="gpt-4o",
)

print(response.choices[0].message.content)

galileo_context.flush()

Authentication

Set the following environment variables:

# .env file or shell environment
GALILEO_API_KEY="your-api-key"            # Required — from Galileo console
GALILEO_CONSOLE_URL="https://app.galileo.ai"  # Console URL (or self-hosted URL)
GALILEO_PROJECT="my-project"              # Optional — default project
GALILEO_LOG_STREAM="my-log-stream"        # Optional — default log stream
GALILEO_LOGGING_DISABLED="false"          # Optional — disable logging

For the legacy promptquality package, authenticate programmatically:

import promptquality as pq
pq.login("https://app.galileo.ai")

Observability and Tracing

Initializing the Galileo Context

from galileo import galileo_context

galileo_context.init(project="my-project", log_stream="my-log-stream")

Wrapped OpenAI Client (Auto-Logging)

Import the Galileo-wrapped OpenAI client to automatically trace all calls:

from galileo.openai import openai

client = openai.OpenAI()
response = client.chat.completions.create(
    messages=[{"role": "user", "content": "Hello"}],
    model="gpt-4o",
)

The `@log` Decorator

Use @log to create spans for your functions. Supported span types: workflow, llm, retriever, tool.

from galileo import log

@log
def my_workflow():
    result = call_openai()
    return result

@log(span_type="retriever")
def retrieve_documents(query: str):
    docs = vector_store.search(query)
    return docs

@log(span_type="tool")
def search_web(query: str):
    return web_api.search(query)

Nested Workflows

from galileo import log

@log
def agent_pipeline(user_input: str):
    context = retrieve_documents(user_input)
    tool_result = search_web(user_input)
    response = generate_response(user_input, context, tool_result)
    return response

@log(span_type="retriever")
def retrieve_documents(query: str):
    return ["doc1", "doc2"]

@log(span_type="tool")
def search_web(query: str):
    return "search result"

@log
def generate_response(query: str, context: list, tool_result: str):
    client = openai.OpenAI()
    return client.chat.completions.create(
        messages=[{"role": "user", "content": query}],
        model="gpt-4o",
    )

Context Manager

Scope logging to a specific block and auto-flush on exit:

from galileo import galileo_context

with galileo_context(project="my-project", log_stream="my-log-stream"):
    result = my_workflow()
    print(result)

Flushing Traces

Upload captured traces to Galileo:

galileo_context.flush()

Evaluation

Running Experiments with `promptquality`

import promptquality as pq

pq.login("https://app.galileo.ai")

template = "Explain {{topic}} to me like I'm a 5 year old"
data = {"topic": ["Quantum Physics", "Politics", "Large Language Models"]}

pq.run(
    project_name="my-first-project",
    template=template,
    dataset=data,
    settings=pq.Settings(
        model_alias="ChatGPT (16K context)",
        temperature=0.8,
        max_tokens=400,
    ),
)

Evaluation Runs with Custom Workflows (galileo 2.x)

Use GalileoLogger to log traces for evaluation:

from galileo import GalileoLogger

logger = GalileoLogger(project="my_project", log_stream="my_run")

eval_set = ["What are hallucinations?", "What are intrinsic hallucinations?"]
for input_text in eval_set:
    output = llm.call(input_text)
    logger.add_single_llm_span_trace(
        input=input_text,
        output=output,
        model="gpt-4o",
    )

logger.flush()

For the galileo < 2.0 + promptquality >= 1.0 stack, use EvaluateRun — see Promptquality 1.x Reference.

See Advanced Evaluation Patterns for more.

Guardrails / Protect

Creating a Protection Stage

from galileo import GalileoMetrics
from galileo.stages import create_protect_stage
from galileo_core.schemas.protect.rule import Rule, RuleOperator
from galileo_core.schemas.protect.ruleset import Ruleset
from galileo_core.schemas.protect.stage import StageType

rule = Rule(
    metric=GalileoMetrics.input_toxicity,
    operator=RuleOperator.gt,
    target_value=0.1,
)

ruleset = Ruleset(rules=[rule])

stage = create_protect_stage(
    name="toxicity-guard",
    stage_type=StageType.central,
    prioritized_rulesets=[ruleset],
    description="Block toxic input.",
)

Invoking Runtime Protection

from galileo.protect import invoke_protect, ainvoke_protect
from galileo_core.schemas.protect.payload import Payload

payload = Payload(input="User message to check.")

response = invoke_protect(payload=payload, stage_name="toxicity-guard")

# Async variant
response = await ainvoke_protect(payload=payload, stage_name="toxicity-guard")

Stage Types

Central stages — Created and managed by governance teams; rulesets defined at creation time
Local stages — Created without rulesets; rulesets supplied at runtime by application teams

See Guardrail Metrics Reference for all available metrics.

Common Patterns

Multi-Turn Conversations

from galileo import log
from galileo.openai import openai

client = openai.OpenAI()

@log
def chat(messages: list):
    response = client.chat.completions.create(
        messages=messages,
        model="gpt-4o",
    )
    return response.choices[0].message.content

messages = []
messages.append({"role": "user", "content": "What is RAG?"})
reply = chat(messages)
messages.append({"role": "assistant", "content": reply})
messages.append({"role": "user", "content": "How do I implement it?"})
reply = chat(messages)

RAG Pipeline with Retriever Spans

from galileo import log
from galileo.openai import openai

client = openai.OpenAI()

@log(span_type="retriever")
def retrieve(query: str):
    results = vector_db.similarity_search(query, k=5)
    return [doc.page_content for doc in results]

@log
def rag_pipeline(question: str):
    context = retrieve(question)
    prompt = f"Context: {context}\n\nQuestion: {question}"
    response = client.chat.completions.create(
        messages=[{"role": "user", "content": prompt}],
        model="gpt-4o",
    )
    return response.choices[0].message.content

Agent Tool Calling

from galileo import log

@log(span_type="tool")
def math_operation(a: float, b: float, op: str) -> str:
    if op == "add":
        return str(a + b)
    elif op == "multiply":
        return str(a * b)
    raise ValueError(f"Unknown op: {op}")

@log(span_type="tool")
def web_search(query: str):
    return search_api.query(query)

@log
def agent(user_input: str):
    plan = plan_actions(user_input)
    results = []
    for action in plan:
        if action.tool == "math_operation":
            results.append(math_operation(action.input))
        elif action.tool == "web_search":
            results.append(web_search(action.input))
    return synthesize(results)

Best Practices

Always set environment variables for GALILEO_API_KEY and GALILEO_CONSOLE_URL rather than hardcoding credentials.
Organize projects and log streams by application, environment, or team to keep traces manageable.
Call galileo_context.flush() at the end of each request or batch to ensure traces are uploaded. In web servers, flush at the end of each request handler.
Use the context manager (with galileo_context(...)) for scoped logging that auto-flushes on exit.
Use specific span types (retriever, tool, llm, workflow) to get the most out of Galileo's trace visualization.
Handle errors gracefully — wrap flush() calls in try/except to prevent logging failures from crashing your application.
Use the wrapped OpenAI client (from galileo.openai import openai) for zero-config automatic tracing of all OpenAI calls.
Leverage guardrail metrics in production to catch hallucinations, toxic content, and PII before they reach end users.

Resources

Documentation: https://docs.galileo.ai
Python SDK repo: https://github.com/rungalileo/galileo-python
SDK examples: https://github.com/rungalileo/sdk-examples
PyPI: https://pypi.org/project/galileo/
Galileo console: https://app.galileo.ai