observability-patterns

Observability Patterns Skill

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "observability-patterns" with this command: npx skills add vanman2024/ai-dev-marketplace/vanman2024-ai-dev-marketplace-observability-patterns

Observability Patterns Skill

This skill provides comprehensive templates and configurations for implementing observability in Google ADK agents. Includes logging, tracing, BigQuery analytics, Cloud Trace integration, and third-party observability platforms.

Overview

Google ADK supports multiple observability approaches for monitoring, debugging, and analyzing agent behavior:

  • Cloud Trace - Google Cloud native tracing with OpenTelemetry

  • BigQuery Agent Analytics - Comprehensive event logging and analysis

  • AgentOps - Session replays and unified tracing analytics

  • Phoenix (Arize) - Open-source observability with self-hosted control

  • Weave (W&B) - Weights & Biases platform for tracking and visualization

This skill covers production-ready observability implementations with security and scalability.

Available Scripts

  1. Setup Cloud Trace

Script: scripts/setup-cloud-trace.sh <project-id>

Purpose: Configures Cloud Trace integration for ADK agents

Parameters:

  • project-id
  • Google Cloud project ID (required)

Usage:

Setup Cloud Trace for local development

./scripts/setup-cloud-trace.sh my-project-id

Setup with ADK CLI deployment

adk deploy agent_engine --project=my-project-id --trace_to_cloud ./agent

Environment Variables:

  • GOOGLE_CLOUD_PROJECT

  • Project ID for Cloud Trace

  • GOOGLE_APPLICATION_CREDENTIALS

  • Path to service account key

Output: Cloud Trace enabled, traces visible in console.cloud.google.com

  1. Setup BigQuery Agent Analytics

Script: scripts/setup-bigquery-analytics.sh <project-id> <dataset-id> [bucket-name]

Purpose: Configures BigQuery Agent Analytics plugin for comprehensive event logging

Parameters:

  • project-id

  • Google Cloud project ID (required)

  • dataset-id

  • BigQuery dataset name (required)

  • bucket-name

  • GCS bucket for multimodal content (optional)

Usage:

Setup basic BigQuery analytics

./scripts/setup-bigquery-analytics.sh my-project agent-analytics

Setup with GCS for multimodal content

./scripts/setup-bigquery-analytics.sh my-project agent-analytics my-content-bucket

Create dataset and table

bq mk --dataset my-project:agent-analytics bq mk --table agent-analytics.agent_events_v2 templates/bigquery-schema.json

IAM Requirements:

  • roles/bigquery.jobUser

  • Required for BigQuery operations

  • roles/bigquery.dataEditor

  • Required for writing data

  • roles/storage.objectCreator

  • Required if using GCS offloading

Output: BigQuery table created, events streaming to dataset

  1. Setup AgentOps

Script: scripts/setup-agentops.sh

Purpose: Configures AgentOps integration for session replays and metrics

Usage:

Install AgentOps

pip install -U agentops

Setup with API key

AGENTOPS_API_KEY=your_api_key_here ./scripts/setup-agentops.sh

Verify setup

python -c "import agentops; agentops.init(); print('AgentOps ready')"

Environment Variables:

  • AGENTOPS_API_KEY
  • AgentOps API key from app.agentops.ai/settings/projects

Output: AgentOps initialized, sessions visible in dashboard

  1. Setup Phoenix

Script: scripts/setup-phoenix.sh

Purpose: Configures Phoenix (Arize) integration for open-source observability

Usage:

Install Phoenix packages

pip install openinference-instrumentation-google-adk arize-phoenix-otel

Setup Phoenix with API key

PHOENIX_API_KEY=your_key_here
PHOENIX_COLLECTOR_ENDPOINT=https://app.phoenix.arize.com/s/your-space
./scripts/setup-phoenix.sh

Verify Phoenix connection

python scripts/verify-phoenix.py

Environment Variables:

  • PHOENIX_API_KEY

  • Phoenix API key from phoenix.arize.com

  • PHOENIX_COLLECTOR_ENDPOINT

  • Phoenix collector endpoint URL

Output: Phoenix tracer initialized, traces visible in Phoenix dashboard

  1. Setup Weave

Script: scripts/setup-weave.sh <entity> <project>

Purpose: Configures Weave (W&B) integration for observability

Parameters:

  • entity

  • W&B entity name (visible in Teams sidebar)

  • project

  • W&B project name

Usage:

Install Weave dependencies

pip install opentelemetry-sdk opentelemetry-exporter-otlp-proto-http

Setup Weave with API key

WANDB_API_KEY=your_wandb_key_here ./scripts/setup-weave.sh my-team my-project

Verify Weave connection

python scripts/verify-weave.py

Environment Variables:

  • WANDB_API_KEY
  • W&B API key from wandb.ai/authorize

Output: Weave tracer initialized, traces visible in Weave dashboard

  1. Validate Observability Setup

Script: scripts/validate-observability.sh

Purpose: Validates observability configuration and connectivity

Checks:

  • Cloud Trace connectivity

  • BigQuery dataset and table existence

  • AgentOps initialization

  • Phoenix endpoint reachability

  • Weave endpoint reachability

  • IAM permissions

  • Environment variables set

Usage:

Validate all observability configurations

./scripts/validate-observability.sh

Validate specific tool

./scripts/validate-observability.sh --tool=bigquery ./scripts/validate-observability.sh --tool=cloud-trace ./scripts/validate-observability.sh --tool=agentops

Exit Codes:

  • 0

  • All checks passed

  • 1

  • Configuration missing

  • 2

  • Connectivity failed

  • 3

  • Permission issues

Available Templates

  1. Cloud Trace Configuration

Template: templates/cloud-trace-config.py

Purpose: Cloud Trace integration for ADK agents

Features:

  • OpenTelemetry configuration

  • Automatic span creation for agent runs

  • LLM and tool call tracing

  • Error and latency tracking

Usage:

Enable Cloud Trace via ADK CLI

adk deploy agent_engine --project=$GOOGLE_CLOUD_PROJECT --trace_to_cloud ./agent

Or via Python SDK

from google.adk.app import AdkApp

app = AdkApp( agent=my_agent, enable_tracing=True )

Span Labels:

  • invocation

  • Top-level agent invocation

  • agent_run

  • Individual agent execution

  • call_llm

  • LLM API calls

  • execute_tool

  • Tool executions

  1. BigQuery Analytics Configuration

Template: templates/bigquery-analytics-config.py

Purpose: Complete BigQuery Agent Analytics plugin configuration

Features:

  • Asynchronous event logging

  • Multimodal content with GCS offloading

  • OpenTelemetry-style tracing (trace_id, span_id)

  • Event filtering and batching

  • Custom content formatting

Usage:

from google.adk.plugins.bigquery_agent_analytics_plugin import ( BigQueryAgentAnalyticsPlugin, BigQueryLoggerConfig )

bq_config = BigQueryLoggerConfig( enabled=True, gcs_bucket_name="your-bucket-name", max_content_length=500 * 1024, # 500KB inline limit batch_size=1, # Low latency event_allowlist=["LLM_RESPONSE", "TOOL_COMPLETED"] )

plugin = BigQueryAgentAnalyticsPlugin( project_id="your-project-id", dataset_id="your-dataset-id", config=bq_config )

app = App(root_agent=agent, plugins=[plugin])

Configuration Options:

  • enabled

  • Toggle logging on/off

  • gcs_bucket_name

  • GCS bucket for large content

  • max_content_length

  • Inline text limit (default 500KB)

  • batch_size

  • Events per write (default 1)

  • event_allowlist

  • Whitelist specific event types

  • event_denylist

  • Blacklist specific event types

  • content_formatter

  • Custom formatting function

  1. BigQuery Schema

Template: templates/bigquery-schema.json

Purpose: BigQuery table schema for agent_events_v2

Schema Fields:

  • timestamp

  • Event recording time

  • event_type

  • Event category (LLM_REQUEST, TOOL_STARTING, etc.)

  • content

  • Event-specific JSON payload

  • content_parts

  • Structured multimodal data

  • trace_id

  • OpenTelemetry trace ID

  • span_id

  • OpenTelemetry span ID

  • agent

  • Agent name

  • user_id

  • User identifier

Partitioning: By DATE(timestamp) for cost optimization

Clustering: By event_type, agent, user_id for query performance

  1. AgentOps Configuration

Template: templates/agentops-config.py

Purpose: AgentOps integration for session replays

Features:

  • Minimal two-line integration

  • Hierarchical span visualization

  • LLM call tracking with prompts and completions

  • Token count and latency metrics

  • Cost tracking

Usage:

import agentops

Initialize AgentOps (before ADK imports)

agentops.init()

Your ADK agent code

from google.adk.app import App app = App(root_agent=my_agent)

Span Hierarchy:

  • Agent spans: Named adk.agent.{AgentName}

  • LLM spans: Capture prompts, completions, tokens

  • Tool spans: Record parameters and results

  1. Phoenix Configuration

Template: templates/phoenix-config.py

Purpose: Phoenix (Arize) integration for open-source observability

Features:

  • Self-hosted data control

  • OpenInference instrumentation

  • Trace evaluation

  • Performance debugging

  • Custom evaluators

Usage:

import os from phoenix.otel import register

Set Phoenix credentials

os.environ["PHOENIX_API_KEY"] = "your_api_key_here" os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "https://app.phoenix.arize.com/s/your-space"

Register Phoenix tracer

tracer_provider = register( project_name="my-adk-agent", auto_instrument=True )

Your ADK agent code (Phoenix auto-captures traces)

from google.adk.app import App app = App(root_agent=my_agent)

Auto-Instrumentation: Phoenix automatically traces all ADK operations

  1. Weave Configuration

Template: templates/weave-config.py

Purpose: Weave (W&B) integration for observability

Features:

  • Timeline of agent calls

  • Tool invocation tracking

  • Reasoning process analysis

  • Span hierarchy visualization

  • Dashboard integration

Usage:

import os from opentelemetry import trace from opentelemetry.sdk.trace import TracerProvider from opentelemetry.sdk.trace.export import SimpleSpanProcessor from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter import base64

Setup Weave exporter

wandb_api_key = os.environ["WANDB_API_KEY"] entity = "your-entity" project = "your-project"

auth_string = f"api:{wandb_api_key}" encoded_auth = base64.b64encode(auth_string.encode()).decode()

exporter = OTLPSpanExporter( endpoint="https://trace.wandb.ai/otel/v1/traces", headers={ "Authorization": f"Basic {encoded_auth}", "project_id": f"{entity}/{project}" } )

Configure tracer provider (BEFORE ADK imports)

provider = TracerProvider() provider.add_span_processor(SimpleSpanProcessor(exporter)) trace.set_tracer_provider(provider)

Your ADK agent code

from google.adk.app import App app = App(root_agent=my_agent)

Critical: Set tracer provider before importing ADK components

Available Examples

  1. Complete Observability Setup

Example: examples/complete-observability.md

Covers:

  • Multi-tool observability setup

  • Cloud Trace + BigQuery combination

  • Third-party tool integration

  • Production deployment patterns

  • Cost optimization strategies

Step-by-Step Guide:

  • Enable Cloud Trace for distributed tracing

  • Configure BigQuery for event logging

  • Add AgentOps for session replays

  • Optional: Phoenix or Weave for additional insights

  • Validate all configurations

  • Deploy to production

Production Checklist:

  • Cloud Trace enabled in production

  • BigQuery dataset created with proper IAM

  • GCS bucket configured for multimodal content

  • Event filtering configured to control costs

  • Alert rules defined for error rates

  • Dashboard created for key metrics

  • Retention policies set for cost control

  1. BigQuery Analytics Queries

Example: examples/bigquery-queries.md

Covers:

  • Conversation trace retrieval

  • Token usage analysis

  • Error rate tracking

  • Tool usage statistics

  • Performance metrics

  • Cost analysis

Query Examples:

-- Retrieve conversation traces SELECT timestamp, event_type, JSON_VALUE(content, '$.response') FROM agent_events_v2 WHERE trace_id = 'your-trace-id' ORDER BY timestamp ASC;

-- Token usage by agent SELECT agent, AVG(CAST(JSON_VALUE(content, '$.usage.total') AS INT64)) as avg_tokens, SUM(CAST(JSON_VALUE(content, '$.usage.total') AS INT64)) as total_tokens FROM agent_events_v2 WHERE event_type = 'LLM_RESPONSE' GROUP BY agent;

-- Error rate by event type SELECT event_type, COUNT(*) as error_count, DATE(timestamp) as day FROM agent_events_v2 WHERE event_type LIKE '%ERROR%' GROUP BY event_type, day ORDER BY day DESC, error_count DESC;

-- Tool usage frequency SELECT JSON_VALUE(content, '$.tool_name') as tool, COUNT(*) as usage_count FROM agent_events_v2 WHERE event_type = 'TOOL_COMPLETED' GROUP BY tool ORDER BY usage_count DESC;

-- Access multimodal content from GCS SELECT part.mime_type, part.object_ref.uri as gcs_uri FROM agent_events_v2, UNNEST(content_parts) AS part WHERE part.storage_mode = 'GCS_REFERENCE';

  1. Multi-Tool Integration

Example: examples/multi-tool-integration.md

Covers:

  • Using multiple observability tools together

  • Cloud Trace + BigQuery + AgentOps

  • Data correlation across platforms

  • Tool selection criteria

  • Cost vs. insight tradeoffs

Integration Patterns:

Pattern 1: Google Cloud Native

  • Cloud Trace for distributed tracing

  • BigQuery for detailed event analysis

  • Best for: GCP-centric deployments

Pattern 2: Comprehensive Monitoring

  • Cloud Trace for infrastructure tracing

  • AgentOps for session replays

  • BigQuery for analytics

  • Best for: Production monitoring with detailed debugging

Pattern 3: Open Source

  • Phoenix for self-hosted observability

  • BigQuery for long-term storage

  • Best for: Data sovereignty requirements

Pattern 4: ML-Focused

  • Weave for experiment tracking

  • BigQuery for analytics

  • Best for: Research and experimentation

  1. Production Deployment

Example: examples/production-deployment.md

Covers:

  • Production-ready observability configuration

  • IAM role setup

  • Cost optimization

  • Alert configuration

  • Dashboard creation

  • Incident response

Production Setup:

IAM Configuration:

  • Service account with minimal permissions

  • Separate dev/staging/prod credentials

  • Workload Identity for GKE deployments

Cost Controls:

  • Event filtering to reduce BigQuery writes

  • GCS lifecycle policies for multimodal content

  • Table partitioning and clustering

  • Retention policies (30-90 days)

Monitoring:

  • Cloud Monitoring alerts for error rates

  • BigQuery query dashboard in Looker Studio

  • AgentOps session replay for debugging

  • Trace analysis for performance issues

Security:

  • No credentials in code (environment variables only)

  • VPC Service Controls for data protection

  • Customer-managed encryption keys (CMEK)

  • Audit logging for compliance

Security Compliance

CRITICAL: This skill follows strict security rules:

❌ NEVER hardcode:

  • API keys (AgentOps, Phoenix, Weave, W&B)

  • Google Cloud credentials

  • Service account keys

  • OAuth tokens

  • BigQuery connection strings

✅ ALWAYS:

  • Use environment variables for secrets

  • Generate .env.example with placeholders

  • Add .env* to .gitignore

  • Use Google Application Default Credentials

  • Document credential acquisition process

  • Use IAM roles instead of service account keys when possible

Placeholder format:

.env.example

GOOGLE_CLOUD_PROJECT=your-project-id AGENTOPS_API_KEY=your_agentops_key_here PHOENIX_API_KEY=your_phoenix_key_here PHOENIX_COLLECTOR_ENDPOINT=https://app.phoenix.arize.com/s/your-space WANDB_API_KEY=your_wandb_key_here

Progressive Disclosure

This skill provides immediate setup guidance with references to detailed documentation:

  • Quick Start: Use setup scripts for immediate configuration

  • Production: Reference production-deployment.md for complete guide

  • Analytics: Use bigquery-queries.md for query templates

  • Integration: Reference multi-tool-integration.md for advanced patterns

Load additional files only when specific customization is needed.

Common Workflows

  1. Local Development Setup

Enable Cloud Trace for local debugging

export GOOGLE_CLOUD_PROJECT=your-project-id ./scripts/setup-cloud-trace.sh your-project-id

Start agent with tracing

python my_agent.py

View traces at console.cloud.google.com/traces

  1. Production Deployment with BigQuery

1. Create BigQuery dataset

bq mk --dataset my-project:agent-analytics

2. Create events table

bq mk --table agent-analytics.agent_events_v2 templates/bigquery-schema.json

3. Create GCS bucket for multimodal content

gsutil mb gs://my-agent-content/

4. Setup BigQuery analytics

./scripts/setup-bigquery-analytics.sh my-project agent-analytics my-agent-content

5. Deploy agent

adk deploy agent_engine --project=my-project ./agent

6. Validate setup

./scripts/validate-observability.sh --tool=bigquery

  1. Multi-Tool Integration

1. Setup Cloud Trace

export GOOGLE_CLOUD_PROJECT=your-project-id ./scripts/setup-cloud-trace.sh your-project-id

2. Setup BigQuery Analytics

./scripts/setup-bigquery-analytics.sh your-project agent-analytics my-bucket

3. Setup AgentOps

export AGENTOPS_API_KEY=your_key_here ./scripts/setup-agentops.sh

4. Validate all configurations

./scripts/validate-observability.sh

Troubleshooting

Cloud Trace Not Showing Traces

Check:

  • GOOGLE_CLOUD_PROJECT environment variable is set

  • Cloud Trace API is enabled

  • Service account has roles/cloudtrace.agent

  • Tracer initialized before ADK imports

Debug:

Check Cloud Trace API status

gcloud services list --enabled | grep cloudtrace

Enable Cloud Trace API

gcloud services enable cloudtrace.googleapis.com

Test trace export

python scripts/test-cloud-trace.py

BigQuery Events Not Appearing

Check:

  • Dataset and table exist

  • Service account has correct IAM roles

  • BigQuery API is enabled

  • Plugin configuration is correct

  • No event filtering blocking events

Debug:

Check dataset exists

bq ls my-project:

Check table schema

bq show --schema agent-analytics.agent_events_v2

Check IAM permissions

gcloud projects get-iam-policy my-project
--flatten="bindings[].members"
--filter="bindings.members:serviceAccount:YOUR_SA_EMAIL"

Test plugin manually

python scripts/test-bigquery-plugin.py

AgentOps Not Capturing Traces

Check:

  • AgentOps initialized before ADK imports

  • API key is valid

  • Network connectivity to app.agentops.ai

  • AgentOps package version is latest

Fix:

Update AgentOps

pip install -U agentops

Test initialization

python -c "import agentops; agentops.init(); print('Success')"

Check for conflicts with other tracers

Ensure AgentOps is initialized first

Phoenix Connection Failed

Check:

  • Phoenix API key is valid

  • Collector endpoint URL is correct

  • Network access to Phoenix endpoint

  • Required packages installed

Debug:

Test Phoenix endpoint

curl -H "Authorization: Bearer YOUR_KEY"
https://app.phoenix.arize.com/s/YOUR_SPACE

Verify package versions

pip list | grep -E "(openinference|phoenix)"

Run verification script

python scripts/verify-phoenix.py

Weave Traces Not Appearing

Check:

  • Tracer provider set BEFORE ADK imports

  • W&B API key is valid

  • Entity and project names are correct

  • OTEL exporter configured properly

Fix:

Verify initialization order

1. Import OTEL packages

2. Configure and set tracer provider

3. THEN import ADK

Correct order:

from opentelemetry import trace from opentelemetry.sdk.trace import TracerProvider trace.set_tracer_provider(TracerProvider()) # FIRST

from google.adk.app import App # THEN

Dependencies

Required:

  • google-adk>=1.21.0

  • ADK framework (version 1.21.0+ for full BigQuery features)

  • google-cloud-trace>=1.13.0

  • Cloud Trace client (optional)

  • google-cloud-bigquery>=3.0.0

  • BigQuery client (optional)

Optional (Third-party tools):

  • agentops>=0.3.0

  • AgentOps integration

  • openinference-instrumentation-google-adk>=0.1.0

  • Phoenix instrumentation

  • arize-phoenix-otel>=0.1.0

  • Phoenix OTEL exporter

  • opentelemetry-sdk>=1.20.0

  • OpenTelemetry SDK for Weave

  • opentelemetry-exporter-otlp-proto-http>=1.20.0

  • OTLP exporter for Weave

Installation:

Core ADK with Cloud Trace

pip install google-adk google-cloud-trace

With BigQuery Analytics

pip install google-adk google-cloud-bigquery

With AgentOps

pip install google-adk agentops

With Phoenix

pip install google-adk openinference-instrumentation-google-adk arize-phoenix-otel

With Weave

pip install google-adk opentelemetry-sdk opentelemetry-exporter-otlp-proto-http

All observability tools

pip install google-adk google-cloud-trace google-cloud-bigquery agentops
openinference-instrumentation-google-adk arize-phoenix-otel
opentelemetry-sdk opentelemetry-exporter-otlp-proto-http

Best Practices

  • Multi-Layer Observability: Use Cloud Trace for infrastructure, BigQuery for analytics, and AgentOps for debugging

  • Cost Control: Implement event filtering and retention policies to manage BigQuery costs

  • Security: Never hardcode credentials; use environment variables and IAM roles

  • Progressive Rollout: Start with Cloud Trace, add BigQuery when analytics needed

  • Tool Selection: Choose tools based on requirements (open-source vs. managed, cost vs. features)

  • Data Correlation: Use trace_id across all tools for unified debugging

  • Alert Configuration: Set up alerts for error rates, latency spikes, and cost anomalies

  • Dashboard Creation: Build custom dashboards in Looker Studio, Grafana, or tool-native UIs

Additional Resources

Tool Comparison

Feature Cloud Trace BigQuery AgentOps Phoenix Weave

Hosting Google Cloud Google Cloud SaaS SaaS/Self-hosted SaaS

Cost Free tier + usage Storage + queries Free tier + paid Free tier + paid Free tier + paid

Setup Complexity Low Medium Very Low Low Medium

Data Control Google Cloud Google Cloud Third-party Self-host option Third-party

Query Flexibility Low Very High Medium High Medium

Real-time Yes Near real-time Yes Yes Yes

Custom Dashboards Limited Full (Looker) Built-in Built-in Built-in

Best For Infrastructure tracing Deep analytics Quick debugging Open-source, control ML experiments

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Coding

document-parsers

No summary provided by upstream source.

Repository SourceNeeds Review
Coding

stt-integration

No summary provided by upstream source.

Repository SourceNeeds Review
Coding

model-routing-patterns

No summary provided by upstream source.

Repository SourceNeeds Review