opik-optimizer

Optimize LLM prompts, tools, and agents in Opik using standardized optimizer workflows (prompt optimization, tool optimization, and parameter tuning), dataset/metric wiring, and result interpretation.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "opik-optimizer" with this command: npx skills add vincentkoc/dotskills/vincentkoc-dotskills-opik-optimizer

Opik Optimizer

Purpose

Design, run, and interpret Opik Optimizer workflows for prompts, tools, and model parameters with consistent dataset/metric wiring and reproducible evaluation.

When to use

Use this skill when a user asks for help with:

  • Choosing and configuring Opik Optimizer algorithms for prompt/agent optimization.
  • Writing ChatPrompt-based optimization runs and custom metric functions.
  • Optimizing with tools (function calling or MCP), selected prompt roles, or prompt segments.
  • Tuning LLM call parameters with optimize_parameter.
  • Comparing optimizer outputs and interpreting OptimizationResult.

Workflow

  1. Select optimizer strategy (MetaPromptOptimizer, FewShotBayesianOptimizer, HRPO, etc.) based on the target optimization goal.
  2. Build prompt/dataset/metric wiring and validate placeholder-field alignment.
  3. Run prompt, tool, or parameter optimization with explicit controls (n_threads, n_samples, max_trials, seed).
  4. Inspect OptimizationResult and compare score deltas against initial baselines.
  5. Summarize recommendations, risks, and next experiments.

Inputs

  • Target optimization objective (prompt/tool/parameter) and success metric.
  • Dataset source and expected schema fields.
  • Model/provider constraints and runtime limits.
  • Optional scope constraints (optimize_prompts segments, tool fields, project names).

Outputs

  • Optimizer run configuration and rationale.
  • Result interpretation (score, initial_score, history trends).
  • Recommended next changes and follow-up experiment plan.
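
The result interpretation listed above (score, initial_score, history trends) can be sketched as follows. This is a minimal illustration, not the library's API: `ResultSummary` is a hypothetical stand-in that carries only the field names mentioned in this skill, and it assumes `history` is a plain list of per-trial scores, which may differ from the shape of the real OptimizationResult.

```python
from dataclasses import dataclass, field


@dataclass
class ResultSummary:
    """Hypothetical stand-in; not the real OptimizationResult class."""
    score: float
    initial_score: float
    # Assumption: one float score per trial, in run order.
    history: list[float] = field(default_factory=list)


def summarize(result: ResultSummary) -> dict:
    """Compute the score delta, relative change, and a best-so-far curve."""
    delta = result.score - result.initial_score
    # Relative change is undefined when the baseline score is zero.
    relative = delta / result.initial_score if result.initial_score else None
    best_so_far: list[float] = []
    for trial_score in result.history:
        previous = best_so_far[-1] if best_so_far else trial_score
        best_so_far.append(max(previous, trial_score))
    return {"delta": delta, "relative": relative, "best_so_far": best_so_far}


summary = summarize(
    ResultSummary(score=0.75, initial_score=0.5, history=[0.5, 0.25, 0.75])
)
print(summary)
```

A flat best-so-far curve over many trials suggests the run has plateaued, which is usually a cue to change the optimizer or metric rather than raise max_trials.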

Use the reference files in this skill for details before implementing code:

  • references/algorithms.md
  • references/prompt_agent_workflow.md
  • references/example_patterns.md

Opik Optimizer quickstart

  1. Install and import:
pip install opik-optimizer
from opik_optimizer import ChatPrompt, MetaPromptOptimizer, HRPO, FewShotBayesianOptimizer
from opik_optimizer import datasets
  2. Build a prompt and metric:
from opik.evaluation.metrics import LevenshteinRatio

prompt = ChatPrompt(
    system="You are a concise answerer.",
    user="{question}",
)

def metric(dataset_item: dict, output: str) -> float:
    return LevenshteinRatio().score(
        reference=dataset_item["answer"],
        output=output,
    ).value
  3. Load dataset and run:
dataset = datasets.hotpot(count=30)

result = MetaPromptOptimizer(model="openai/gpt-5-nano").optimize_prompt(
    prompt=prompt,
    dataset=dataset,
    metric=metric,
    n_samples=20,
    max_trials=10,
)
result.display()

Core workflow you should follow

  1. Pick optimizer class:
    • Few-shot examples + Bayesian selection: FewShotBayesianOptimizer
    • LLM meta-reasoning: MetaPromptOptimizer
    • Genetic search with multi-objective optimization (MOO) and LLM-driven crossover: EvolutionaryOptimizer
    • Hierarchical reflective diagnostics: HierarchicalReflectiveOptimizer (HRPO)
    • Pareto-based genetic strategy: GepaOptimizer
    • Parameter tuning only: ParameterOptimizer
  2. Define a single ChatPrompt (or dict of prompts for multi-prompt cases).
  3. Provide a dataset from opik_optimizer.datasets.
  4. Provide a metric callable with signature (dataset_item, llm_output) -> float (a ScoreResult or a list of ScoreResult is also accepted).
  5. Set optimizer controls (n_threads, n_samples, max_trials, seed, etc.).
  6. Run one of:
    • optimize_prompt(...) for prompt/system behavior changes.
    • optimize_parameter(...) for model-call hyperparameters.
  7. Inspect OptimizationResult (score, initial_score, history, optimization_id, get_optimized_parameters).
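
The metric signature in step 4 can be sketched without any library dependency. This is an illustrative example only: the function names are hypothetical, and the `answer` field is an assumption about the dataset schema, so substitute whatever field your dataset actually exposes.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not count."""
    return " ".join(text.lower().split())


def exact_match_metric(dataset_item: dict, llm_output: str) -> float:
    """Return 1.0 when the output matches the reference answer, else 0.0.

    Follows the (dataset_item, llm_output) -> float shape described above.
    Assumption: the reference lives under the "answer" key.
    """
    expected = normalize(str(dataset_item["answer"]))
    return 1.0 if normalize(llm_output) == expected else 0.0


print(exact_match_metric({"answer": "Paris"}, "  paris "))
print(exact_match_metric({"answer": "Paris"}, "Lyon"))
```

A binary metric like this gives the optimizer a coarse signal; a graded metric such as an edit-distance ratio tends to separate candidate prompts better on small samples.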

Key execution details to enforce

  • Prefer explicit project_name for Opik tracking if you are using org-level observability.
  • Keep placeholders in prompts aligned with dataset fields (for example {question}).
  • Start with optimize_prompts="system" or "user" when scope should be constrained.
  • Keep model names in MetaPrompt/reasoning calls provider-compatible for your account.
  • Validate multimodal input payloads by preserving non-empty content segments only.
  • For small datasets, set n_samples and n_samples_strategy carefully; if n_samples exceeds the dataset size, the run automatically falls back to the full dataset.
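
Two of the checks above, placeholder alignment and sample-count fallback, can be run before starting an optimization. The sketch below uses only the standard library. The helper names are hypothetical, it assumes placeholders follow Python `str.format` brace syntax, and `effective_sample_count` only mirrors the fallback behavior described above rather than reproducing library code.

```python
from string import Formatter


def placeholders(template: str) -> set[str]:
    """Collect the {name} fields used in one prompt template."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def missing_fields(templates: list[str], dataset_item: dict) -> set[str]:
    """Return placeholder names that have no matching dataset field."""
    needed: set[str] = set()
    for template in templates:
        needed |= placeholders(template)
    return needed - set(dataset_item)


def effective_sample_count(n_samples: int, dataset_size: int) -> int:
    """Illustrates the described fallback: never sample more than exists."""
    return min(n_samples, dataset_size)


item = {"question": "Who wrote Dune?", "answer": "Frank Herbert"}
print(missing_fields(["You are a concise answerer.", "{question}"], item))
print(missing_fields(["{query}"], item))
print(effective_sample_count(20, 12))
```

Templates that contain literal braces, such as embedded JSON, would need escaping before this check, because `Formatter().parse` treats every brace pair as a field.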

Tooling and segment-based control

  • Tools can be optimized with MCP/function schema fields, not only by changing prompt wording.
  • For fine-grained text updates, use optimize_prompts values and helper functions from prompt_segments:
    • extract_prompt_segments(ChatPrompt) to inspect stable segment IDs.
    • apply_segment_updates(ChatPrompt, updates) for deterministic edits.
  • Tool optimization is distinct from prompt optimization.

Runnable examples live upstream in the Opik repo. If you need local runnable scripts, vendor the upstream examples into a scripts/ folder and keep references one level deep.

Common mistakes to avoid

  • Passing an empty dataset or mismatched placeholder names.
  • Mixing deprecated constructor arg num_threads with n_threads.
  • Assuming tool optimization is the same as agent function-calling optimization.
  • Running ParameterOptimizer.optimize_prompt (it raises and should not be used).
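
Several of these mistakes can be caught with a pre-flight check before any model call is made. The sketch below is a hypothetical helper, not part of opik-optimizer: it inspects plain Python values (the optimizer class name, the method name, the dataset items, and the keyword arguments you plan to pass) and reports problems as strings.

```python
def preflight(
    optimizer_name: str,
    method: str,
    dataset_items: list,
    optimizer_kwargs: dict,
) -> list[str]:
    """Return a list of human-readable problems; empty means no issue found."""
    problems: list[str] = []
    if not dataset_items:
        problems.append("dataset is empty")
    if "num_threads" in optimizer_kwargs:
        problems.append("num_threads is deprecated; use n_threads")
    if optimizer_name == "ParameterOptimizer" and method == "optimize_prompt":
        problems.append("ParameterOptimizer.optimize_prompt raises; use optimize_parameter")
    return problems


issues = preflight(
    optimizer_name="ParameterOptimizer",
    method="optimize_prompt",
    dataset_items=[],
    optimizer_kwargs={"num_threads": 4, "seed": 42},
)
for issue in issues:
    print(issue)
```

Returning a list instead of raising on the first problem lets a single run surface every misconfiguration at once.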

Next actions

  • For in-depth behavior and per-class parameter tables: references/algorithms.md
  • For exact optimize_prompt signatures, prompts, tool constraints, and result usage: references/prompt_agent_workflow.md
  • For pattern examples and source-backed workflows: references/example_patterns.md
