dspy-optimize-anything

Universal text artifact optimizer using GEPA's optimize_anything API for code, prompts, agent architectures, configs, and more

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "dspy-optimize-anything" with this command: npx skills add omidzamani/dspy-skills/omidzamani-dspy-skills-dspy-optimize-anything

GEPA optimize_anything

Goal

Optimize any artifact representable as text — code, prompts, agent architectures, vector graphics, configurations — using a single declarative API powered by GEPA's reflective evolutionary search.

When to Use

  • Beyond prompt optimization — optimizing code, configs, SVGs, scheduling policies, etc.
  • Single hard problems — circle packing, kernel generation, algorithm discovery
  • Batch related problems — CUDA kernels, code generation tasks with cross-transfer
  • Generalization — agent skills, policies, or prompts that must transfer to unseen inputs
  • When you can express quality as a score and provide diagnostic feedback (ASI)

Inputs

InputTypeDescription
seed_candidatestr | dict[str, str] | NoneStarting artifact text, or None for seedless mode
evaluatorCallableReturns score (higher=better), optionally with ASI dict
datasetlist | NoneTraining examples (for multi-task and generalization modes)
valsetlist | NoneValidation set (for generalization mode)
objectivestr | NoneNatural language description of what to optimize for
backgroundstr | NoneDomain knowledge and constraints
configGEPAConfig | NoneEngine, reflection, and tracking settings

Outputs

OutputTypeDescription
result.best_candidatestr | dictBest optimized artifact

Workflow

Phase 1: Install

pip install gepa

Phase 2: Define Evaluator with ASI

The evaluator scores a candidate and returns Actionable Side Information (ASI) — diagnostic feedback that guides the LLM proposer during reflection.

Simple evaluator (score only):

import gepa.optimize_anything as oa

def evaluate(candidate: str) -> float:
    score, diagnostic = run_my_system(candidate)
    oa.log(f"Error: {diagnostic}")  # captured as ASI
    return score

Rich evaluator (score + structured ASI):

def evaluate(candidate: str) -> tuple[float, dict]:
    result = execute_code(candidate)
    return result.score, {
        "Error": result.stderr,
        "Output": result.stdout,
        "Runtime": f"{result.time_ms:.1f}ms",
    }

ASI can include open-ended text, structured data, multi-objectives (via scores), or images (via gepa.Image) for vision-capable LLMs.

Phase 3: Choose Optimization Mode

Mode 1 — Single-Task Search: Solve one hard problem. No dataset needed.

result = oa.optimize_anything(
    seed_candidate="<your initial artifact>",
    evaluator=evaluate,
)

Mode 2 — Multi-Task Search: Solve a batch of related problems with cross-transfer.

result = oa.optimize_anything(
    seed_candidate="<your initial artifact>",
    evaluator=evaluate,
    dataset=tasks,
)

Mode 3 — Generalization: Build a skill/prompt/policy that transfers to unseen problems.

result = oa.optimize_anything(
    seed_candidate="<your initial artifact>",
    evaluator=evaluate,
    dataset=train,
    valset=val,
)

Seedless mode: Describe what you need instead of providing a seed.

result = oa.optimize_anything(
    evaluator=evaluate,
    objective="Generate a Python function `reverse()` that reverses a string.",
)

Phase 4: Use Results

print(result.best_candidate)

Production Example

import gepa.optimize_anything as oa
from gepa import Image
import logging

logger = logging.getLogger(__name__)

# ---------- SVG optimization with VLM feedback ----------

GOAL = "a pelican riding a bicycle"
VLM = "vertex_ai/gemini-3-flash-preview"

VISUAL_ASPECTS = [
    {"id": "overall",     "criteria": f"Rate overall quality of this SVG ({GOAL}). SCORE: X/10"},
    {"id": "anatomy",     "criteria": "Rate pelican accuracy: beak, pouch, plumage. SCORE: X/10"},
    {"id": "bicycle",     "criteria": "Rate bicycle: wheels, frame, handlebars, pedals. SCORE: X/10"},
    {"id": "composition", "criteria": "Rate how convincingly the pelican rides the bicycle. SCORE: X/10"},
]

def evaluate(candidate, example):
    """Render SVG, score with a VLM, return (score, ASI)."""
    image = render_image(candidate["svg_code"])  # via cairosvg
    score, feedback = get_vlm_score_feedback(VLM, image, example["criteria"])

    return score, {
        "RenderedSVG": Image(base64_data=image, media_type="image/png"),
        "Feedback": feedback,
    }

result = oa.optimize_anything(
    seed_candidate={"svg_code": "<svg>...</svg>"},
    evaluator=evaluate,
    dataset=VISUAL_ASPECTS,
    background=f"Optimize SVG source code depicting '{GOAL}'. "
               "Improve anatomy, composition, and visual quality.",
)

logger.info(f"Best SVG:\n{result.best_candidate['svg_code']}")


# ---------- Code optimization (single-task) ----------

def evaluate_solver(candidate: str) -> tuple[float, dict]:
    """Evaluate a Python solver for a mathematical optimization problem."""
    import subprocess, json

    proc = subprocess.run(
        ["python", "-c", candidate],
        capture_output=True, text=True, timeout=30,
    )

    if proc.returncode != 0:
        oa.log(f"Runtime error: {proc.stderr}")
        return 0.0, {"Error": proc.stderr}

    try:
        output = json.loads(proc.stdout)
        return output["score"], {
            "Output": output.get("solution"),
            "Runtime": f"{output.get('time_ms', 0):.1f}ms",
        }
    except (json.JSONDecodeError, KeyError) as e:
        oa.log(f"Parse error: {e}")
        return 0.0, {"Error": str(e), "Stdout": proc.stdout}

result = oa.optimize_anything(
    evaluator=evaluate_solver,
    objective="Write a Python solver for the bin packing problem that "
              "minimizes the number of bins. Output JSON with 'score' and 'solution'.",
    background="Use first-fit-decreasing as a starting heuristic. "
               "Higher score = fewer bins used.",
)

print(result.best_candidate)


# ---------- Agent architecture generalization ----------

def evaluate_agent(candidate: str, example: dict) -> tuple[float, dict]:
    """Run an agent architecture on a task and score it."""
    exec_globals = {}
    exec(candidate, exec_globals)
    agent_fn = exec_globals.get("solve")

    if agent_fn is None:
        return 0.0, {"Error": "No `solve` function defined"}

    try:
        prediction = agent_fn(example["input"])
        correct = prediction == example["expected"]
        score = 1.0 if correct else 0.0
        feedback = "Correct" if correct else (
            f"Expected '{example['expected']}', got '{prediction}'"
        )
        return score, {"Prediction": prediction, "Feedback": feedback}
    except Exception as e:
        return 0.0, {"Error": str(e)}

result = oa.optimize_anything(
    seed_candidate="def solve(input):\n    return input",
    evaluator=evaluate_agent,
    dataset=train_tasks,
    valset=val_tasks,
    background="Discover a Python agent function `solve(input)` that "
               "generalizes across unseen reasoning tasks.",
)

print(result.best_candidate)

Integration with DSPy

optimize_anything complements DSPy's built-in optimizers. Use DSPy optimizers (GEPA, MIPROv2, BootstrapFewShot) for DSPy programs, and optimize_anything for arbitrary text artifacts outside DSPy:

import dspy
import gepa.optimize_anything as oa

# DSPy program optimization (use dspy.GEPA)
optimizer = dspy.GEPA(
    metric=gepa_metric,
    reflection_lm=dspy.LM("openai/gpt-4o"),
    auto="medium",
)
compiled = optimizer.compile(agent, trainset=trainset)

# Non-DSPy artifact optimization (use optimize_anything)
result = oa.optimize_anything(
    seed_candidate=my_config_yaml,
    evaluator=eval_config,
    background="Optimize Kubernetes scheduling policy for cost.",
)

Best Practices

  1. Rich ASI — The more diagnostic feedback you provide, the better the proposer can reason about improvements
  2. Use oa.log() — Route prints to the proposer as ASI instead of stdout
  3. Structured returns — Return (score, dict) tuples for multi-faceted diagnostics
  4. Seedless for exploration — Use objective= when the solution space is large and unfamiliar
  5. Background context — Provide domain knowledge via background= to constrain the search
  6. Generalization mode — Always provide valset when the artifact must transfer to unseen inputs
  7. Images as ASI — Use gepa.Image to pass rendered outputs to vision-capable LLMs

Limitations

  • Requires the gepa package (pip install gepa)
  • Evaluator must be deterministic or low-variance for stable optimization
  • Compute cost scales with number of candidates explored
  • Single-task mode does not generalize; use mode 3 with valset for transfer
  • Currently powered by GEPA backend; API is backend-agnostic for future strategies

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

dspy-evaluation-suite

No summary provided by upstream source.

Repository SourceNeeds Review
General

dspy-rag-pipeline

No summary provided by upstream source.

Repository SourceNeeds Review
General

dspy-bootstrap-fewshot

No summary provided by upstream source.

Repository SourceNeeds Review
General

dspy-debugging-observability

No summary provided by upstream source.

Repository SourceNeeds Review