ml-model-eval-benchmark

Compare model candidates using weighted metrics and deterministic ranking outputs. Use for benchmark leaderboards and model promotion decisions.

Safety Notice

This item is sourced from the public archived skills repository. Treat it as untrusted until it has been reviewed.

ML Model Eval Benchmark

Overview

Produce consistent, deterministic model rankings from metric-weighted evaluation inputs.

Workflow

  1. Define metric weights and accepted metric ranges.
  2. Ingest model metrics for each candidate.
  3. Compute weighted score and ranking.
  4. Export leaderboard and promotion recommendation.
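The steps above can be sketched in a short script. The metric names, weights, accepted ranges, and candidate values below are illustrative assumptions for the sketch, not the contents of the bundled scripts/benchmark_models.py.

```python
# Sketch: weighted scoring with range validation and deterministic ranking.
# All metric names, weights, ranges, and values here are illustrative.

WEIGHTS = {"accuracy": 0.5, "f1": 0.3, "latency_score": 0.2}  # should sum to 1.0
RANGES = {"accuracy": (0.0, 1.0), "f1": (0.0, 1.0), "latency_score": (0.0, 1.0)}

CANDIDATES = {
    "model_a": {"accuracy": 0.91, "f1": 0.88, "latency_score": 0.70},
    "model_b": {"accuracy": 0.89, "f1": 0.90, "latency_score": 0.80},
}

def weighted_score(metrics: dict) -> float:
    # Validate each metric against its accepted range before scoring (step 1).
    for name, (lo, hi) in RANGES.items():
        value = metrics[name]
        if not lo <= value <= hi:
            raise ValueError(f"{name}={value} outside accepted range [{lo}, {hi}]")
    # Weighted sum over the configured metrics (step 3).
    return sum(WEIGHTS[name] * metrics[name] for name in WEIGHTS)

def rank(candidates: dict) -> list:
    # Sort by score descending; break exact ties by model name so the
    # ranking is deterministic across runs.
    return sorted(candidates, key=lambda m: (-weighted_score(candidates[m]), m))

leaderboard = rank(CANDIDATES)
```

Breaking ties on a stable secondary key (here, the model name) is what makes repeated runs over the same inputs produce identical leaderboards.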

Use Bundled Resources

  • Run scripts/benchmark_models.py to generate benchmark outputs.
  • Read references/benchmarking-guide.md for weighting and tie-break guidance.

Guardrails

  • Keep metric names and scales consistent across candidates.
  • Record weighting assumptions in the output.
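One way to satisfy the second guardrail is to embed the weighting assumptions directly in the exported artifact, so a leaderboard can always be traced back to the weights that produced it. The output shape below is an illustrative assumption, not the format emitted by scripts/benchmark_models.py.

```python
import json

# Illustrative export that records weighting assumptions alongside results.
weights = {"accuracy": 0.5, "f1": 0.3, "latency_score": 0.2}
leaderboard = [
    {"model": "model_b", "score": 0.875},
    {"model": "model_a", "score": 0.859},
]
export = {
    "weights": weights,                          # assumptions recorded verbatim
    "tie_break": "model name, ascending",        # documented for reproducibility
    "leaderboard": leaderboard,
    "recommendation": leaderboard[0]["model"],   # top-ranked candidate
}
print(json.dumps(export, indent=2))
```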

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.


Alibaba Shopping

Shop Taobao/Tmall with smart search strategies, seller vetting, price negotiation, and deal finding guidance.

By harrylabs0913

pawr-link

Create or update a pawr.link profile. $9 USDC self-service (instant) or $10 curated (AI-built, ~1 min). Free profile discovery API. All payments via x402 on...

By baseddesigner

龙虾安全卫士

Provides static security scanning of installed Skills, detecting permission risks, malicious code, and dependency risks, and generates a risk-assessment report in Chinese.

By ansengu11