ml-model-eval-benchmark

Compare model candidates using weighted metrics and deterministic ranking outputs. Use for benchmark leaderboards and model promotion decisions.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "ml-model-eval-benchmark" with this command: npx skills add 0x-professor/ml-model-eval-benchmark

ML Model Eval Benchmark

Overview

Produce consistent model ranking outputs from metric-weighted evaluation inputs.

Workflow

  1. Define metric weights and accepted metric ranges.
  2. Ingest model metrics for each candidate.
  3. Compute weighted score and ranking.
  4. Export leaderboard and promotion recommendation.

Use Bundled Resources

  • Run scripts/benchmark_models.py to generate benchmark outputs.
  • Read references/benchmarking-guide.md for weighting and tie-break guidance.

Guardrails

  • Keep metric names and scales consistent across candidates.
  • Record weighting assumptions in the output.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

Zoom

Zoom API integration with managed OAuth. Manage meetings, webinars, recordings, and user profiles. Use this skill when users want to schedule meetings, manag...

Registry SourceRecently Updated
General

Kleinanzeigen.de Helper

Erstelle und verwalte Verkaufsanzeigen speziell auf kleinanzeigen.de. Verwende diesen Skill wenn der Human sagt, er will etwas auf kleinanzeigen.de verkaufen...

Registry SourceRecently Updated
General

Poku

Sends and receives phone calls and messages (like SMS, WhatsApp, Slack), and reserves dedicated phone numbers using the Poku API. Example use cases: calling...

Registry SourceRecently Updated
General

IMAP/SMTP Email - Maddy Fix

Read and send email via IMAP/SMTP. Check for new/unread messages, fetch content, search mailboxes, mark as read/unread, and send emails with attachments. Sup...

Registry SourceRecently Updated