querying-mlflow-metrics

Fetches aggregated trace metrics (token usage, latency, trace counts, quality evaluations) from MLflow tracking servers. Triggers on requests to show metrics, analyze token usage, view LLM costs, check usage trends, or query trace statistics.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy the command below and send it to your AI assistant to install this skill:

Install skill "querying-mlflow-metrics" with this command: npx skills add b-step62/skills/b-step62-skills-querying-mlflow-metrics

MLflow Metrics

Run scripts/fetch_metrics.py to query metrics from an MLflow tracking server.

Examples

Token usage summary:

python scripts/fetch_metrics.py -s http://localhost:5000 -x 1 -m total_tokens -a SUM,AVG

Output: AVG: 223.91 SUM: 7613
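The aggregations above (SUM, AVG, and the percentile variants) can be illustrated with a small sketch. This is not the script's actual code, just an assumed nearest-rank interpretation of what those aggregation names typically mean:

```python
import math

def aggregate(values, ops):
    """Illustrative aggregation over per-trace values (e.g. total_tokens).

    The real fetch_metrics.py computes these server-side; this sketch only
    shows what each aggregation name conventionally produces.
    """
    out = {}
    for op in ops:
        if op == "COUNT":
            out[op] = len(values)
        elif op == "SUM":
            out[op] = sum(values)
        elif op == "AVG":
            out[op] = sum(values) / len(values)
        elif op in ("P50", "P95", "P99"):
            # nearest-rank percentile; a real server may interpolate instead
            q = int(op[1:]) / 100
            s = sorted(values)
            out[op] = s[min(len(s) - 1, math.ceil(q * len(s)) - 1)]
    return out

tokens = [120, 310, 95, 480, 250]
print(aggregate(tokens, ["SUM", "AVG", "P95"]))
# → {'SUM': 1255, 'AVG': 251.0, 'P95': 480}
```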

Hourly token trend (last 24h):

python scripts/fetch_metrics.py -s http://localhost:5000 -x 1 -m total_tokens -a SUM \
    -t 3600 --start-time="-24h" --end-time=now

Output: Time-bucketed token sums per hour
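Conceptually, the `-t 3600` option assigns each trace to a fixed-width time bucket and sums within it. A minimal sketch of that bucketing, assuming epoch-millisecond timestamps (the real script does this query server-side):

```python
from collections import defaultdict

def bucket_sums(points, interval_s=3600):
    """Group (epoch_ms, value) points into fixed-width buckets and sum each.

    Illustrative only: shows how -t 3600 (hourly) buckets align to
    interval boundaries, not how the MLflow server implements it.
    """
    buckets = defaultdict(int)
    for ts_ms, value in points:
        bucket_start = (ts_ms // 1000 // interval_s) * interval_s
        buckets[bucket_start] += value
    return dict(sorted(buckets.items()))

points = [
    (1_700_000_100_000, 200),  # these two fall in the same hour...
    (1_700_001_200_000, 150),
    (1_700_003_700_000, 300),  # ...this one in the next
]
print(bucket_sums(points))
# → {1699999200: 350, 1700002800: 300}
```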

Latency percentiles by trace:

python scripts/fetch_metrics.py -s http://localhost:5000 -x 1 -m latency -a AVG,P95 -d trace_name

Error rate by status:

python scripts/fetch_metrics.py -s http://localhost:5000 -x 1 -m trace_count -a COUNT -d trace_status
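From the per-status counts that command returns, an error rate is a simple ratio. A sketch, assuming `OK`/`ERROR` status values (check the status names your server actually reports):

```python
def error_rate(status_counts):
    """Compute error rate from trace_count grouped by trace_status.

    The status keys here ("OK", "ERROR") are assumptions for
    illustration; adapt to the statuses in your output.
    """
    total = sum(status_counts.values())
    errors = status_counts.get("ERROR", 0)
    return errors / total if total else 0.0

counts = {"OK": 95, "ERROR": 5}
print(f"{error_rate(counts):.1%}")
# → 5.0%
```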

Quality scores by evaluator (assessments):

python scripts/fetch_metrics.py -s http://localhost:5000 -x 1 -v ASSESSMENTS \
    -m assessment_value -a AVG,P50 -d assessment_name

Output: Average and median scores for each evaluator (e.g., correctness, relevance)

Assessment count by name:

python scripts/fetch_metrics.py -s http://localhost:5000 -x 1 -v ASSESSMENTS \
    -m assessment_count -a COUNT -d assessment_name

JSON output: Add -o json to any command.

Arguments

| Arg | Required | Description |
| --- | --- | --- |
| `-s, --server` | Yes | MLflow server URL |
| `-x, --experiment-ids` | Yes | Experiment IDs (comma-separated) |
| `-m, --metric` | Yes | `trace_count`, `latency`, `input_tokens`, `output_tokens`, `total_tokens` |
| `-a, --aggregations` | Yes | `COUNT`, `SUM`, `AVG`, `MIN`, `MAX`, `P50`, `P95`, `P99` |
| `-d, --dimensions` | No | Group by: `trace_name`, `trace_status` |
| `-t, --time-interval` | No | Bucket size in seconds (3600 = hourly, 86400 = daily) |
| `--start-time` | No | `-24h`, `-7d`, `now`, ISO 8601, or epoch ms |
| `--end-time` | No | Same formats as `--start-time` |
| `-o, --output` | No | `table` (default) or `json` |

For SPANS metrics (span_count, latency), add -v SPANS. For ASSESSMENTS metrics, add -v ASSESSMENTS.
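The relative time formats accepted by `--start-time`/`--end-time` can be sketched as follows. This is an assumed interpretation for illustration, not the script's actual parser (ISO 8601 handling is omitted):

```python
import re
import time

def parse_time(spec, now_ms=None):
    """Parse 'now', '-24h', '-7d', or raw epoch milliseconds.

    Illustrative sketch of the documented formats; the real script's
    parser may differ in details.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    if spec == "now":
        return now_ms
    m = re.fullmatch(r"-(\d+)([hd])", spec)
    if m:
        n, unit = int(m.group(1)), m.group(2)
        seconds = n * (3600 if unit == "h" else 86400)
        return now_ms - seconds * 1000
    return int(spec)  # assume epoch milliseconds
```

For example, `parse_time("-24h")` yields the epoch-millisecond timestamp 24 hours before now.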

See references/api_reference.md for filter syntax and full API details.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

searching-mlflow-docs

instrumenting-with-mlflow-tracing

searching-mlflow-traces