# Experiment Tracking

Track ML experiments, metrics, and models.
## Comparison

| Platform | Best For | Self-hosted | Visualization |
|----------|----------|-------------|---------------|
| MLflow | Open-source, model registry | Yes | Basic |
| W&B | Collaboration, sweeps | Limited | Excellent |
| Neptune | Team collaboration | No | Good |
| ClearML | Full MLOps | Yes | Good |
## MLflow

Open-source platform from Databricks.

Core components:

- Tracking: Log parameters, metrics, and artifacts
- Projects: Reproducible runs (MLproject file)
- Models: Package and deploy models
- Registry: Model versioning and staging

Strengths: Self-hosted, open-source, model registry, broad framework integrations.
Limitations: Basic visualization, fewer collaboration features.

Key concept: autologging for major frameworks - one line of code enables automatic metric capture (see the sketch below).
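A minimal tracking sketch, assuming `mlflow` and scikit-learn are installed and the default local `./mlruns` store is used; the experiment name and parameter values are illustrative:

```python
import mlflow
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Autologging: one line captures params, metrics, and the model
# for supported frameworks (sklearn, PyTorch, XGBoost, ...).
mlflow.autolog()

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

mlflow.set_experiment("demo-experiment")  # illustrative name
with mlflow.start_run():
    model = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)
    # Manual logging works alongside autologging.
    mlflow.log_param("note", "baseline run")
    mlflow.log_metric("test_accuracy", model.score(X_test, y_test))
```

Run `mlflow ui` afterwards to browse the logged runs locally.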
## Weights & Biases (W&B)

Cloud-first experiment tracking with excellent visualization.

Core features:

- Experiment tracking: Metrics, hyperparameters, system stats
- Sweeps: Hyperparameter search (grid, random, Bayesian)
- Artifacts: Dataset and model versioning
- Reports: Shareable documentation

Strengths: Beautiful visualizations, team collaboration, hyperparameter sweeps.
Limitations: Cloud-dependent, limited self-hosting.

Key concept: `wandb.init()` plus `wandb.log()` - a simple API with powerful features (see the sketch below).
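A minimal sketch, assuming the `wandb` package is installed and you are authenticated (`wandb login`); the project name and config values are illustrative:

```python
import wandb

# Start a run; config values are tracked as hyperparameters.
run = wandb.init(
    project="demo-project",  # illustrative project name
    config={"lr": 1e-3, "batch_size": 32, "epochs": 5},
)

for epoch in range(run.config.epochs):
    train_loss = 1.0 / (epoch + 1)  # placeholder for a real training loop
    # Each wandb.log() call appends a step to the run's charts.
    wandb.log({"epoch": epoch, "train_loss": train_loss})

run.finish()
```

Sweeps build on the same API: a search-space config plus `wandb.sweep()` and `wandb.agent()` drive the hyperparameter search.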
## What to Track

| Category | Examples |
|----------|----------|
| Hyperparameters | Learning rate, batch size, architecture |
| Metrics | Loss, accuracy, F1, per-epoch values |
| Artifacts | Model checkpoints, configs, datasets |
| System | GPU usage, memory, runtime |
| Code | Git commit, diff, requirements |
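A sketch of covering these categories in a single MLflow run, assuming the code executes inside a git repository and a `config.yaml` file exists; names and values are illustrative:

```python
import subprocess

import mlflow

with mlflow.start_run():
    # Hyperparameters
    mlflow.log_params({"lr": 1e-3, "batch_size": 32})

    # Metrics: per-epoch values via the step argument
    for epoch, loss in enumerate([0.9, 0.5, 0.3]):
        mlflow.log_metric("loss", loss, step=epoch)

    # Artifacts: any local file (checkpoint, config, dataset sample)
    mlflow.log_artifact("config.yaml")  # illustrative path

    # Code provenance: record the current git commit as a run tag
    commit = subprocess.check_output(
        ["git", "rev-parse", "HEAD"], text=True
    ).strip()
    mlflow.set_tag("git_commit", commit)
```

W&B captures system stats (GPU, memory) automatically; with MLflow they generally need explicit logging.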
## Model Registry Concepts

| Stage | Purpose |
|-------|---------|
| None | Just logged, not registered |
| Staging | Testing, validation |
| Production | Serving live traffic |
| Archived | Deprecated, kept for reference |
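A sketch of these stage transitions using MLflow's registry API; the model name is illustrative, and the run ID placeholder must come from a real run that logged a model. Note that recent MLflow releases steer toward model aliases, though the stage API below still exists:

```python
import mlflow
from mlflow.tracking import MlflowClient

model_name = "demo-classifier"  # illustrative registered-model name
run_id = "..."                  # ID of a run that logged a model

# Register the run's model artifact under a named registry entry
# (starts in stage "None").
mlflow.register_model(f"runs:/{run_id}/model", model_name)

# Promote version 1: None -> Staging -> Production.
client = MlflowClient()
client.transition_model_version_stage(
    name=model_name, version="1", stage="Staging"
)
client.transition_model_version_stage(
    name=model_name, version="1", stage="Production"
)
```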
## Decision Guide

| Scenario | Recommendation |
|----------|----------------|
| Self-hosted requirement | MLflow |
| Team collaboration | W&B |
| Model registry focus | MLflow |
| Hyperparameter sweeps | W&B |
| Beautiful dashboards | W&B |
| Full MLOps pipeline | MLflow + deployment tools |
## Resources

- MLflow: https://mlflow.org/docs/latest/