Weights & Biases: ML Experiment Tracking & MLOps
When to Use This Skill
Use Weights & Biases (W&B) when you need to:
- Track ML experiments with automatic metric logging
- Visualize training in real-time dashboards
- Compare runs across hyperparameters and configurations
- Optimize hyperparameters with automated sweeps
- Manage model registry with versioning and lineage
- Collaborate on ML projects with team workspaces
- Track artifacts (datasets, models, code) with lineage
Users: 200,000+ ML practitioners | GitHub Stars: 10.5k+ | Integrations: 100+
Installation
    # Install W&B
    pip install wandb

    # Log in (prompts for your API key)
    wandb login

    # Or provide the API key through an environment variable
    export WANDB_API_KEY=your_api_key_here
Quick Start
Basic Experiment Tracking
    import wandb

    # Initialize a run
    run = wandb.init(
        project="my-project",
        config={
            "learning_rate": 0.001,
            "epochs": 10,
            "batch_size": 32,
            "architecture": "ResNet50",
        },
    )

    # Training loop
    for epoch in range(run.config.epochs):
        # Your training code; each helper is assumed to return (loss, accuracy)
        train_loss, train_acc = train_epoch()
        val_loss, val_acc = validate()

        # Log metrics
        wandb.log({
            "epoch": epoch,
            "train/loss": train_loss,
            "val/loss": val_loss,
            "train/accuracy": train_acc,
            "val/accuracy": val_acc,
        })

    # Finish the run
    wandb.finish()
With PyTorch
    import torch
    import wandb

    # Initialize
    wandb.init(project="pytorch-demo", config={"lr": 0.001, "epochs": 10})

    # Access config
    config = wandb.config

    # Training loop (model, criterion, optimizer and train_loader are defined elsewhere)
    for epoch in range(config.epochs):
        for batch_idx, (data, target) in enumerate(train_loader):
            # Forward pass
            output = model(data)
            loss = criterion(output, target)

            # Backward pass
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            # Log every 100 batches
            if batch_idx % 100 == 0:
                wandb.log({
                    "loss": loss.item(),
                    "epoch": epoch,
                    "batch": batch_idx,
                })

    # Save model
    torch.save(model.state_dict(), "model.pth")
    wandb.save("model.pth")  # Upload to W&B

    wandb.finish()
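The loop above logs `batch`, which restarts at 0 every epoch. A minimal sketch, assuming a fixed number of batches per epoch, of deriving a counter that keeps increasing across epochs and could be passed as `step=` to `wandb.log`. The helper names `global_step` and `should_log` are hypothetical and not part of the W&B API.

```python
def global_step(epoch: int, batch_idx: int, batches_per_epoch: int) -> int:
    """Return a step counter that keeps increasing across epochs."""
    return epoch * batches_per_epoch + batch_idx


def should_log(batch_idx: int, every: int = 100) -> bool:
    """Return True on every `every`-th batch, starting with batch 0."""
    return batch_idx % every == 0


# Example: epoch 2, batch 100, with 500 batches per epoch.
step = global_step(epoch=2, batch_idx=100, batches_per_epoch=500)
print(step, should_log(100))
```

Inside the training loop this would replace the bare `batch_idx` check, with `len(train_loader)` supplying `batches_per_epoch`.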
Core Concepts
- Projects and Runs
Project: Collection of related experiments
Run: Single execution of your training script

    # Create/use project
    run = wandb.init(
        project="image-classification",
        name="resnet50-experiment-1",  # Optional run name
        tags=["baseline", "resnet"],   # Organize with tags
        notes="First baseline run",    # Add notes
    )

    # Each run has a unique ID
    print(f"Run ID: {run.id}")
    print(f"Run URL: {run.url}")
- Configuration Tracking
Pass hyperparameters to wandb.init so they are recorded with the run:

    config = {
        # Model architecture
        "model": "ResNet50",
        "pretrained": True,

        # Training params
        "learning_rate": 0.001,
        "batch_size": 32,
        "epochs": 50,
        "optimizer": "Adam",

        # Data params
        "dataset": "ImageNet",
        "augmentation": "standard",
    }

    wandb.init(project="my-project", config=config)

    # Access config during training
    lr = wandb.config.learning_rate
    batch_size = wandb.config.batch_size
- Metric Logging
    # Log scalars
    wandb.log({"loss": 0.5, "accuracy": 0.92})

    # Log multiple metrics
    wandb.log({
        "train/loss": train_loss,
        "train/accuracy": train_acc,
        "val/loss": val_loss,
        "val/accuracy": val_acc,
        "learning_rate": current_lr,
        "epoch": epoch,
    })

    # Log at an explicit step (the step must not decrease between calls)
    wandb.log({"loss": loss}, step=global_step)

    # Log media (images, audio, video)
    wandb.log({"examples": [wandb.Image(img) for img in images]})

    # Log histograms
    wandb.log({"gradients": wandb.Histogram(gradients)})

    # Log tables (fill rows with table.add_data(...) before logging)
    table = wandb.Table(columns=["id", "prediction", "ground_truth"])
    wandb.log({"predictions": table})
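The metric names above use a `split/metric` pattern such as `train/loss`, which, as far as I know, W&B uses to group charts by the part before the slash. A minimal sketch of building such names consistently from per-split dictionaries; `prefix_metrics` is a hypothetical helper, not a W&B function.

```python
from typing import Dict


def prefix_metrics(prefix: str, metrics: Dict[str, float]) -> Dict[str, float]:
    """Return a copy of `metrics` with every key rewritten as '<prefix>/<key>'."""
    return {f"{prefix}/{name}": value for name, value in metrics.items()}


train = prefix_metrics("train", {"loss": 0.5, "accuracy": 0.92})
val = prefix_metrics("val", {"loss": 0.6, "accuracy": 0.90})

# Merge both splits into one payload so they are logged at the same step.
payload = {**train, **val, "epoch": 3}
print(payload)
```

The resulting `payload` is an ordinary dictionary, so it could be handed to `wandb.log(payload)` in a single call.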
- Model Checkpointing
    import torch
    import wandb

    # Save model checkpoint
    checkpoint = {
        'epoch': epoch,
        'model_state_dict': model.state_dict(),
        'optimizer_state_dict': optimizer.state_dict(),
        'loss': loss,
    }
    torch.save(checkpoint, 'checkpoint.pth')

    # Upload to W&B
    wandb.save('checkpoint.pth')

    # Or use Artifacts (recommended)
    artifact = wandb.Artifact('model', type='model')
    artifact.add_file('checkpoint.pth')
    wandb.log_artifact(artifact)
Hyperparameter Sweeps
Automatically search for optimal hyperparameters.
Define Sweep Configuration
    sweep_config = {
        'method': 'bayes',  # or 'grid', 'random'
        'metric': {'name': 'val/accuracy', 'goal': 'maximize'},
        'parameters': {
            'learning_rate': {
                # log_uniform_values takes min/max as actual values;
                # plain log_uniform interprets them as exponents
                'distribution': 'log_uniform_values',
                'min': 1e-5,
                'max': 1e-1,
            },
            'batch_size': {'values': [16, 32, 64, 128]},
            'optimizer': {'values': ['adam', 'sgd', 'rmsprop']},
            'dropout': {'distribution': 'uniform', 'min': 0.1, 'max': 0.5},
        },
    }

    # Initialize sweep
    sweep_id = wandb.sweep(sweep_config, project="my-project")
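A log-uniform distribution is the usual choice for a learning rate whose range spans several orders of magnitude. A minimal sketch of what such a distribution samples, written in plain Python for illustration; it is not W&B's sampler, and `sample_log_uniform` is a hypothetical name.

```python
import math
import random


def sample_log_uniform(low: float, high: float, rng: random.Random) -> float:
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)
samples = [sample_log_uniform(1e-5, 1e-1, rng) for _ in range(1000)]

# Roughly half of the draws fall below the geometric midpoint (1e-3),
# whereas a plain uniform draw would put about 99% of them above it.
below_mid = sum(s < 1e-3 for s in samples)
print(f"{below_mid} of {len(samples)} samples are below 1e-3")
```

Sampling in log space gives each decade (1e-5 to 1e-4, 1e-4 to 1e-3, and so on) about the same share of trials, which is why it suits learning rates better than a plain uniform range.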
Define Training Function
    def train():
        # Initialize run
        run = wandb.init()

        # Access sweep parameters
        lr = wandb.config.learning_rate
        batch_size = wandb.config.batch_size
        optimizer_name = wandb.config.optimizer

        # Build model with sweep config
        model = build_model(wandb.config)
        optimizer = get_optimizer(optimizer_name, lr)

        # Training loop (NUM_EPOCHS is defined elsewhere)
        for epoch in range(NUM_EPOCHS):
            train_loss = train_epoch(model, optimizer, batch_size)
            val_acc = validate(model)

            # Log metrics
            wandb.log({
                "train/loss": train_loss,
                "val/accuracy": val_acc,
            })

    # Run sweep
    wandb.agent(sweep_id, function=train, count=50)  # Run 50 trials
Sweep Strategies
    # Grid search - exhaustive
    sweep_config = {
        'method': 'grid',
        'parameters': {
            'lr': {'values': [0.001, 0.01, 0.1]},
            'batch_size': {'values': [16, 32, 64]},
        },
    }

    # Random search
    sweep_config = {
        'method': 'random',
        'parameters': {
            'lr': {'distribution': 'uniform', 'min': 0.0001, 'max': 0.1},
            'dropout': {'distribution': 'uniform', 'min': 0.1, 'max': 0.5},
        },
    }

    # Bayesian optimization (recommended)
    sweep_config = {
        'method': 'bayes',
        'metric': {'name': 'val/loss', 'goal': 'minimize'},
        'parameters': {
            # log_uniform_values takes min/max as actual values
            'lr': {'distribution': 'log_uniform_values', 'min': 1e-5, 'max': 1e-1},
        },
    }
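Grid search is exhaustive, so the number of runs is the product of the lengths of all `values` lists. A minimal sketch, assuming every parameter is given as a `values` list, of enumerating the combinations a grid like the one above implies; `expand_grid` is a hypothetical helper and does not call W&B.

```python
from itertools import product
from typing import Any, Dict, Iterator, List


def expand_grid(parameters: Dict[str, Dict[str, List[Any]]]) -> Iterator[Dict[str, Any]]:
    """Yield one config dict per combination of the listed 'values'."""
    names = list(parameters)
    value_lists = [parameters[name]["values"] for name in names]
    for combo in product(*value_lists):
        yield dict(zip(names, combo))


grid = {
    "lr": {"values": [0.001, 0.01, 0.1]},
    "batch_size": {"values": [16, 32, 64]},
}
configs = list(expand_grid(grid))
print(len(configs))  # 3 learning rates x 3 batch sizes = 9 runs
```

Counting the combinations up front is a cheap way to check whether a grid is affordable before switching to random or Bayesian search.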
Artifacts
Track datasets, models, and other files with lineage.
Log Artifacts
    # Create artifact
    artifact = wandb.Artifact(
        name='training-dataset',
        type='dataset',
        description='ImageNet training split',
        metadata={'size': '1.2M images', 'split': 'train'},
    )

    # Add files
    artifact.add_file('data/train.csv')
    artifact.add_dir('data/images/')

    # Log artifact (requires an active run)
    wandb.log_artifact(artifact)
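Artifact versioning is content based: as I understand the W&B documentation, logging an artifact whose files have not changed does not produce a new version. A minimal sketch of that idea using a directory digest. This is an illustration only, not W&B's actual checksum scheme, and `directory_digest` is a hypothetical helper.

```python
import hashlib
import tempfile
from pathlib import Path


def directory_digest(root: str) -> str:
    """Hash every file's relative path and bytes under `root`, in sorted order."""
    digest = hashlib.sha256()
    base = Path(root)
    for path in sorted(p for p in base.rglob("*") if p.is_file()):
        digest.update(path.relative_to(base).as_posix().encode("utf-8"))
        digest.update(path.read_bytes())
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "train.csv"
    csv_path.write_text("id,label\n1,cat\n")
    first = directory_digest(tmp)
    unchanged = directory_digest(tmp)

    # Changing a single label changes the digest.
    csv_path.write_text("id,label\n1,dog\n")
    changed = directory_digest(tmp)

print(first == unchanged, first == changed)
```

A digest like this could also be stored in the artifact's `metadata` to make it easy to confirm which data a model was trained on.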
Use Artifacts
    # Download and use artifact
    run = wandb.init(project="my-project")

    # Download artifact
    artifact = run.use_artifact('training-dataset:latest')
    artifact_dir = artifact.download()

    # Use the data
    data = load_data(f"{artifact_dir}/train.csv")
Model Registry
    # Log model as artifact
    model_artifact = wandb.Artifact(
        name='resnet50-model',
        type='model',
        metadata={'architecture': 'ResNet50', 'accuracy': 0.95},
    )
    model_artifact.add_file('model.pth')
    wandb.log_artifact(model_artifact, aliases=['best', 'production'])

    # Link to model registry (run is the object returned by wandb.init)
    run.link_artifact(model_artifact, 'model-registry/production-models')
Integration Examples
HuggingFace Transformers
    from transformers import Trainer, TrainingArguments
    import wandb

    # Initialize W&B
    wandb.init(project="hf-transformers")

    # Training arguments with W&B
    training_args = TrainingArguments(
        output_dir="./results",
        report_to="wandb",  # Enable W&B logging
        run_name="bert-finetuning",
        logging_steps=100,
        save_steps=500,
    )

    # Trainer automatically logs to W&B
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
    )

    trainer.train()
PyTorch Lightning
    from pytorch_lightning import Trainer
    from pytorch_lightning.loggers import WandbLogger
    import wandb

    # Create W&B logger
    wandb_logger = WandbLogger(
        project="lightning-demo",
        log_model=True,  # Log model checkpoints
    )

    # Use with Trainer
    trainer = Trainer(
        logger=wandb_logger,
        max_epochs=10,
    )

    trainer.fit(model, datamodule=dm)
Keras/TensorFlow
    import wandb
    from wandb.keras import WandbCallback

    # Initialize
    wandb.init(project="keras-demo")

    # Add callback
    model.fit(
        x_train, y_train,
        validation_data=(x_val, y_val),
        epochs=10,
        callbacks=[WandbCallback()],  # Auto-logs metrics
    )
Visualization & Analysis
Custom Charts
    # Log custom visualizations
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(x, y)
    wandb.log({"custom_plot": wandb.Image(fig)})

    # Log confusion matrix
    wandb.log({"conf_mat": wandb.plot.confusion_matrix(
        probs=None,
        y_true=ground_truth,
        preds=predictions,
        class_names=class_names,
    )})
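The confusion matrix call above takes true labels and predictions as parallel sequences. A minimal sketch, assuming both are integer class indices, of the counts such a chart summarizes; `confusion_counts` is a hypothetical helper written for illustration and is not how W&B computes the plot.

```python
from typing import List, Sequence


def confusion_counts(
    y_true: Sequence[int], preds: Sequence[int], num_classes: int
) -> List[List[int]]:
    """Return a matrix where row = true class index and column = predicted class index."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for truth, pred in zip(y_true, preds):
        matrix[truth][pred] += 1
    return matrix


ground_truth = [0, 0, 1, 1, 2]
predictions = [0, 1, 1, 1, 0]
print(confusion_counts(ground_truth, predictions, num_classes=3))
```

Off-diagonal cells are the misclassifications: here one class-0 example was predicted as class 1, and the single class-2 example was predicted as class 0.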
Reports
Create shareable reports in W&B UI:
- Combine runs, charts, and text
- Markdown support
- Embeddable visualizations
- Team collaboration
Best Practices
- Organize with Tags and Groups
    wandb.init(
        project="my-project",
        tags=["baseline", "resnet50", "imagenet"],
        group="resnet-experiments",  # Group related runs
        job_type="train",            # Type of job
    )
- Log Everything Relevant
    # Log extra system metrics (W&B already records basic ones automatically)
    wandb.log({
        "gpu/util": gpu_utilization,
        "gpu/memory": gpu_memory_used,
        "cpu/util": cpu_utilization,
    })

    # Record the code version in the run config (it is a fixed value, not a metric)
    wandb.config.update({"git_commit": git_commit_hash})

    # Log data splits
    wandb.log({
        "data/train_size": len(train_dataset),
        "data/val_size": len(val_dataset),
    })
- Use Descriptive Names
    # ✅ Good: Descriptive run names
    wandb.init(
        project="nlp-classification",
        name="bert-base-lr0.001-bs32-epoch10",
    )

    # ❌ Bad: Generic names
    wandb.init(project="nlp", name="run1")
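Descriptive names are easier to keep consistent when they are generated from the same values that go into the run config. A minimal sketch of one possible naming scheme; `make_run_name` and its key abbreviations are hypothetical choices, not a W&B convention.

```python
from typing import Any, Mapping


def make_run_name(model: str, params: Mapping[str, Any]) -> str:
    """Join the model name with 'key+value' pairs, in the order given."""
    parts = [model] + [f"{key}{value}" for key, value in params.items()]
    return "-".join(parts)


name = make_run_name("bert-base", {"lr": 0.001, "bs": 32, "epoch": 10})
print(name)  # bert-base-lr0.001-bs32-epoch10
```

The result could be passed as `name=` to `wandb.init`, so the run name and the logged hyperparameters come from the same values.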
- Save Important Artifacts
    # Save final model
    artifact = wandb.Artifact('final-model', type='model')
    artifact.add_file('model.pth')
    wandb.log_artifact(artifact)

    # Save predictions for analysis
    predictions_table = wandb.Table(
        columns=["id", "input", "prediction", "ground_truth"],
        data=predictions_data,
    )
    wandb.log({"predictions": predictions_table})
- Use Offline Mode for Unstable Connections
    import os
    import wandb

    # Enable offline mode (set this before calling wandb.init)
    os.environ["WANDB_MODE"] = "offline"

    wandb.init(project="my-project")
    # ... your code ...

Sync later from the shell:

    wandb sync <run_directory>
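Whether to run offline can also be decided at startup instead of being hard-coded. A minimal sketch of one possible policy, assuming the script should fall back to offline mode when no API key is configured; `choose_wandb_mode` is a hypothetical helper, while `WANDB_MODE` and `WANDB_API_KEY` are the environment variables used earlier in this document.

```python
import os
from typing import Mapping


def choose_wandb_mode(env: Mapping[str, str]) -> str:
    """Respect an explicit WANDB_MODE; otherwise go online only if an API key is set."""
    if env.get("WANDB_MODE"):
        return env["WANDB_MODE"]
    return "online" if env.get("WANDB_API_KEY") else "offline"


mode = choose_wandb_mode(os.environ)
print(f"W&B mode: {mode}")
```

Setting `os.environ["WANDB_MODE"] = mode` before `wandb.init` would apply the choice; runs recorded offline can then be uploaded with `wandb sync` as shown above.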
Team Collaboration
Share Runs
    # Every run gets a URL you can share (who can open it depends on project visibility)
    run = wandb.init(project="team-project")
    print(f"Share this URL: {run.url}")
Team Projects
- Create team account at wandb.ai
- Add team members
- Set project visibility (private/public)
- Use team-level artifacts and model registry
Pricing
- Free: Unlimited public projects, 100GB storage
- Academic: Free for students/researchers
- Teams: $50/seat/month, private projects, unlimited storage
- Enterprise: Custom pricing, on-prem options
Resources
- Documentation: https://docs.wandb.ai
- GitHub: https://github.com/wandb/wandb (10.5k+ stars)
- Examples: https://github.com/wandb/examples
- Community: https://wandb.ai/community
- Discord: https://wandb.me/discord
See Also
- references/sweeps.md: Comprehensive hyperparameter optimization guide
- references/artifacts.md: Data and model versioning patterns
- references/integrations.md: Framework-specific examples