Train vs Eval Mode
- `model.train()` enables dropout, BatchNorm updates — default after init
- `model.eval()` disables dropout, uses running stats — MUST call for inference
- Mode is sticky — train/eval persists until explicitly changed
- `model.eval()` doesn't disable gradients — still need `torch.no_grad()`
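The interaction between eval mode and gradient tracking is a common trap; a minimal sketch (toy model, purely illustrative):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))

model.train()               # default after init; dropout active
model.eval()                # dropout off, BatchNorm uses running stats

# eval() alone does NOT stop autograd from building a graph:
x = torch.randn(2, 4)
out = model(x)
print(out.requires_grad)    # True

with torch.no_grad():       # this is what actually skips gradient tracking
    out = model(x)
print(out.requires_grad)    # False
```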
Gradient Control
- `torch.no_grad()` for inference — reduces memory, speeds up computation
- `loss.backward()` accumulates gradients — call `optimizer.zero_grad()` before backward
- `zero_grad()` placement matters — before forward pass, not after backward
- `.detach()` to stop gradient flow — prevents memory leak in logging
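A sketch of one training step with the ordering above (toy model and data, purely illustrative):

```python
import torch
import torch.nn as nn

model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(8, 4), torch.randn(8, 1)

losses = []
for _ in range(3):
    optimizer.zero_grad()            # clear accumulated grads before this step
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()                  # accumulates into each param's .grad
    optimizer.step()
    losses.append(loss.detach())     # detach so the graph isn't kept alive
```

Appending `loss` itself instead of `loss.detach()` (or `loss.item()`) would retain every iteration's graph.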
Device Management
- Model AND data must be on same device — `model.to(device)` and `tensor.to(device)`
- `.cuda()` vs `.to('cuda')` — both work, `.to(device)` more flexible
- CUDA tensors can't convert to numpy directly — `.cpu().numpy()` required
- `torch.device('cuda' if torch.cuda.is_available() else 'cpu')` — portable code
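These points combine into the usual portable setup; a minimal sketch that also runs on CPU-only machines:

```python
import torch

# Portable device selection; falls back to CPU when CUDA is absent.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = torch.nn.Linear(4, 2).to(device)   # move parameters
x = torch.randn(3, 4).to(device)           # move data to the SAME device
out = model(x)

# CUDA tensors must come back to the CPU before numpy conversion:
arr = out.detach().cpu().numpy()
print(arr.shape)                           # (3, 2)
```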
DataLoader
- `num_workers > 0` uses multiprocessing — Windows needs `if __name__ == '__main__':`
- `pin_memory=True` with CUDA — faster transfer to GPU
- Workers don't share state — random seeds differ per worker, set in `worker_init_fn`
- Large `num_workers` can cause memory issues — start with 2-4, increase if CPU-bound
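A sketch tying these settings together; the dataset and `seed_worker` helper are illustrative, not part of any API:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical reseeding hook: torch.initial_seed() inside a worker already
# incorporates the worker id, so each worker gets a distinct seed.
def seed_worker(worker_id):
    torch.manual_seed(torch.initial_seed() % 2**32)

dataset = TensorDataset(torch.randn(100, 4), torch.randint(0, 2, (100,)))

if __name__ == '__main__':          # required on Windows when num_workers > 0
    loader = DataLoader(
        dataset,
        batch_size=16,
        num_workers=2,              # start small; raise only if loading is the bottleneck
        pin_memory=torch.cuda.is_available(),  # page-locked memory for faster GPU copies
        worker_init_fn=seed_worker,
    )
    xb, yb = next(iter(loader))
    print(xb.shape)                 # torch.Size([16, 4])
```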
Saving and Loading
- `torch.save(model.state_dict(), path)` — recommended, saves only weights
- Loading: create model first, then `model.load_state_dict(torch.load(path))`
- `map_location` for cross-device — `torch.load(path, map_location='cpu')` if saved on GPU
- Saving whole model pickles code path — breaks if code changes
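The recommended round trip in one place (the `'weights.pt'` filename is illustrative):

```python
import torch
import torch.nn as nn

# Save only the state_dict: weights, no pickled code paths.
model = nn.Linear(4, 2)
torch.save(model.state_dict(), 'weights.pt')

# Loading: build the architecture first, then fill in the weights.
restored = nn.Linear(4, 2)
state = torch.load('weights.pt', map_location='cpu')  # safe even if saved on GPU
restored.load_state_dict(state)

x = torch.randn(1, 4)
print(torch.allclose(model(x), restored(x)))   # True
```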
In-place Operations
- In-place ops end with `_` — `tensor.add_(1)` vs `tensor.add(1)`
- In-place on leaf variable breaks autograd — error about modified leaf
- In-place on intermediate can corrupt gradient — avoid in computation graph
- `tensor.data` bypasses autograd — legacy, prefer `.detach()` for safety
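A small demonstration of the leaf-variable error and the `.detach()` alternative:

```python
import torch

# In-place ops carry a trailing underscore and mutate the tensor directly.
t = torch.ones(3)
t.add_(1)                    # t is now tensor([2., 2., 2.])
u = t.add(1)                 # out-of-place: t unchanged, u is a new tensor

# In-place on a leaf that requires grad is rejected by autograd:
leaf = torch.ones(3, requires_grad=True)
try:
    leaf.add_(1)
except RuntimeError as err:
    print('refused:', err)   # "...leaf Variable that requires grad..."

# Prefer .detach() over .data to step outside the graph; .data skips
# autograd's version checks and can hide silent gradient corruption.
safe = leaf.detach()
print(safe.requires_grad)    # False
```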
Memory Management
- Accumulated tensors leak memory — `.detach()` logged metrics
- `torch.cuda.empty_cache()` releases cached memory — but doesn't fix leaks
- Delete references and call `gc.collect()` — before `empty_cache` if needed
- `with torch.no_grad():` prevents graph storage — crucial for validation loop
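A sketch of these habits in a training/validation skeleton (toy model and data, purely illustrative):

```python
import gc
import torch
import torch.nn as nn

model = nn.Linear(4, 1)
x, y = torch.randn(8, 4), torch.randn(8, 1)

# Detach (or .item()) before storing, or every iteration's graph is retained.
history = []
for _ in range(2):
    loss = nn.functional.mse_loss(model(x), y)
    history.append(loss.detach())

# Validation without building a graph at all:
with torch.no_grad():
    val_loss = nn.functional.mse_loss(model(x), y)

# empty_cache only returns *unreferenced* cached blocks, so drop
# references and collect first; it will not fix a genuine leak.
del loss, val_loss
gc.collect()
if torch.cuda.is_available():
    torch.cuda.empty_cache()
```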
Common Mistakes
- BatchNorm with `batch_size=1` fails in train mode — use eval mode or `track_running_stats=False`
- Loss function reduction default is 'mean' — may want 'sum' for gradient accumulation
- `cross_entropy` expects logits — not softmax output
- `.item()` to get Python scalar — `.numpy()` or `[0]` deprecated/error
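A few of these mistakes reproduced in miniature (all tensors are illustrative):

```python
import torch
import torch.nn.functional as F

# cross_entropy takes raw logits; it applies log-softmax internally,
# so feeding softmax output would silently double-normalize.
logits = torch.randn(4, 3)
targets = torch.tensor([0, 2, 1, 1])
loss = F.cross_entropy(logits, targets)                      # correct
loss_sum = F.cross_entropy(logits, targets, reduction='sum') # for manual averaging

# .item() is the supported way to get a Python scalar:
print(type(loss.item()))     # <class 'float'>

# BatchNorm in train mode needs more than one value per channel:
bn = torch.nn.BatchNorm1d(3)
try:
    bn(torch.randn(1, 3))    # batch_size=1 in train mode
except ValueError as err:
    print('batchnorm:', err)
```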