PyTorch

Avoid common PyTorch mistakes — train/eval mode, gradient leaks, device mismatches, and checkpoint gotchas.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "PyTorch" with this command: npx skills add ivangdavila/pytorch

Train vs Eval Mode

  • model.train() enables dropout, BatchNorm updates — default after init
  • model.eval() disables dropout, uses running stats — MUST call for inference
  • Mode is sticky — train/eval persists until explicitly changed
  • model.eval() doesn't disable gradients — still need torch.no_grad()
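
A minimal sketch of the full idiom, using an arbitrary toy model (the layer sizes and batch shape are placeholders):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 10), nn.BatchNorm1d(10), nn.Dropout(0.5))

model.train()                        # explicit, though train mode is the default
out = model(torch.randn(32, 10))     # dropout active, BatchNorm updates running stats

model.eval()                         # freeze running stats, disable dropout
with torch.no_grad():                # eval() alone does NOT stop gradient tracking
    preds = model(torch.randn(32, 10))
```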

Gradient Control

  • torch.no_grad() for inference — reduces memory, speeds up computation
  • loss.backward() accumulates gradients — call optimizer.zero_grad() before backward
  • zero_grad() placement matters: call it at the start of the step, never between backward() and step()
  • .detach() to stop gradient flow — prevents memory leak in logging
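
A sketch of a standard training step tying these points together; the linear model, learning rate, and random data are hypothetical:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()
x, y = torch.randn(32, 10), torch.randn(32, 1)

losses = []
for _ in range(3):
    optimizer.zero_grad()        # clear stale gradients before backward()
    loss = loss_fn(model(x), y)
    loss.backward()              # accumulates into each parameter's .grad
    optimizer.step()
    losses.append(loss.item())   # .item() (like .detach()) keeps the graph out of the log
```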

Device Management

  • Model AND data must be on same device — model.to(device) and tensor.to(device)
  • .cuda() vs .to('cuda') — both work, .to(device) more flexible
  • CUDA tensors can't convert to numpy directly — .cpu().numpy() required
  • torch.device('cuda' if torch.cuda.is_available() else 'cpu') — portable code
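
A portable device-setup sketch; shapes are arbitrary:

```python
import torch
import torch.nn as nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = nn.Linear(10, 2).to(device)
x = torch.randn(8, 10).to(device)    # data must follow the model
logits = model(x)

# CUDA tensors must come back to the CPU before converting to numpy;
# .detach() is needed because logits is part of the autograd graph.
arr = logits.detach().cpu().numpy()
```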

DataLoader

  • num_workers > 0 uses multiprocessing — Windows needs if __name__ == '__main__':
  • pin_memory=True with CUDA — faster transfer to GPU
  • Workers don't share state: torch seeds each worker differently, but numpy/random must be reseeded in worker_init_fn
  • Large num_workers can cause memory issues — start with 2-4, increase if CPU-bound
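
A sketch combining these points. seed_worker is an illustrative name; its body follows the common recipe of propagating torch's per-worker seed to numpy and random, which are not reseeded automatically:

```python
import random
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

def seed_worker(worker_id):
    # torch gives each worker a distinct seed; forward it to the
    # other RNGs so augmentations differ across workers too
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)

if __name__ == '__main__':           # required on Windows with num_workers > 0
    dataset = TensorDataset(torch.randn(100, 10), torch.randn(100, 1))
    loader = DataLoader(dataset, batch_size=16, num_workers=2,
                        pin_memory=torch.cuda.is_available(),
                        worker_init_fn=seed_worker)
    for xb, yb in loader:
        pass                         # training step would go here
```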

Saving and Loading

  • torch.save(model.state_dict(), path) — recommended, saves only weights
  • Loading: create model first, then model.load_state_dict(torch.load(path))
  • map_location for cross-device — torch.load(path, map_location='cpu') if saved on GPU
  • Saving the whole model pickles the class and its import path: loading breaks if the code changes or moves
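
A save/load sketch; 'model.pt' and the layer sizes are placeholders:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)
torch.save(model.state_dict(), 'model.pt')       # weights only, no pickled class

# Loading: build the architecture first, then restore the weights.
# map_location lets a GPU-saved checkpoint load on a CPU-only machine.
restored = nn.Linear(10, 2)
restored.load_state_dict(torch.load('model.pt', map_location='cpu'))
restored.eval()                                  # don't forget before inference
```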

In-place Operations

  • In-place ops end with an underscore: tensor.add_(1) vs tensor.add(1)
  • In-place on leaf variable breaks autograd — error about modified leaf
  • In-place on intermediate can corrupt gradient — avoid in computation graph
  • tensor.data bypasses autograd — legacy, prefer .detach() for safety
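
A sketch of the leaf-variable rule; wrapping the mutation in no_grad() is one safe way to update a leaf in place:

```python
import torch

w = torch.ones(3, requires_grad=True)    # leaf tensor

# w.add_(1) here would raise a RuntimeError about a leaf Variable
# being used in an in-place operation. Under no_grad() it is allowed
# because autograd is not recording.
with torch.no_grad():
    w.add_(1)

y = (w * 2).sum()
y.backward()
print(w.grad)                            # tensor([2., 2., 2.])

snapshot = w.detach().clone()            # read values without touching autograd
```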

Memory Management

  • Accumulated tensors leak memory: .detach() (or .item()) logged metrics so each step's graph can be freed
  • torch.cuda.empty_cache() releases cached memory — but doesn't fix leaks
  • Delete references and call gc.collect() — before empty_cache if needed
  • with torch.no_grad(): prevents graph storage — crucial for validation loop
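
A sketch contrasting the leak-free validation pattern with detached logging; model and data are toy placeholders:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
loss_fn = nn.MSELoss()
x, y = torch.randn(64, 10), torch.randn(64, 1)

# Validation: no_grad() prevents the graph from being built at all,
# so activations are freed as soon as the forward pass finishes.
model.eval()
with torch.no_grad():
    val_loss = loss_fn(model(x), y).item()

# Logging: appending the raw loss tensor would keep every iteration's
# graph alive; .item() (or .detach()) breaks that link.
history = []
loss = loss_fn(model(x), y)
history.append(loss.item())              # not: history.append(loss)
```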

Common Mistakes

  • BatchNorm with batch_size=1 fails in train mode — use eval mode or track_running_stats=False
  • Loss function reduction default is 'mean' — may want 'sum' for gradient accumulation
  • cross_entropy expects logits — not softmax output
  • Use .item() to get a Python scalar: [0] indexing on a 0-dim tensor errors, and .numpy() fails on tensors that require grad
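
A sketch of the logits and reduction points; shapes and class count are arbitrary:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(8, 5)               # raw scores, NOT softmax output
targets = torch.randint(0, 5, (8,))

# cross_entropy applies log_softmax internally; feeding it softmax'd
# probabilities silently produces wrong, flattened gradients.
loss = F.cross_entropy(logits, targets)                       # reduction='mean' by default
loss_sum = F.cross_entropy(logits, targets, reduction='sum')

print(loss.item())                       # the supported way to get a Python float
```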
