torchcode-pytorch-interview-practice

LeetCode-style PyTorch interview practice environment with auto-grading for implementing softmax, attention, GPT-2 and more from scratch.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.


Install skill "torchcode-pytorch-interview-practice" with this command: npx skills add aradotso/trending-skills/aradotso-trending-skills-torchcode-pytorch-interview-practice

TorchCode — PyTorch Interview Practice

Skill by ara.so — Daily 2026 Skills collection.

TorchCode is a Jupyter-based, self-hosted coding practice environment for ML engineers. It provides 40 curated problems covering PyTorch fundamentals and architectures (softmax, LayerNorm, MultiHeadAttention, GPT-2, etc.) with an automated judge that gives instant pass/fail feedback, gradient verification, and timing — like LeetCode but for tensors.


Installation & Setup

Option 1: Online (zero install)

Option 2: pip (for use inside Colab or existing environment)

pip install torch-judge

Option 3: Docker (pre-built image)

docker run -p 8888:8888 -e PORT=8888 ghcr.io/duoan/torchcode:latest
# Open http://localhost:8888

Option 4: Build locally

git clone https://github.com/duoan/TorchCode.git
cd TorchCode
make run
# Open http://localhost:8888

make run auto-detects Docker or Podman and falls back to local build if the registry image is unavailable (common on Apple Silicon/arm64).


Judge API

The torch_judge package provides the core API used in every notebook.

from torch_judge import check, status, hint, reset_progress

# List all 40 problems and your progress
status()

# Run tests for a specific problem
check("relu")
check("softmax")
check("layernorm")
check("attention")
check("gpt2")

# Get a hint without spoilers
hint("softmax")

# Reset progress for a problem
reset_progress("relu")

check() output

  • Colored pass/fail per test case
  • Correctness check against PyTorch reference implementation
  • Gradient verification (autograd compatibility)
  • Timing measurement
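The judge's internals are not described here. As a rough illustration of what the three checks amount to, the hypothetical `judge_like_check` helper below does three things:

- It compares a candidate against a PyTorch reference.
- It runs `torch.autograd.gradcheck` (finite differences vs. autograd, in float64).
- It times the forward pass.

This is a sketch under those assumptions, not `torch_judge`'s actual code.

```python
import time

import torch
import torch.nn.functional as F
from torch.autograd import gradcheck


def my_softmax(x: torch.Tensor, dim: int = -1) -> torch.Tensor:
    # candidate implementation under test
    exp_x = torch.exp(x - x.max(dim=dim, keepdim=True).values)
    return exp_x / exp_x.sum(dim=dim, keepdim=True)


def judge_like_check(fn, ref_fn, shape=(4, 8)) -> dict:
    """Hypothetical helper: compare fn against a PyTorch reference."""
    # gradcheck needs double precision to be reliable
    x = torch.randn(*shape, dtype=torch.float64, requires_grad=True)

    # 1) forward correctness against the reference implementation (with timing)
    start = time.perf_counter()
    out = fn(x)
    elapsed_ms = (time.perf_counter() - start) * 1000
    correct = torch.allclose(out, ref_fn(x), atol=1e-8)

    # 2) gradient verification: analytic grads vs finite differences
    #    (raises on mismatch, returns True on success)
    grads_ok = gradcheck(fn, (x,), eps=1e-6, atol=1e-4)

    return {"correct": correct, "gradients": grads_ok, "time_ms": elapsed_ms}


result = judge_like_check(my_softmax, lambda t: F.softmax(t, dim=-1))
print(result)
```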

Problem Set Overview

Difficulty levels: Easy → Medium → Hard. The table lists a selection of the 40 problems:

| # | Problem | Key Concepts |
|---|---------|--------------|
| 1 | ReLU | Activation functions, element-wise ops |
| 2 | Softmax | Numerical stability, exp/log tricks |
| 3 | Linear Layer | y = xW^T + b, Kaiming init, nn.Parameter |
| 4 | LayerNorm | Normalization, affine transform |
| 5 | Self-Attention | QKV projections, scaled dot-product |
| 6 | Multi-Head Attention | Head splitting, concatenation |
| 7 | BatchNorm | Batch vs layer statistics, train/eval |
| 8 | RMSNorm | LLaMA-style norm |
| 16 | Cross-Entropy Loss | Log-softmax, logsumexp trick |
| 17 | Dropout | Train/eval mode, inverted scaling |
| 18 | Embedding | Lookup table, weight[indices] |
| 19 | GELU | torch.erf, Gaussian error linear unit |
| 20 | Kaiming Init | std = sqrt(2/fan_in) |
| 21 | Gradient Clipping | Norm-based clipping |
| 31 | Gradient Accumulation | Micro-batching, loss scaling |
| 40 | Linear Regression | Normal equation, GD from scratch |
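GELU (Problem 19) and Embedding (Problem 18) have no worked example further down. Below is a minimal sketch of each, checked against `torch.nn.functional`.

- `my_gelu` uses the exact erf-based form, x·Φ(x), which is what `F.gelu` computes by default.
- `my_embedding` is the plain `weight[indices]` row lookup.

The function names are illustrative and may not match the notebook templates.

```python
import math

import torch
import torch.nn.functional as F


def my_gelu(x: torch.Tensor) -> torch.Tensor:
    # exact GELU: x * Phi(x), where Phi is the standard normal CDF
    return 0.5 * x * (1.0 + torch.erf(x / math.sqrt(2.0)))


def my_embedding(indices: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
    # each index selects one row of the (num_embeddings, embedding_dim) table
    return weight[indices]


x = torch.randn(3, 5)
gelu_matches = torch.allclose(my_gelu(x), F.gelu(x), atol=1e-5)

weight = torch.randn(10, 4)
indices = torch.tensor([[1, 2], [9, 0]])
emb = my_embedding(indices, weight)  # shape (2, 2, 4): one vector per index
emb_matches = torch.equal(emb, F.embedding(indices, weight))
```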

Working Through a Problem

Each problem has a blank template notebook and a matching reference solution:

templates/
  01_relu.ipynb       # Blank template — your workspace
  02_softmax.ipynb
  ...
solutions/
  01_relu.ipynb       # Reference solution (study after attempt)

Typical notebook workflow

# Cell 1: Import judge
from torch_judge import check, hint
import torch
import torch.nn as nn

# Cell 2: Your implementation
def my_relu(x: torch.Tensor) -> torch.Tensor:
    # TODO: implement ReLU without using torch.relu or F.relu
    raise NotImplementedError

# Cell 3: Run the judge
check("relu")

Real Implementation Examples

ReLU (Problem 1 — Easy)

def my_relu(x: torch.Tensor) -> torch.Tensor:
    return torch.clamp(x, min=0)
    # Alternative: return x * (x > 0)
    # Alternative: return torch.where(x > 0, x, torch.zeros_like(x))

Softmax (Problem 2 — Easy, numerically stable)

def my_softmax(x: torch.Tensor, dim: int = -1) -> torch.Tensor:
    # Subtract max for numerical stability (prevents overflow)
    x_max = x.max(dim=dim, keepdim=True).values
    x_shifted = x - x_max
    exp_x = torch.exp(x_shifted)
    return exp_x / exp_x.sum(dim=dim, keepdim=True)

LayerNorm (Problem 4 — Medium)

def my_layer_norm(
    x: torch.Tensor,
    weight: torch.Tensor,   # gamma (scale)
    bias: torch.Tensor,     # beta (shift)
    eps: float = 1e-5
) -> torch.Tensor:
    mean = x.mean(dim=-1, keepdim=True)
    var = x.var(dim=-1, keepdim=True, unbiased=False)
    x_norm = (x - mean) / torch.sqrt(var + eps)
    return weight * x_norm + bias
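A quick way to sanity-check the LayerNorm above before calling `check()` is to compare it with `torch.nn.functional.layer_norm`. The shapes here are arbitrary. The detail that matters is the biased variance (`unbiased=False`), which is what PyTorch's LayerNorm uses.

```python
import torch
import torch.nn.functional as F


def my_layer_norm(x, weight, bias, eps=1e-5):
    mean = x.mean(dim=-1, keepdim=True)
    var = x.var(dim=-1, keepdim=True, unbiased=False)  # biased, like nn.LayerNorm
    x_norm = (x - mean) / torch.sqrt(var + eps)
    return weight * x_norm + bias


x = torch.randn(2, 3, 16)
weight, bias = torch.randn(16), torch.randn(16)

ours = my_layer_norm(x, weight, bias)
ref = F.layer_norm(x, normalized_shape=(16,), weight=weight, bias=bias, eps=1e-5)
```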

RMSNorm (Problem 8 — Medium, LLaMA-style)

def rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    rms = torch.sqrt((x ** 2).mean(dim=-1, keepdim=True) + eps)
    return (x / rms) * weight
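One way to see what RMSNorm does, and how it differs from LayerNorm: with a unit weight, every row is rescaled so that its root-mean-square is about 1. The mean is never subtracted and there is no bias term. The input scale below is an arbitrary choice for illustration.

```python
import torch


def rms_norm(x, weight, eps=1e-6):
    rms = torch.sqrt((x ** 2).mean(dim=-1, keepdim=True) + eps)
    return (x / rms) * weight


x = torch.randn(4, 32) * 10.0  # deliberately large-scale input
out = rms_norm(x, torch.ones(32))

# root-mean-square of each output row: close to 1.0 regardless of input scale
row_rms = out.pow(2).mean(dim=-1).sqrt()
print(row_rms)
```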

Scaled Dot-Product Self-Attention (Problem 5 — Medium)

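import math
from typing import Optional

import torch
import torch.nn.functional as F

def scaled_dot_product_attention(
    Q: torch.Tensor,  # (B, heads, T, head_dim)
    K: torch.Tensor,  # (B, heads, T, head_dim)
    V: torch.Tensor,  # (B, heads, T, head_dim)
    mask: Optional[torch.Tensor] = None,  # broadcastable to (B, heads, T, T); 0 = blocked
) -> torch.Tensor:
    d_k = Q.size(-1)
    # (B, heads, T, T) scores, scaled by sqrt(d_k) to keep softmax out of saturation
    scores = torch.matmul(Q, K.transpose(-2, -1)) / math.sqrt(d_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float('-inf'))
    attn_weights = F.softmax(scores, dim=-1)
    return torch.matmul(attn_weights, V)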
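To cross-check the attention function, compare it with PyTorch's fused `F.scaled_dot_product_attention` (available since PyTorch 2.0). Its `is_causal=True` option corresponds to passing a lower-triangular mask here. The sketch uses float64 only to make the comparison tolerance tight.

```python
import math

import torch
import torch.nn.functional as F


def scaled_dot_product_attention(Q, K, V, mask=None):
    d_k = Q.size(-1)
    scores = torch.matmul(Q, K.transpose(-2, -1)) / math.sqrt(d_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    return torch.matmul(F.softmax(scores, dim=-1), V)


B, H, T, D = 2, 4, 6, 8
Q, K, V = (torch.randn(B, H, T, D, dtype=torch.float64) for _ in range(3))

# 1 = may attend, 0 = blocked; lower-triangular means causal
causal = torch.tril(torch.ones(T, T))

ours = scaled_dot_product_attention(Q, K, V, mask=causal)
ref = F.scaled_dot_product_attention(Q, K, V, is_causal=True)
```

With this masking scheme, a row that is masked everywhere turns into NaNs, since softmax over all `-inf` is undefined. A causal mask never does this, because every position can at least attend to itself.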

Multi-Head Attention (Problem 6 — Medium)

class MyMultiHeadAttention(nn.Module):
    def __init__(self, d_model: int, num_heads: int):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = d_model // num_heads
        self.d_model = d_model

        self.W_q = nn.Linear(d_model, d_model)
        self.W_k = nn.Linear(d_model, d_model)
        self.W_v = nn.Linear(d_model, d_model)
        self.W_o = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor, mask: torch.Tensor = None) -> torch.Tensor:
        B, T, C = x.shape

        def split_heads(t):
            return t.view(B, T, self.num_heads, self.head_dim).transpose(1, 2)

        Q = split_heads(self.W_q(x))
        K = split_heads(self.W_k(x))
        V = split_heads(self.W_v(x))

        attn_out = scaled_dot_product_attention(Q, K, V, mask)
        # (B, heads, T, head_dim) -> (B, T, d_model)
        attn_out = attn_out.transpose(1, 2).contiguous().view(B, T, C)
        return self.W_o(attn_out)
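A stronger test than checking shapes is to load the same weights into `torch.nn.MultiheadAttention` and compare outputs. That module packs the Q, K and V projections row-wise into a single `in_proj_weight` of shape `(3 * d_model, d_model)`. The sketch below assumes that default packing, which applies when key/value dims equal `d_model`. It re-declares the classes so it runs on its own.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


def scaled_dot_product_attention(Q, K, V, mask=None):
    scores = Q @ K.transpose(-2, -1) / math.sqrt(Q.size(-1))
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    return F.softmax(scores, dim=-1) @ V


class MyMultiHeadAttention(nn.Module):
    def __init__(self, d_model, num_heads):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads, self.head_dim = num_heads, d_model // num_heads
        self.W_q = nn.Linear(d_model, d_model)
        self.W_k = nn.Linear(d_model, d_model)
        self.W_v = nn.Linear(d_model, d_model)
        self.W_o = nn.Linear(d_model, d_model)

    def forward(self, x, mask=None):
        B, T, C = x.shape

        def split_heads(t):  # (B, T, C) -> (B, heads, T, head_dim)
            return t.view(B, T, self.num_heads, self.head_dim).transpose(1, 2)

        out = scaled_dot_product_attention(
            split_heads(self.W_q(x)), split_heads(self.W_k(x)), split_heads(self.W_v(x)), mask
        )
        return self.W_o(out.transpose(1, 2).contiguous().view(B, T, C))


d_model, num_heads = 16, 4
mine = MyMultiHeadAttention(d_model, num_heads)
ref = nn.MultiheadAttention(d_model, num_heads, batch_first=True)

# copy our weights into the reference module
with torch.no_grad():
    ref.in_proj_weight.copy_(torch.cat([mine.W_q.weight, mine.W_k.weight, mine.W_v.weight]))
    ref.in_proj_bias.copy_(torch.cat([mine.W_q.bias, mine.W_k.bias, mine.W_v.bias]))
    ref.out_proj.weight.copy_(mine.W_o.weight)
    ref.out_proj.bias.copy_(mine.W_o.bias)

x = torch.randn(2, 5, d_model)
ours = mine(x)
theirs, _ = ref(x, x, x, need_weights=False)
```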

Cross-Entropy Loss (Problem 16 — Easy)

def cross_entropy_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    # logits: (B, C), targets: (B,) with class indices
    # per-sample loss = logsumexp(logits_i) - logits_i[target_i]; logsumexp is numerically stable
    log_sum_exp = torch.logsumexp(logits, dim=-1)  # (B,)
    target_logits = logits[torch.arange(len(targets), device=logits.device), targets]  # (B,)
    return (log_sum_exp - target_logits).mean()
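The logsumexp formulation can be checked directly against `F.cross_entropy`, which uses mean reduction by default. The large logit scale is arbitrary. It only shows that the formula stays finite where computing `softmax` and then `log` could underflow. float64 keeps the comparison tight.

```python
import torch
import torch.nn.functional as F


def cross_entropy_loss(logits, targets):
    log_sum_exp = torch.logsumexp(logits, dim=-1)
    target_logits = logits[torch.arange(len(targets), device=logits.device), targets]
    return (log_sum_exp - target_logits).mean()


torch.manual_seed(0)
logits = torch.randn(8, 5, dtype=torch.float64) * 50  # large logits
targets = torch.randint(0, 5, (8,))

ours = cross_entropy_loss(logits, targets)
ref = F.cross_entropy(logits, targets)
```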

Dropout (Problem 17 — Easy)

class MyDropout(nn.Module):
    def __init__(self, p: float = 0.5):
        super().__init__()
        self.p = p

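    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if not self.training or self.p == 0:
            return x
        if self.p == 1:
            return torch.zeros_like(x)  # everything dropped; avoids dividing by zero
        # keep each element independently with probability 1 - p
        mask = torch.bernoulli(torch.full_like(x, 1 - self.p))
        return x * mask / (1 - self.p)  # inverted scaling keeps E[output] = x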
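The sketch below checks the two behaviors a dropout grader is likely to probe. In training mode about a fraction `p` of entries are zeroed while the mean is preserved by the `1 / (1 - p)` scaling. In eval mode the module is the identity. The tensor size and seed are arbitrary, and the printed statistics are approximate.

```python
import torch
import torch.nn as nn


class MyDropout(nn.Module):
    def __init__(self, p=0.5):
        super().__init__()
        self.p = p

    def forward(self, x):
        if not self.training or self.p == 0:
            return x
        mask = torch.bernoulli(torch.full_like(x, 1 - self.p))
        return x * mask / (1 - self.p)


torch.manual_seed(0)
drop = MyDropout(p=0.3)
x = torch.ones(100_000)

y = drop(x)  # modules start in training mode
zero_frac = (y == 0).float().mean().item()  # roughly 0.3
y_mean = y.mean().item()  # roughly 1.0 thanks to inverted scaling
print(zero_frac, y_mean)

drop.eval()
y_eval = drop(x)  # identity at eval time
```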

Kaiming Init (Problem 20 — Easy)

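import math

def kaiming_init(weight: torch.Tensor) -> torch.Tensor:
    # for an nn.Linear weight of shape (out_features, in_features), fan_in = in_features
    fan_in = weight.size(1)
    std = math.sqrt(2.0 / fan_in)  # He init, suited to ReLU networks
    with torch.no_grad():
        weight.normal_(0, std)
    return weight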
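An empirical check is to draw a large weight matrix and compare its sample standard deviation with sqrt(2 / fan_in). The result can also be compared with `nn.init.kaiming_normal_` using `mode="fan_in"` and `nonlinearity="relu"`, which targets the same std for a 2-D weight. The 512×256 shape is an arbitrary choice.

```python
import math

import torch
import torch.nn as nn


def kaiming_init(weight):
    fan_in = weight.size(1)  # (out_features, in_features) layout
    std = math.sqrt(2.0 / fan_in)
    with torch.no_grad():
        weight.normal_(0, std)
    return weight


torch.manual_seed(0)
target_std = math.sqrt(2.0 / 256)  # about 0.088

w = kaiming_init(torch.empty(512, 256))
w_ref = nn.init.kaiming_normal_(torch.empty(512, 256), mode="fan_in", nonlinearity="relu")
print(w.std().item(), w_ref.std().item(), target_std)
```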

Gradient Clipping (Problem 21 — Easy)

def clip_grad_norm(parameters, max_norm: float) -> float:
    params = [p for p in parameters if p.grad is not None]
    if not params:
        return 0.0
    # global L2 norm over all gradients, as if they were one flattened vector
    total_norm = torch.sqrt(sum(p.grad.detach().norm() ** 2 for p in params))
    clip_coef = max_norm / (total_norm + 1e-6)
    if clip_coef < 1:
        with torch.no_grad():
            for p in params:
                p.grad.mul_(clip_coef)  # rescale in place
    return total_norm.item()
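The reference to compare against is `torch.nn.utils.clip_grad_norm_`. The sketch below gives two identical linear layers deliberately large gradients, clips one with each function, and compares both the returned norm and the clipped gradients. The model, loss and the ×100 factor are arbitrary choices made to force clipping.

```python
import torch
import torch.nn as nn


def clip_grad_norm(parameters, max_norm):
    params = [p for p in parameters if p.grad is not None]
    if not params:
        return 0.0
    total_norm = torch.sqrt(sum(p.grad.detach().norm() ** 2 for p in params))
    clip_coef = max_norm / (total_norm + 1e-6)
    if clip_coef < 1:
        with torch.no_grad():
            for p in params:
                p.grad.mul_(clip_coef)
    return total_norm.item()


torch.manual_seed(0)
model_a = nn.Linear(10, 3)
model_b = nn.Linear(10, 3)
model_b.load_state_dict(model_a.state_dict())  # identical weights

x = torch.randn(4, 10)
for m in (model_a, model_b):
    (m(x) ** 2).sum().mul(100).backward()  # scaled loss => large gradients

norm_ours = clip_grad_norm(model_a.parameters(), max_norm=1.0)
norm_ref = nn.utils.clip_grad_norm_(model_b.parameters(), max_norm=1.0).item()
```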

Gradient Accumulation (Problem 31 — Easy)

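def train_with_accumulation(model, optimizer, criterion, dataloader, accumulation_steps=4):
    optimizer.zero_grad()
    i = -1  # keeps the final check valid for an empty dataloader
    for i, (inputs, targets) in enumerate(dataloader):
        outputs = model(inputs)
        # scale so the summed gradients match one large batch (for a mean-reduced loss)
        loss = criterion(outputs, targets) / accumulation_steps
        loss.backward()  # gradients accumulate in .grad across micro-batches

        if (i + 1) % accumulation_steps == 0:
            optimizer.step()
            optimizer.zero_grad()

    # flush a final incomplete group instead of silently dropping its gradients
    if (i + 1) % accumulation_steps != 0:
        optimizer.step()
        optimizer.zero_grad()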
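The loss scaling step can be verified directly. The check assumes equal-size micro-batches and a mean-reduced loss. Under those assumptions, summing the gradients of `loss / accumulation_steps` over the micro-batches reproduces the gradient of one full batch. The tiny model and data below are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(6, 1)
criterion = nn.MSELoss()  # mean reduction
inputs, targets = torch.randn(8, 6), torch.randn(8, 1)

# reference: one full batch of 8
model.zero_grad()
criterion(model(inputs), targets).backward()
full_grad = model.weight.grad.clone()

# accumulation: 4 micro-batches of 2, each loss divided by accumulation_steps
accumulation_steps = 4
model.zero_grad()
for x_mb, y_mb in zip(inputs.chunk(accumulation_steps), targets.chunk(accumulation_steps)):
    (criterion(model(x_mb), y_mb) / accumulation_steps).backward()  # adds into .grad

accumulated_grad = model.weight.grad.clone()
```

If the micro-batches had different sizes, the per-chunk means would need to be weighted by chunk size for the equality to hold.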

Common Patterns & Tips

Numerical stability pattern

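When exponentiating logits (softmax, log-softmax), subtract the max first. The result is unchanged because softmax is invariant to adding a constant to every input: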

# WRONG — can overflow for large values
exp_x = torch.exp(x)

# CORRECT — numerically stable
exp_x = torch.exp(x - x.max(dim=-1, keepdim=True).values)
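A concrete case shows why the shift matters. In float32, `exp(1000)` overflows to `inf`, so the naive ratio becomes `inf / inf = nan`. After shifting, every exponent is at most 0 and the result matches `torch.softmax`.

```python
import torch

x = torch.tensor([1000.0, 1001.0, 1002.0])

naive = torch.exp(x) / torch.exp(x).sum()  # overflow: inf / inf -> nan
shifted = torch.exp(x - x.max())           # exponents are now -2, -1, 0
stable = shifted / shifted.sum()

print(naive)   # tensor([nan, nan, nan])
print(stable)
```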

Causal attention mask (for GPT-style models)

def causal_mask(T: int, device) -> torch.Tensor:
    return torch.tril(torch.ones(T, T, device=device)).unsqueeze(0).unsqueeze(0)
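The `(1, 1, T, T)` mask broadcasts over batch and heads. Combined with the `masked_fill(mask == 0, -inf)` convention used above, it zeroes every attention weight above the diagonal, so position i only sees positions 0..i. The random scores below stand in for real Q·Kᵀ logits.

```python
import torch
import torch.nn.functional as F


def causal_mask(T, device=None):
    return torch.tril(torch.ones(T, T, device=device)).unsqueeze(0).unsqueeze(0)


T = 4
mask = causal_mask(T)             # (1, 1, T, T)
scores = torch.randn(2, 3, T, T)  # (B, heads, T, T) stand-in attention logits
weights = F.softmax(scores.masked_fill(mask == 0, float("-inf")), dim=-1)

# no weight on future positions, and the first token attends only to itself
print(weights[0, 0])
```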

nn.Module skeleton (used in many problems)

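class MyLayer(nn.Module):
    # Example shapes: a minimal linear layer; adapt the parameters and forward to each problem
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        # nn.Parameter registers the tensor so it appears in .parameters() and gets gradients
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        self.bias = nn.Parameter(torch.zeros(out_features))
        self._init_weights()

    def _init_weights(self):
        nn.init.kaiming_uniform_(self.weight, nonlinearity="relu")

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x @ self.weight.T + self.bias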

Train vs eval mode pattern

def forward(self, x):
    if self.training:
        # normalize with the current batch's statistics (biased variance)
        mean = x.mean(dim=0)
        var = x.var(dim=0, unbiased=False)
        # update running stats (registered buffers) outside autograd so they don't keep the graph alive;
        # nn.BatchNorm1d stores the unbiased batch variance in running_var
        with torch.no_grad():
            self.running_mean.mul_(1 - self.momentum).add_(self.momentum * mean)
            self.running_var.mul_(1 - self.momentum).add_(self.momentum * x.var(dim=0, unbiased=True))
    else:
        # use running statistics
        mean = self.running_mean
        var = self.running_var
    return (x - mean) / torch.sqrt(var + self.eps) * self.weight + self.bias
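Below is a self-contained `MyBatchNorm1d` built around the train/eval pattern, compared step by step with `nn.BatchNorm1d`. It assumes PyTorch's defaults (`momentum=0.1`, `eps=1e-5`, affine on). Two details are needed for it to match the reference:

- The running variance uses the unbiased batch variance.
- The running statistics are buffers updated without autograd.

The class name and test data are illustrative.

```python
import torch
import torch.nn as nn


class MyBatchNorm1d(nn.Module):
    def __init__(self, num_features, eps=1e-5, momentum=0.1):
        super().__init__()
        self.eps, self.momentum = eps, momentum
        self.weight = nn.Parameter(torch.ones(num_features))
        self.bias = nn.Parameter(torch.zeros(num_features))
        # buffers: saved in state_dict and moved by .to(), but not trained
        self.register_buffer("running_mean", torch.zeros(num_features))
        self.register_buffer("running_var", torch.ones(num_features))

    def forward(self, x):
        if self.training:
            mean = x.mean(dim=0)
            var = x.var(dim=0, unbiased=False)  # biased variance for normalizing
            with torch.no_grad():
                unbiased_var = x.var(dim=0, unbiased=True)
                self.running_mean.mul_(1 - self.momentum).add_(self.momentum * mean)
                self.running_var.mul_(1 - self.momentum).add_(self.momentum * unbiased_var)
        else:
            mean, var = self.running_mean, self.running_var
        return (x - mean) / torch.sqrt(var + self.eps) * self.weight + self.bias


torch.manual_seed(0)
mine, ref = MyBatchNorm1d(5), nn.BatchNorm1d(5)

train_ok = True
for _ in range(3):
    x = torch.randn(16, 5) * 3 + 1
    train_ok = train_ok and torch.allclose(mine(x), ref(x), atol=1e-5)

mine.eval()
ref.eval()
x_eval = torch.randn(4, 5)
eval_ok = torch.allclose(mine(x_eval), ref(x_eval), atol=1e-5)
```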

Project Structure

TorchCode/
├── templates/          # Blank notebooks for each problem (your workspace)
│   ├── 01_relu.ipynb
│   ├── 02_softmax.ipynb
│   └── ...
├── solutions/          # Reference solutions (study after attempting)
│   └── ...
├── torch_judge/        # Auto-grading package
│   ├── __init__.py     # check(), status(), hint(), reset_progress()
│   └── tasks/          # Per-problem test cases
├── Dockerfile
├── Makefile
└── pyproject.toml      # torch-judge package definition

Troubleshooting

Docker image not available for Apple Silicon (arm64)

# make run auto-falls back to local build, or force it:
make build
make start

check() not found in Colab

!pip install torch-judge
# then restart runtime

Notebook reset to blank template

Use the toolbar "Reset" button in JupyterLab to reset any notebook to its original blank state — useful for re-practicing a problem.

Gradient check fails but output is correct

Ensure your implementation uses PyTorch operations (not NumPy) so autograd works:

# WRONG — breaks autograd
import numpy as np
result = np.exp(x.numpy())

# CORRECT — autograd compatible
result = torch.exp(x)

Viewing reference solution

After attempting a problem, open the matching file in solutions/:

solutions/02_softmax.ipynb

Key Concepts Tested

| Concept | Problems |
|---------|----------|
| Numerical stability | Softmax, Cross-Entropy, LogSumExp |
| Autograd / nn.Parameter | Linear, LayerNorm, all nn.Module problems |
| Train vs eval behavior | BatchNorm, Dropout |
| Broadcasting | LayerNorm, RMSNorm, attention masking |
| Shape manipulation | Multi-Head Attention (view, transpose, contiguous) |
| Weight initialization | Kaiming Init, Linear Layer |
| Memory-efficient training | Gradient Accumulation, Gradient Clipping |
