notebook-ml-architect

Expert guidance for auditing, refactoring, and designing machine learning Jupyter notebooks with production-quality patterns. Use when: (1) Analyzing notebook structure and identifying anti-patterns, (2) Detecting data leakage and reproducibility issues, (3) Refactoring messy notebooks into modular pipelines, (4) Generating templates for ML workflows (EDA, classification, experiments), (5) Adding reproducibility instrumentation (seeding, logging, env capture), (6) Converting notebooks to Python scripts, (7) Generating experiment summary reports. Triggers on: ML notebook, Jupyter audit, notebook refactor, data leakage, experiment template, ipynb best practices, notebook to script, reproducibility.

Notebook ML Architect

Expert guidance for production-quality ML notebooks.

Quick Reference

Operation | Use Case
--------- | --------
audit     | Analyze notebook for anti-patterns, leakage, reproducibility issues
refactor  | Transform notebook into modular Python pipeline
template  | Generate new notebook from EDA/classification/experiment template
report    | Create markdown summary from executed notebook
convert   | Extract Python script from notebook

Audit Workflow

When auditing a notebook:

  1. Read the notebook using the Read tool
  2. Check structure against ml-workflow-guide.md
  3. Detect anti-patterns using anti-patterns.md
  4. Check for data leakage using leakage-checklist.md
  5. Run analysis script if deeper inspection needed:
    python scripts/analyze_notebook.py <notebook.ipynb>
    

Audit Checklist

  • Execution order: Cells numbered sequentially (no gaps, no out-of-order)
  • Random seeds: Set early (np.random.seed, torch.manual_seed, random.seed)
  • Imports at top: All imports in first code cell(s)
  • No hardcoded paths: Use relative paths or config variables
  • Train/test split: Clear separation before any modeling
  • No data leakage: Pre-processing after split, no test data peeking
  • Modularization: Functions/classes for reusable logic
  • Dependencies documented: requirements.txt or environment.yml referenced

Severity Levels

  • CRITICAL: Data leakage, missing train/test split, results unreproducible
  • HIGH: No seeds, hardcoded paths, execution order issues
  • MEDIUM: Missing modularization, no dependency docs
  • LOW: Naming conventions, missing comments, style issues

Refactoring Guide

Transform notebooks into production pipelines:

Step 1: Identify Sections

Look for markdown headers that indicate logical sections:

  • Data loading
  • Preprocessing
  • Feature engineering
  • Model definition
  • Training
  • Evaluation

Step 2: Extract Functions

Convert repeated or complex cell code into functions:

# Before: inline code
df = pd.read_csv('data.csv')
df = df.dropna()
df['feature'] = df['a'] * df['b']

# After: function
def load_and_prepare_data(path: str) -> pd.DataFrame:
    df = pd.read_csv(path)
    df = df.dropna()
    df['feature'] = df['a'] * df['b']
    return df

Step 3: Create Module Structure

project/
├── data.py          # Data loading and preprocessing
├── features.py      # Feature engineering
├── model.py         # Model definition
├── train.py         # Training loop
├── evaluate.py      # Evaluation metrics
├── config.py        # Configuration parameters
└── main.py          # Pipeline entry point
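A minimal sketch of what `main.py` might look like as the entry point wiring the modules above. The function bodies here are illustrative stubs standing in for the real `data.py`, `features.py`, and `train.py` contents, not the skill's actual code:

```python
# main.py — hypothetical pipeline entry point. Each stub stands in for a
# function that would live in its own module (data.py, features.py, train.py).

def load_data(path):
    # In data.py: would call pd.read_csv(path) and clean the frame.
    return [{"a": 1.0, "b": 2.0}, {"a": 3.0, "b": 4.0}]

def engineer_features(rows):
    # In features.py: derive new columns from existing ones.
    for row in rows:
        row["feature"] = row["a"] * row["b"]
    return rows

def train(rows):
    # In train.py: a stand-in for a real training loop; "model" here is
    # just the mean feature value.
    return sum(r["feature"] for r in rows) / len(rows)

def main(path="data.csv"):
    rows = engineer_features(load_data(path))
    return train(rows)

if __name__ == "__main__":
    print(main())
```

The payoff of this structure is that `main()` becomes importable and testable outside the notebook.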

Step 4: Use the convert script

python scripts/convert_to_script.py notebook.ipynb output.py --group-by-sections

Template Generation

Generate new notebooks from templates:

Available Templates

  1. EDA Template (assets/templates/eda_template.ipynb)

    • Data loading, basic info, missing values, distributions, correlations
  2. Classification Template (assets/templates/classification_template.ipynb)

    • Full supervised learning pipeline with evaluation metrics
  3. Experiment Template (assets/templates/experiment_template.ipynb)

    • Parameterized notebook for experiment tracking

Using Templates

Copy template to project and customize:

cp ~/.claude/skills/notebook-ml-architect/assets/templates/classification_template.ipynb ./my_experiment.ipynb

Or generate programmatically with modifications.
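Since an `.ipynb` file is plain JSON, programmatic generation needs no special tooling (though `nbformat` is the more robust choice in practice). A minimal stdlib sketch, with an illustrative inline template rather than the skill's shipped templates:

```python
import json

# Hypothetical minimal template; the real templates in assets/templates/
# are full notebooks with many cells.
template = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "code", "metadata": {}, "outputs": [],
         "execution_count": None, "source": ["TARGET = 'label'\n"]},
    ],
}

# Swap in experiment-specific parameters before writing the copy.
template["cells"][0]["source"] = ["TARGET = 'churn'\n", "SEED = 42\n"]

with open("my_experiment.ipynb", "w") as f:
    json.dump(template, f, indent=1)
```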

Reproducibility Checklist

Required Elements

  1. Random Seeds: Use the reproducibility header snippet:

    # Copy from assets/snippets/reproducibility_header.py
    
  2. Environment Capture

    import sys
    print(f"Python: {sys.version}")
    for pkg in ['numpy', 'pandas', 'sklearn', 'torch']:
        try:
            mod = __import__(pkg)
            print(f"{pkg}: {mod.__version__}")
        except ImportError:
            pass
    
  3. Dependency File

    pip freeze > requirements.txt
    # Or for conda:
    conda env export > environment.yml
    
  4. Data Versioning

    • Record data source, download date, preprocessing steps
    • Use relative paths from project root
    • Consider DVC for large datasets
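The reproducibility header referenced in item 1 might look roughly like the sketch below (the actual `assets/snippets/reproducibility_header.py` may differ). NumPy and PyTorch seeding are attempted only if those libraries are installed:

```python
import os
import random

SEED = 42

def seed_everything(seed: int = SEED) -> None:
    """Seed every RNG we know about; skip libraries that aren't installed."""
    random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)  # affects subprocesses only
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        torch.backends.cudnn.deterministic = True  # slower but reproducible
    except ImportError:
        pass

seed_everything()
```

Calling `seed_everything()` in the first code cell satisfies the "seeds set early" audit check.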

MCP Tool Usage

Context7 - Library API Lookups

When you need accurate API information:

1. Call resolve-library-id with library name
2. Call get-library-docs with the returned ID and topic

Examples:

  • sklearn train_test_split parameters
  • papermill execute_notebook options
  • nbformat cell structure

Exa Search - Current Best Practices

When you need up-to-date recommendations:

  • Use web_search_exa for discovery
  • Use crawling_exa to pull full content from good URLs
  • Use deep_search_exa for focused queries

Examples:

  • "PyTorch reproducibility best practices 2024"
  • "How to handle class imbalance"
  • "MLflow notebook integration"

GitHub Search - Real-World Patterns

When you need to see how others do it:

searchGitHub with:
- query: specific code pattern
- language: ["Python"]
- path: ".ipynb" for notebooks

Examples:

  • Production notebook seeding patterns
  • Evaluation metric implementations
  • Config management in notebooks

Script Reference

analyze_notebook.py

Parse notebook and extract structure:

python scripts/analyze_notebook.py <notebook.ipynb> [--output json|text]

Output includes:

  • Cell counts by type
  • Import statements
  • Function/class definitions
  • Detected issues
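The kind of structural analysis the script performs can be sketched with the stdlib alone, since a notebook is JSON; the real script's output format may differ:

```python
import json  # an .ipynb file parses with json.load

def summarize(nb: dict) -> dict:
    """Count cells by type and collect top-level import statements."""
    counts, imports = {}, []
    for cell in nb["cells"]:
        counts[cell["cell_type"]] = counts.get(cell["cell_type"], 0) + 1
        if cell["cell_type"] == "code":
            for line in cell["source"]:
                if line.lstrip().startswith(("import ", "from ")):
                    imports.append(line.strip())
    return {"cell_counts": counts, "imports": imports}

# Tiny in-memory notebook standing in for json.load(open("notebook.ipynb")):
nb = {"cells": [
    {"cell_type": "markdown", "source": ["# EDA\n"]},
    {"cell_type": "code",
     "source": ["import pandas as pd\n", "df = pd.read_csv('data.csv')\n"]},
]}
print(summarize(nb))
```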

run_notebook.py

Execute notebook with parameters:

python scripts/run_notebook.py input.ipynb output.ipynb \
  --params '{"learning_rate": 0.01, "epochs": 100}' \
  --timeout 3600
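The `--params` flag suggests papermill-style parameter injection: a cell tagged `parameters` has its source replaced before execution. A minimal stdlib sketch of that mechanism (the script's actual implementation may differ):

```python
def inject_params(nb: dict, params: dict) -> dict:
    """Overwrite the source of any cell tagged 'parameters'."""
    for cell in nb["cells"]:
        if "parameters" in cell.get("metadata", {}).get("tags", []):
            cell["source"] = [f"{k} = {v!r}\n" for k, v in params.items()]
    return nb

nb = {"cells": [{
    "cell_type": "code",
    "metadata": {"tags": ["parameters"]},
    "source": ["learning_rate = 0.1\n"],  # template defaults, to be replaced
}]}
nb = inject_params(nb, {"learning_rate": 0.01, "epochs": 100})
```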

convert_to_script.py

Extract Python from notebook:

python scripts/convert_to_script.py notebook.ipynb output.py \
  --include-markdown \
  --group-by-sections \
  --add-main
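The core of such a conversion can be sketched in a few lines: code cells become script text, and markdown cells (when `--include-markdown` is set) become comments. Function and flag mapping here is illustrative, not the script's actual code:

```python
def to_script(nb: dict, include_markdown: bool = True) -> str:
    """Flatten a notebook dict into Python source text."""
    chunks = []
    for cell in nb["cells"]:
        if cell["cell_type"] == "code":
            chunks.append("".join(cell["source"]))
        elif cell["cell_type"] == "markdown" and include_markdown:
            # Preserve narrative as comments.
            chunks.append("".join(f"# {line}" for line in cell["source"]))
    return "\n".join(chunks)

nb = {"cells": [
    {"cell_type": "markdown", "source": ["Load the data\n"]},
    {"cell_type": "code", "source": ["x = 1\n"]},
]}
print(to_script(nb))
```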

Common Issues and Fixes

Data Leakage

Problem: Preprocessing on full dataset before split

# BAD
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # Fits on all data
X_train, X_test = train_test_split(X_scaled)

Fix: Split first, fit on train only

# GOOD
X_train, X_test = train_test_split(X)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)  # Transform only
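A tiny numeric illustration of why the order matters: statistics fit on the full dataset are contaminated by the test rows, so the "scaled" training data silently encodes test-set information. The values below are made up for the demonstration:

```python
from statistics import mean

train = [1.0, 2.0, 3.0, 4.0]
test = [100.0]  # an extreme held-out point

leaky_mean = mean(train + test)  # fit before the split: 22.0
clean_mean = mean(train)         # fit on train only:    2.5

# A scaler centered on leaky_mean would shift every training row by an
# amount determined partly by the test point — that is the leakage.
assert leaky_mean != clean_mean
```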

Hidden State

Problem: Variables from previous runs affect results

# Cell 1 run multiple times
results.append(model.score(X_test, y_test))  # results grows each run

Fix: Initialize state in cell

results = []  # Always start fresh
results.append(model.score(X_test, y_test))
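A stricter fix than re-initializing at the top of the cell is to scope mutable state inside a function, so re-running the cell can never accumulate stale results. `run_experiment` and `StubModel` below are illustrative names, not part of the skill:

```python
def run_experiment(models, X_test, y_test):
    scores = []  # fresh on every call; no module-level state to corrupt
    for model in models:
        scores.append(model.score(X_test, y_test))
    return scores

class StubModel:
    """Stand-in for a fitted estimator with a sklearn-style score()."""
    def score(self, X, y):
        return 0.9

print(run_experiment([StubModel(), StubModel()], None, None))
```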

Missing Seeds

Problem: Different results each run

X_train, X_test, y_train, y_test = train_test_split(X, y)  # Random each time

Fix: Set seeds explicitly

SEED = 42
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=SEED)
