AutoML Optimizer

Overview

AutoML automates hyperparameter tuning and model selection. Instead of manually trying different configurations, you define a search space and let AutoML explore it to find the best configuration it can within the trial budget.

Why AutoML?

Manual Tuning Problems:

Time-consuming (hours/days of trial and error)
Subjective (depends on intuition)
Incomplete (can't try all combinations)
Not reproducible (hard to document search process)

AutoML Benefits:

✅ Systematic exploration of search space
✅ Intelligent sampling (Bayesian optimization)
✅ All experiments tracked automatically
✅ Find optimal configuration faster
✅ Reproducible (search process documented)

AutoML Strategies

Strategy 1: Hyperparameter Optimization (Optuna)

from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

from specweave import OptunaOptimizer

# Define search space
def objective(trial):
    # Suggest hyperparameters
    params = {
        'n_estimators': trial.suggest_int('n_estimators', 100, 1000),
        'max_depth': trial.suggest_int('max_depth', 3, 10),
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.3, log=True),
        'subsample': trial.suggest_float('subsample', 0.5, 1.0),
        'colsample_bytree': trial.suggest_float('colsample_bytree', 0.5, 1.0)
    }

    # Train model
    model = XGBClassifier(**params)

    # Cross-validation score
    scores = cross_val_score(model, X_train, y_train, cv=5, scoring='roc_auc')

    return scores.mean()

# Run optimization
optimizer = OptunaOptimizer(
    objective=objective,
    n_trials=100,
    direction='maximize',
    increment="0042"
)

best_params = optimizer.optimize()

# Creates:
# - .specweave/increments/0042.../experiments/optuna-study/
#   ├── study.db (Optuna database)
#   ├── optimization_history.png
#   ├── param_importances.png
#   ├── parallel_coordinate.png
#   └── best_params.json

Optimization Report:

Optuna Optimization Report

Search Space

n_estimators: [100, 1000]
max_depth: [3, 10]
learning_rate: [0.01, 0.3] (log scale)
subsample: [0.5, 1.0]
colsample_bytree: [0.5, 1.0]

Trials: 100

Completed: 98
Pruned: 2 (early stopping)
Failed: 0

Best Trial (#47)

ROC AUC: 0.892 ± 0.012
Parameters:
- n_estimators: 673
- max_depth: 6
- learning_rate: 0.094
- subsample: 0.78
- colsample_bytree: 0.91

Parameter Importance

learning_rate (0.42) - Most important
n_estimators (0.28)
max_depth (0.18)
colsample_bytree (0.08)
subsample (0.04) - Least important

Improvement over Default

Default params: ROC AUC = 0.856
Optimized params: ROC AUC = 0.892
Improvement: +4.2%
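
The +4.2% figure in the report is a relative improvement over the baseline score, not a difference in absolute AUC points (which would be 0.036). A minimal sketch of the arithmetic, using the two scores from the report; the function name `relative_improvement` is illustrative and not part of any library:

```python
def relative_improvement(baseline: float, optimized: float) -> float:
    """Return the percentage gain of `optimized` over `baseline`."""
    return (optimized - baseline) / baseline * 100


baseline_auc = 0.856   # default params, from the report above
optimized_auc = 0.892  # best trial, from the report above

gain = relative_improvement(baseline_auc, optimized_auc)
print(f"Improvement: {gain:+.1f}%")  # Improvement: +4.2%
```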

Strategy 2: Algorithm Selection + Tuning

from lightgbm import LGBMClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from xgboost import XGBClassifier

from specweave import AutoMLPipeline

# Define candidate algorithms with search spaces
pipeline = AutoMLPipeline(increment="0042")

# Add candidates
pipeline.add_candidate(
    name="xgboost",
    model=XGBClassifier,
    search_space={
        'n_estimators': (100, 1000),
        'max_depth': (3, 10),
        'learning_rate': (0.01, 0.3)
    }
)

pipeline.add_candidate(
    name="lightgbm",
    model=LGBMClassifier,
    search_space={
        'n_estimators': (100, 1000),
        'max_depth': (3, 10),
        'learning_rate': (0.01, 0.3)
    }
)

pipeline.add_candidate(
    name="random_forest",
    model=RandomForestClassifier,
    search_space={
        'n_estimators': (100, 500),
        'max_depth': (3, 20),
        'min_samples_split': (2, 20)
    }
)

pipeline.add_candidate(
    name="logistic_regression",
    model=LogisticRegression,
    search_space={
        'C': (0.001, 100),
        # Note: 'l1' needs a compatible solver such as 'liblinear' or 'saga'
        'penalty': ['l1', 'l2']
    }
)

# Run AutoML (tries all algorithms + hyperparameters)
results = pipeline.fit(
    X_train, y_train,
    n_trials_per_model=50,
    cv_folds=5,
    metric='roc_auc'
)

# Best model automatically selected
best_model = pipeline.best_model_
best_params = pipeline.best_params_

AutoML Comparison:

| Model | Trials | Best Score | Mean Score | Std | Best Params |
|-------|--------|------------|------------|-----|-------------|
| xgboost | 50 | 0.892 | 0.876 | 0.012 | n_est=673, depth=6, lr=0.094 |
| lightgbm | 50 | 0.889 | 0.873 | 0.011 | n_est=542, depth=7, lr=0.082 |
| random_forest | 50 | 0.871 | 0.858 | 0.015 | n_est=384, depth=12, min_split=5 |
| logistic_regression | 50 | 0.845 | 0.840 | 0.008 | C=1.234, penalty=l2 |

Winner: XGBoost (ROC AUC = 0.892)
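
Picking the winner amounts to taking the maximum best score, but it can also help to look at the margin over the runner-up. The sketch below does both with the numbers from the comparison table above; the `results` dictionary is a hand-copied illustration, not the structure returned by `pipeline.fit`. Here the margin (0.003) is smaller than the standard deviation reported for either of the top two models, which suggests treating XGBoost and LightGBM as near-equivalent candidates.

```python
# Best cross-validated ROC AUC and standard deviation per model,
# copied by hand from the comparison table.
results = {
    "xgboost": (0.892, 0.012),
    "lightgbm": (0.889, 0.011),
    "random_forest": (0.871, 0.015),
    "logistic_regression": (0.845, 0.008),
}

# Rank models by best score, highest first.
ranked = sorted(results.items(), key=lambda item: item[1][0], reverse=True)
(winner, (best, _)), (runner_up, (second, _)) = ranked[0], ranked[1]

margin = best - second
print(f"Winner: {winner} ({best:.3f}), margin over {runner_up}: {margin:.3f}")
```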

Strategy 3: Neural Architecture Search (NAS)

from specweave import NeuralArchitectureSearch

# For deep learning
nas = NeuralArchitectureSearch(increment="0042")

# Define search space
search_space = {
    'num_layers': (2, 5),
    'layer_sizes': (32, 512),
    'activation': ['relu', 'tanh', 'elu'],
    'dropout': (0.0, 0.5),
    'optimizer': ['adam', 'sgd', 'rmsprop'],
    'learning_rate': (0.0001, 0.01)
}

# Search for best architecture
best_architecture = nas.search(
    X_train, y_train,
    search_space=search_space,
    n_trials=100,
    max_epochs=50
)

# Creates: Best neural network architecture

AutoML Frameworks Integration

Optuna (Recommended)

import optuna
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

from specweave import configure_optuna

# Auto-configures Optuna to log to increment
configure_optuna(increment="0042")

def objective(trial):
    params = {
        'n_estimators': trial.suggest_int('n_estimators', 100, 1000),
        'max_depth': trial.suggest_int('max_depth', 3, 10),
    }

    model = XGBClassifier(**params)
    score = cross_val_score(model, X, y, cv=5).mean()
    return score

study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=100)

# Automatically logged to increment folder

Auto-sklearn

from specweave import AutoSklearnOptimizer

# Automated model selection + feature engineering
optimizer = AutoSklearnOptimizer(
    time_left_for_this_task=3600,  # 1 hour
    increment="0042"
)

optimizer.fit(X_train, y_train)

# Auto-sklearn tries:
# - Multiple algorithms
# - Feature preprocessing combinations
# - Ensemble methods
# Returns best pipeline

H2O AutoML

from specweave import H2OAutoMLOptimizer

optimizer = H2OAutoMLOptimizer(
    max_runtime_secs=3600,  # 1 hour
    max_models=50,
    increment="0042"
)

optimizer.fit(X_train, y_train)

# H2O tries many algorithms in parallel
# Returns leaderboard + best model

Best Practices

Start with Default Baseline

# Always compare AutoML to default hyperparameters
baseline_model = XGBClassifier()  # Default params
baseline_score = cross_val_score(baseline_model, X, y, cv=5).mean()

# Then optimize
optimizer = OptunaOptimizer(objective, n_trials=100)
optimized_params = optimizer.optimize()

# Score the tuned model the same way as the baseline
optimized_model = XGBClassifier(**optimized_params)
optimized_score = cross_val_score(optimized_model, X, y, cv=5).mean()

improvement = (optimized_score - baseline_score) / baseline_score * 100
print(f"Improvement: {improvement:.1f}%")

# Only use the optimized params if the improvement is significant (>2-3%)

Use Cross-Validation

# ❌ Wrong: Single train/test split
score = model.score(X_test, y_test)

# ✅ Correct: Cross-validation
scores = cross_val_score(model, X_train, y_train, cv=5)
score = scores.mean()

# Prevents overfitting to a specific train/test split
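
To make the difference concrete: with `cv=5`, the model is trained five times, and every sample is used for validation exactly once. The following is a minimal standard-library sketch of how such folds can be formed; `k_fold_indices` is a hypothetical helper written for illustration. It does not shuffle or stratify, whereas scikit-learn uses stratified folds by default when a classifier is given an integer `cv`.

```python
def k_fold_indices(n_samples: int, k: int):
    """Yield (train_idx, val_idx) pairs for k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


folds = list(k_fold_indices(n_samples=10, k=5))
for train_idx, val_idx in folds:
    print(f"train={train_idx} val={val_idx}")
```

Averaging the five validation scores gives an estimate that depends far less on which samples happened to land in a single test split.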

Set Reasonable Search Budgets

# Quick exploration (development)
optimizer.optimize(n_trials=20)  # ~5-10 minutes

# Moderate search (iteration)
optimizer.optimize(n_trials=100)  # ~30-60 minutes

# Thorough search (final model)
optimizer.optimize(n_trials=500)  # ~2-4 hours

# Don't overdo it: diminishing returns after ~100-200 trials
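
One way to act on diminishing returns, rather than fixing the trial count up front, is to stop once the best score has not improved for a while. The sketch below is a standalone illustration of that idea, not a SpecWeave or Optuna feature; `trials_until_plateau`, `patience`, and `min_delta` are hypothetical names, and the score history is made up.

```python
def trials_until_plateau(scores, patience=20, min_delta=0.001):
    """Return how many trials ran before the best score stopped improving.

    Stops once `patience` consecutive trials fail to beat the best score
    by more than `min_delta`. Both thresholds are illustrative defaults.
    """
    best = float("-inf")
    stale = 0
    for n, score in enumerate(scores, start=1):
        if score > best + min_delta:
            best = score
            stale = 0
        else:
            stale += 1
        if stale >= patience:
            return n
    return len(scores)


# Hypothetical score history: quick early gains, then a plateau.
history = [0.80, 0.85, 0.87, 0.88] + [0.879] * 50
print(trials_until_plateau(history, patience=20))  # 24
```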

Prune Unpromising Trials

# Optuna can stop bad trials early
study = optuna.create_study(
    direction='maximize',
    pruner=optuna.pruners.MedianPruner()
)

# If a trial is performing worse than the median at epoch N, stop it
# Saves time by not fully training bad models
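
The median rule itself is simple enough to sketch with the standard library. The code below is an illustration of the idea only, not Optuna's implementation: `should_prune`, `completed_curves`, and `min_trials` are hypothetical names, and the scores are invented. In Optuna, pruning only takes effect if the objective reports intermediate values with `trial.report(value, step)` and checks `trial.should_prune()`.

```python
from statistics import median


def should_prune(completed_curves, step, value, min_trials=5):
    """Decide whether a running trial should stop at `step`.

    completed_curves: per-trial lists of intermediate scores (higher is better).
    value: the running trial's score at `step`.
    min_trials: hypothetical warm-up; never prune before this many
                finished trials have reported a score at `step`.
    """
    peers = [curve[step] for curve in completed_curves if len(curve) > step]
    if len(peers) < min_trials:
        return False
    return value < median(peers)


# Hypothetical validation scores of five finished trials at epochs 0..2.
finished = [
    [0.60, 0.70, 0.80],
    [0.55, 0.68, 0.79],
    [0.62, 0.74, 0.83],
    [0.50, 0.60, 0.70],
    [0.58, 0.71, 0.81],
]

print(should_prune(finished, step=1, value=0.65))  # True: median at epoch 1 is 0.70
print(should_prune(finished, step=1, value=0.72))  # False
```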

Document Search Space Rationale

# Document why you chose specific ranges
search_space = {
    # XGBoost recommends max_depth 3-10 for most tasks
    'max_depth': (3, 10),

    # Learning rate: 0.01-0.3 covers slow to fast learning
    # Log scale to spend more trials on smaller values
    'learning_rate': (0.01, 0.3, 'log'),

    # n_estimators: Balance accuracy vs training time
    'n_estimators': (100, 1000)
}
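
The "log scale" rationale can be checked numerically. Sampling uniformly in log space puts about half of the draws below the geometric mean of the bounds (roughly 0.055 for 0.01 to 0.3), whereas uniform sampling on a linear scale would put only about 15% there. A minimal standard-library sketch, assuming log-uniform sampling is what the `'log'` marker stands for; `sample_log_uniform` is a hypothetical helper:

```python
import math
import random


def sample_log_uniform(low: float, high: float, rng: random.Random) -> float:
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [sample_log_uniform(0.01, 0.3, rng) for _ in range(10_000)]

# On a log scale, about half of the draws fall below the geometric mean.
geometric_mean = math.sqrt(0.01 * 0.3)  # about 0.055
share_below = sum(s < geometric_mean for s in samples) / len(samples)
print(f"share below {geometric_mean:.3f}: {share_below:.2f}")
```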

Integration with SpecWeave

Automatic Experiment Tracking

# All AutoML trials logged automatically
optimizer = OptunaOptimizer(objective, increment="0042")
optimizer.optimize(n_trials=100)

# Creates:
# .specweave/increments/0042.../experiments/
# ├── optuna-trial-001/
# ├── optuna-trial-002/
# ├── ...
# ├── optuna-trial-100/
# └── optuna-summary.md

Living Docs Integration

/sw:sync-docs update

Updates:

Hyperparameter Optimization (Increment 0042)

Optimization Strategy

Framework: Optuna (Bayesian optimization)
Trials: 100
Search space: 5 hyperparameters
Metric: ROC AUC (5-fold CV)

Results

Best score: 0.892 ± 0.012
Improvement over default: +4.2%
Most important param: learning_rate (0.42)

Selected Hyperparameters

{
    'n_estimators': 673,
    'max_depth': 6,
    'learning_rate': 0.094,
    'subsample': 0.78,
    'colsample_bytree': 0.91
}

Recommendation

XGBoost with optimized hyperparameters for production deployment.

## Commands

```bash
# Run AutoML optimization
/ml:optimize 0042 --trials 100

# Compare algorithms
/ml:compare-algorithms 0042

# Show optimization history
/ml:optimization-report 0042
```

Common Patterns

Pattern 1: Coarse-to-Fine Optimization

# Step 1: Coarse search (wide ranges, few trials)
coarse_space = {
    'n_estimators': (100, 1000, 'int'),
    'max_depth': (3, 10, 'int'),
    'learning_rate': (0.01, 0.3, 'log')
}
coarse_results = optimizer.optimize(coarse_space, n_trials=50)

# Step 2: Fine search (narrow ranges around best)
best_params = coarse_results['best_params']
fine_space = {
    # Clamp every range to the coarse bounds so the fine search stays valid
    'n_estimators': (max(100, best_params['n_estimators'] - 100),
                     min(1000, best_params['n_estimators'] + 100), 'int'),
    'max_depth': (max(3, best_params['max_depth'] - 1),
                  min(10, best_params['max_depth'] + 1), 'int'),
    'learning_rate': (max(0.01, best_params['learning_rate'] * 0.5),
                      min(0.3, best_params['learning_rate'] * 1.5), 'log')
}
fine_results = optimizer.optimize(fine_space, n_trials=50)

Pattern 2: Multi-Objective Optimization

import time

# Optimize for multiple objectives (accuracy + speed)
def multi_objective(trial):
    params = {
        'n_estimators': trial.suggest_int('n_estimators', 100, 1000),
        'max_depth': trial.suggest_int('max_depth', 3, 10),
    }

    model = XGBClassifier(**params)

    # Objective 1: Accuracy
    accuracy = cross_val_score(model, X, y, cv=5).mean()

    # Objective 2: Training time
    start = time.time()
    model.fit(X_train, y_train)
    training_time = time.time() - start

    # Return the raw time: the study's second direction is 'minimize',
    # so negating it here would reward slower models
    return accuracy, training_time

# Optuna will find Pareto-optimal solutions
study = optuna.create_study(directions=['maximize', 'minimize'])
study.optimize(multi_objective, n_trials=100)

Summary

AutoML accelerates ML development by:

- ✅ Automating tedious hyperparameter tuning
- ✅ Exploring search space systematically
- ✅ Finding optimal configurations faster
- ✅ Tracking all experiments automatically
- ✅ Documenting optimization process

Don't spend days manually tuning; let AutoML do it in hours.
