Domain 2: Implement Machine Learning Model Lifecycle and Operations (Module 2 of 8)

AutoML & Hyperparameter Tuning

Don't guess hyperparameters: sweep them. Learn how AutoML automates model selection and how sweep jobs tune hyperparameters to find the optimal configuration.

Finding the best model automatically

Simple explanation

Imagine you’re buying a car but there are 500 models.

You could test-drive every single one, but that would take years. Or you could tell a smart assistant, “I need a sedan, under $40K, with good fuel economy,” and let them narrow it down to 5 finalists for you to try.

AutoML does this for machine learning. Instead of manually trying Random Forest, then XGBoost, then Neural Net… AutoML tries dozens of algorithms and configurations automatically, then tells you which one performed best.

Hyperparameter tuning is the fine-tuning step: once you’ve chosen your car model, you adjust the seat, mirrors, and steering to get the perfect fit.

AutoML: automated model selection

AutoML in Azure ML automatically:

  1. Tries multiple algorithms (Random Forest, XGBoost, LightGBM, Neural Nets…)
  2. Applies feature engineering (encoding, scaling, imputation)
  3. Selects the best model based on your chosen metric
  4. Logs everything to MLflow

from azure.ai.ml import Input, automl

# Define an AutoML classification job
classification_job = automl.classification(
    training_data=Input(type="mltable", path="azureml:churn-data:2"),
    target_column_name="churned",
    primary_metric="AUC_weighted",
    compute="gpu-training-cluster",
    experiment_name="churn-automl-baseline",
)

# Configure limits
classification_job.set_limits(
    max_trials=50,                  # Try up to 50 model configurations
    max_concurrent_trials=4,        # Run 4 trials in parallel
    timeout_minutes=120,            # Stop the whole job after 2 hours
    enable_early_termination=True,  # Stop poorly scoring trials early
)

# Submit the job (ml_client is an authenticated MLClient for your workspace)
returned_job = ml_client.jobs.create_or_update(classification_job)

What’s happening:

  • automl.classification(...): defines a classification task; AutoML needs the training data, the target column, and the metric to optimise
  • primary_metric="AUC_weighted": the metric AutoML maximises; it ranks every algorithm and configuration it tries by this score
  • set_limits(...): limits prevent runaway costs, with at most 50 trials, 4 running at a time, and a 2-hour cap on the whole job
  • enable_early_termination=True: stops trials that are clearly performing poorly
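To make the limits concrete, here is a local, standard-library-only sketch of the outer loop AutoML runs for you: evaluate candidate configurations one at a time, stop when the trial or time budget is spent, and keep whichever scores best on the primary metric. The candidate names and their fixed scores are hypothetical stand-ins for real training runs, and concurrency and early termination are left out to keep it short.

```python
import time
from typing import Callable, Dict, Optional, Tuple


def select_best(
    candidates: Dict[str, Callable[[], float]],
    max_trials: int,
    timeout_seconds: float,
) -> Tuple[Optional[str], float, int]:
    """Evaluate candidates in order; return (best_name, best_score, trials_run).

    Higher scores are better, as with AUC_weighted.
    """
    start = time.monotonic()
    best_name: Optional[str] = None
    best_score = float("-inf")
    trials_run = 0
    for name, evaluate in candidates.items():
        # Respect both budgets, like max_trials and timeout_minutes
        if trials_run >= max_trials:
            break
        if time.monotonic() - start > timeout_seconds:
            break
        score = evaluate()
        trials_run += 1
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score, trials_run


# Hypothetical candidates: fixed scores stand in for trained models
candidates = {
    "LogisticRegression": lambda: 0.88,
    "RandomForest": lambda: 0.91,
    "LightGBM": lambda: 0.94,
    "XGBoost": lambda: 0.93,
}

print(select_best(candidates, max_trials=2, timeout_seconds=60))   # ('RandomForest', 0.91, 2)
print(select_best(candidates, max_trials=50, timeout_seconds=60))  # ('LightGBM', 0.94, 4)
```

A tight max_trials can stop the search before the strongest candidate is ever tried, which is the trade-off the limits above make between cost and result quality.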

AutoML task types

Task | Use case | Example metric
Classification | Predict a category | AUC_weighted, accuracy, F1
Regression | Predict a number | RMSE, R2, MAE
Time-series forecasting | Predict future values | MAPE, RMSE
Image classification | Classify images | Accuracy
Object detection | Find objects in images | mAP
NLP text classification | Classify text documents | Accuracy, F1

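AUC_weighted, the classification metric used in this module, is the one-vs-rest ROC AUC of each class, averaged with weights equal to how often each class appears in the true labels. The sketch below computes it from scratch using the pairwise definition of AUC (the probability that a random positive example scores higher than a random negative one, with ties counting as half). It illustrates the metric's definition only; it is not Azure ML's implementation, and the labels and probabilities are made up.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def binary_auc(is_positive: Sequence[bool], scores: Sequence[float]) -> float:
    """ROC AUC as P(score of a random positive > score of a random negative)."""
    pos = [s for p, s in zip(is_positive, scores) if p]
    neg = [s for p, s in zip(is_positive, scores) if not p]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def auc_weighted(
    y_true: Sequence[Hashable], probs: Sequence[Dict[Hashable, float]]
) -> float:
    """One-vs-rest AUC per class, weighted by each class's share of the labels."""
    counts = Counter(y_true)
    total = len(y_true)
    result = 0.0
    for cls, n in counts.items():
        is_pos = [y == cls for y in y_true]
        cls_scores = [p[cls] for p in probs]
        result += (n / total) * binary_auc(is_pos, cls_scores)
    return result


# Made-up churn labels and predicted probabilities of class 1
y_true = [1, 0, 1, 0, 1]
p_churn = [0.9, 0.2, 0.6, 0.7, 0.8]
probs = [{0: 1 - p, 1: p} for p in p_churn]
print(round(auc_weighted(y_true, probs), 4))  # 0.8333
```

For binary labels the two one-vs-rest AUCs are equal, so AUC_weighted reduces to ordinary ROC AUC; the weighting only matters with three or more classes.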
Scenario: Kai establishes a baseline fast

Kai has a new customer churn dataset and needs a baseline model by Friday. Instead of spending days trying different algorithms:

  1. Runs AutoML with 50 trials and a 2-hour timeout
  2. AutoML tries 12 algorithms with various feature engineering
  3. Best model: LightGBM with AUC of 0.943
  4. Kai logs the winner and uses it as the benchmark

Now the data science team knows: “Beat 0.943 AUC or we ship the AutoML model.”

Priya (CTO): “We have a production-ready baseline in 2 hours? I love this.”

Sweep jobs: hyperparameter tuning

Once you’ve chosen an algorithm, sweep jobs search for the best hyperparameters:

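from azure.ai.ml import command
from azure.ai.ml.sweep import BanditPolicy, Choice, Uniform

# Define the training command; ${{inputs.*}} placeholders receive each trial's values
train_command = command(
    code="./src",
    command="python train.py "
            "--learning-rate ${{inputs.learning_rate}} "
            "--n-estimators ${{inputs.n_estimators}} "
            "--max-depth ${{inputs.max_depth}}",
    inputs={"learning_rate": 0.01, "n_estimators": 100, "max_depth": 8},
    environment="azureml:churn-training:3",
    compute="gpu-training-cluster",
)

# Define the search space by overriding the inputs with distributions
train_for_sweep = train_command(
    learning_rate=Uniform(min_value=0.001, max_value=0.1),
    n_estimators=Choice(values=[50, 100, 200, 500]),
    max_depth=Choice(values=[5, 8, 10, 15, 20]),
)

# Random sampling, because Bayesian sampling doesn't support early termination
sweep_job = train_for_sweep.sweep(
    compute="gpu-training-cluster",
    sampling_algorithm="random",
    primary_metric="f1_score",  # train.py must log a metric with this exact name
    goal="Maximize",
)

# Early termination: stop runs that fall too far behind the best run
sweep_job.early_termination = BanditPolicy(
    slack_factor=0.1,
    evaluation_interval=2,
)

sweep_job.set_limits(max_total_trials=200, max_concurrent_trials=8)

# Submit (ml_client is an authenticated MLClient for your workspace)
returned_job = ml_client.jobs.create_or_update(sweep_job)

What’s happening:

  • command(...): the training script receives hyperparameters as command-line arguments; the inputs dict supplies defaults for a single, non-sweep run
  • train_command(...): overriding the inputs with Uniform (continuous range) and Choice (discrete set) distributions defines the search space, and each trial's values are tracked with its metrics
  • sampling_algorithm="random": each trial draws its values independently; random sampling suits a 200-trial budget and, unlike Bayesian sampling, supports early termination
  • primary_metric="f1_score", goal="Maximize": the sweep ranks trials by this metric, so train.py must log it under exactly that name (for example with mlflow.log_metric)
  • BanditPolicy: checked every second time the metric is logged, it cancels runs whose f1_score is not within the 10% slack factor of the best run (for a maximised metric, below best / 1.1)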
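The sweep depends on train.py accepting the three flags in the command string and logging a metric named exactly f1_score. The module doesn't show that script, so here is a hypothetical sketch of its interface only: argument parsing, an F1 calculation written out with the standard library, and a logging helper that calls mlflow.log_metric when MLflow is installed and prints otherwise. Model training itself is left as a placeholder comment.

```python
import argparse
from typing import List, Optional, Sequence

try:
    import mlflow  # used for metric logging when installed
except ImportError:  # lets the sketch run locally without MLflow
    mlflow = None


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Read the hyperparameters the sweep passes on the command line."""
    parser = argparse.ArgumentParser(description="Hypothetical churn training script")
    # Flag names must match the sweep command string; argparse maps
    # --learning-rate to args.learning_rate, and so on
    parser.add_argument("--learning-rate", type=float, default=0.01)
    parser.add_argument("--n-estimators", type=int, default=100)
    parser.add_argument("--max-depth", type=int, default=8)
    return parser.parse_args(argv)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1 score for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def log_metric(name: str, value: float) -> None:
    """Log under the exact name the sweep's primary_metric expects."""
    if mlflow is not None:
        mlflow.log_metric(name, value)
    else:
        print(f"{name}={value:.4f}")


if __name__ == "__main__":
    args = parse_args()
    print(f"learning_rate={args.learning_rate}, "
          f"n_estimators={args.n_estimators}, max_depth={args.max_depth}")
    # Placeholder: train with these values, predict on validation data, then
    # log_metric("f1_score", f1_score(y_val, y_pred))
```

If the logged name drifts (say, "f1" instead of "f1_score"), the sweep has nothing to rank trials by and the Bandit policy has nothing to compare.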

Sampling algorithms

Hyperparameter sampling strategies
Algorithm | Uses earlier results? | Speed | Best for
Grid | No: tries every combination | Slow (exhaustive) | Small, discrete search spaces where you need every result
Random | No: picks combinations at random | Fast start, good coverage, trials fully parallel | Large spaces, initial exploration
Bayesian | Yes: learns from previous trials | Less parallel, but needs fewer trials | Expensive trials where you want the best result
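To see why grid sampling gets expensive and why it needs discrete values, here is a local sketch of grid and random sampling over the same hyperparameters as the sweep example in this module. It only mimics the idea with itertools and random; Azure ML's samplers run in the service, and Bayesian sampling (which fits a model to earlier results) is not reproduced here.

```python
import itertools
import random
from typing import Dict, List, Tuple

# Discrete values, like Choice(values=[...]) in the sweep example
choices: Dict[str, List[int]] = {
    "n_estimators": [50, 100, 200, 500],
    "max_depth": [5, 8, 10, 15, 20],
}
# A continuous range, like Uniform(min_value=0.001, max_value=0.1)
learning_rate_range: Tuple[float, float] = (0.001, 0.1)


def grid_sample(space: Dict[str, List[int]]) -> List[Dict[str, int]]:
    """Every combination of the discrete values; only possible for finite sets."""
    names = list(space)
    return [dict(zip(names, combo)) for combo in itertools.product(*space.values())]


def random_sample(
    space: Dict[str, List[int]],
    lr_range: Tuple[float, float],
    n_trials: int,
    seed: int = 0,
) -> List[Dict[str, float]]:
    """Independent draws: discrete values by choice, learning rate uniformly."""
    rng = random.Random(seed)
    trials: List[Dict[str, float]] = []
    for _ in range(n_trials):
        trial: Dict[str, float] = {name: rng.choice(values) for name, values in space.items()}
        trial["learning_rate"] = rng.uniform(*lr_range)
        trials.append(trial)
    return trials


grid = grid_sample(choices)
print(len(grid))  # 20 combinations: 4 n_estimators values x 5 max_depth values
for trial in random_sample(choices, learning_rate_range, n_trials=3):
    print(trial)
```

The continuous learning_rate can't join the grid without first being discretised, which is why grid sampling accepts only discrete values; random sampling handles both, and each extra trial costs the same however large the space is.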

Early termination policies

Policy | How it works | When to use
Bandit | Stops runs whose metric falls outside a slack factor (or slack amount) of the best run | Most common; good balance of exploration and cost
Median stopping | Stops runs whose best metric is below the median of all runs' running averages at the same interval | When you want to keep more diverse trials
Truncation selection | Cancels the lowest-performing X% of runs (truncation_percentage) at each interval | Aggressive pruning for large sweeps

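The three policies differ only in the rule they apply at each evaluation interval. The sketch below writes those rules out for a maximised metric: Bandit cancels runs whose best value is below best / (1 + slack_factor), median stopping cancels runs whose best value is below the median of all runs' running averages, and truncation cancels the lowest truncation_percentage of runs. It is a simplified local illustration, not Azure ML's implementation: it ignores delay_evaluation, ranks truncation by each run's latest value, rounds the cancel count down (an assumption), and uses hypothetical run names and metric values.

```python
import math
from statistics import median
from typing import Dict, List, Set

History = Dict[str, List[float]]

# Hypothetical f1_score values per run, one per evaluation interval (higher is better)
history: History = {
    "run-a": [0.60, 0.72, 0.80],
    "run-b": [0.55, 0.62, 0.66],
    "run-c": [0.50, 0.58, 0.75],
    "run-d": [0.40, 0.45, 0.50],
}


def bandit(history: History, slack_factor: float) -> Set[str]:
    """Cancel runs whose best value is below best_overall / (1 + slack_factor)."""
    best_overall = max(max(values) for values in history.values())
    cutoff = best_overall / (1 + slack_factor)
    return {run for run, values in history.items() if max(values) < cutoff}


def median_stopping(history: History) -> Set[str]:
    """Cancel runs whose best value is below the median of all running averages."""
    running_averages = [sum(values) / len(values) for values in history.values()]
    cutoff = median(running_averages)
    return {run for run, values in history.items() if max(values) < cutoff}


def truncation(history: History, truncation_percentage: int) -> Set[str]:
    """Cancel the lowest truncation_percentage of runs, ranked by latest value."""
    n_cancel = math.floor(len(history) * truncation_percentage / 100)
    ranked = sorted(history, key=lambda run: history[run][-1])  # worst first
    return set(ranked[:n_cancel])


print(sorted(bandit(history, slack_factor=0.1)))              # ['run-b', 'run-d']
print(sorted(median_stopping(history)))                       # ['run-d']
print(sorted(truncation(history, truncation_percentage=25)))  # ['run-d']
```

With the same slack_factor=0.1 as the sweep example, Bandit is the most aggressive here: its cutoff is 0.80 / 1.1 ≈ 0.727, so run-b's best of 0.66 is cancelled, while median stopping keeps run-b because 0.66 beats the median running average of about 0.61.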
Exam tip: Bayesian vs random sampling

The exam often tests when to use each sampling algorithm:

  • Random: best when the search space is large and you want broad coverage quickly. Also useful when you can afford many trials.
  • Bayesian: best when each trial is expensive (GPU hours) and you want to converge on the optimum with fewer trials. It doesn't support early termination policies, and it converges better with few concurrent trials because each new trial is chosen from the results of finished ones.
  • Grid: only practical for very small search spaces (under 20 combinations), and it only accepts discrete (Choice) values.

If the question mentions “limited compute budget” and “find the optimal configuration,” the answer is usually Bayesian.

Key terms flashcards

Question

AutoML vs sweep jobs: what's the difference?

Answer

AutoML tries multiple algorithms and feature engineering automatically (broad search). Sweep jobs search hyperparameters for ONE chosen algorithm (deep search). Use AutoML for a baseline and sweeps for optimisation.

Question

What are the three sampling algorithms for sweep jobs?

Answer

Grid (exhaustive, every combination), Random (fast, broad coverage), Bayesian (learns from previous trials, fewer trials needed). Bayesian is best when trials are expensive.

Question

What does the Bandit early termination policy do?

Answer

Cancels runs that fall behind the best-performing run by more than a specified slack factor. Saves compute by stopping clearly underperforming trials.

Knowledge check

Kai has a new dataset and needs a baseline model by Friday. He doesn't know which algorithm will work best. What should he use?

Dr. Luca is running a hyperparameter sweep for a genomics model. Each trial uses an A100 GPU and takes 45 minutes. He has budget for about 30 trials. Which sampling algorithm should he choose?


Next up: Training Pipelines, automating the entire training workflow end to end.