Tuning a Model
By now you know how a model learns: it adjusts its weights, guided by gradient descent, to reduce loss across many epochs. But there is a second layer of settings — settings that control how learning happens, not what the model learns. These are called hyperparameters. Getting them right is the difference between a model that barely works and one that excels. This lesson is about what hyperparameters are, which ones matter most, and how practitioners search for good values.
Weights vs. Hyperparameters
A weight is a number the model adjusts itself during training. Given enough data and iterations, gradient descent finds good weight values automatically. A hyperparameter is a setting chosen by the practitioner before training begins. The model does not learn it; you set it. Training then proceeds under those rules.
The most important hyperparameters in most models:
- Learning rate: how large each gradient descent step is. Too large and training oscillates; too small and training crawls. Typical values range from 0.1 down to 0.000001.
- Number of epochs: how many full passes the model makes over the training data. More epochs allow more learning but increase overfitting risk.
- Batch size: how many examples are processed per weight update. Larger batches give smoother gradients; smaller batches add useful noise.
- Model architecture choices: for a neural network, how many layers and how many neurons per layer. More layers and neurons mean more capacity, which helps on complex problems but is risky with little data, because a high-capacity model overfits more easily.
Weights: learned automatically from data by gradient descent during training. Hyperparameters: set by the practitioner before training begins. They control how training happens, not what is learned. Getting them right requires experimentation.
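The split between weights and hyperparameters can be made concrete with a minimal sketch. It assumes a tiny linear model trained with mini-batch gradient descent; the data, the function name `train`, and the specific hyperparameter values are made up for illustration and are not part of the lesson.

```python
import random

# Hyperparameters: chosen by the practitioner before training starts.
# (Illustrative values for this toy problem only.)
LEARNING_RATE = 0.05
NUM_EPOCHS = 500
BATCH_SIZE = 4

def train(xs, ys, learning_rate, num_epochs, batch_size, seed=0):
    """Fit y = w * x + b with mini-batch gradient descent on mean squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0  # weights: arbitrary starting values, adjusted by training
    indices = list(range(len(xs)))
    for _ in range(num_epochs):              # one epoch = one full pass over the data
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradients of the mean squared error over this batch.
            grad_w = sum(2 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum(2 * (w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            # The learning rate scales each update step.
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b

# Toy data generated from y = 2x + 1 (made up for illustration).
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
ys = [2 * x + 1 for x in xs]

w, b = train(xs, ys, LEARNING_RATE, NUM_EPOCHS, BATCH_SIZE)
print(f"learned w={w:.3f}, b={b:.3f}")
```

Nothing in the loop ever changes `LEARNING_RATE`, `NUM_EPOCHS`, or `BATCH_SIZE`; only `w` and `b` are updated from the data.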
Consider the learning rate in detail, because it is often the most sensitive hyperparameter.
- Too high (e.g., 1.0): the model takes large steps, overshoots the loss minimum, and loss bounces wildly or even grows over time.
- Too low (e.g., 0.000001): the model takes tiny steps. Training technically works but needs far more epochs than is practical, and it may stall on a plateau.
- Just right (e.g., 0.001 for many neural networks): loss decreases smoothly and steadily, reaching a good minimum within a reasonable number of epochs.
Practitioners often start at 0.001 as a default and adjust based on the loss curve, which is exactly the reading skill you developed in Lesson 4.
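The three regimes can be reproduced on a toy problem. The sketch below assumes a one-weight loss, (w - 3)^2, chosen only because its behavior is easy to follow; the function name and the learning rates are illustrative. The values that count as "too high" or "just right" depend on the problem, so the numbers here differ from the neural-network defaults mentioned above.

```python
def run_gradient_descent(learning_rate, num_steps=50, start=0.0):
    """Minimize the toy loss (w - 3)^2 and return the loss after each step."""
    w = start
    losses = []
    for _ in range(num_steps):
        grad = 2 * (w - 3)          # derivative of (w - 3)^2 with respect to w
        w -= learning_rate * grad   # one gradient descent step
        losses.append((w - 3) ** 2)
    return losses

# Too high, too low, and a reasonable value for this particular toy loss.
for lr in (1.1, 0.000001, 0.1):
    losses = run_gradient_descent(lr)
    print(f"lr={lr}: loss after step 1 = {losses[0]:.4g}, after step 50 = {losses[-1]:.4g}")
```

With the largest rate every step overshoots the minimum by more than it started with, so the loss grows; with the smallest rate the loss barely moves in 50 steps; with the middle rate it shrinks steadily toward zero.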
Searching for Good Hyperparameters
How do you find good hyperparameter values? Three main strategies:
- Manual search: try a value, check validation loss, adjust, repeat. Works well when you have a sense of the problem and only a few hyperparameters.
- Grid search: systematically try every combination from a predefined grid. For example, try learning rates [0.1, 0.01, 0.001] crossed with batch sizes [32, 64, 128], which gives 3 × 3 = 9 combinations. Thorough, but the cost multiplies with every hyperparameter and value you add.
- Random search: sample hyperparameter combinations at random from a range. Counterintuitively, random search often finds better results than grid search in the same number of trials. Not all hyperparameters matter equally, and a grid repeats the same few values of each one, while random sampling tries a new value of every hyperparameter in every trial, so it explores the important ones more finely.
All three methods evaluate each candidate combination using validation loss, never test-set loss. The winning combination is then used to train the final model, which is evaluated once on the test set.
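Grid search and random search can be compared side by side in a short sketch. It assumes a hypothetical `validation_loss` function that stands in for "train a model with these settings, then measure loss on the validation set"; its formula is invented so the example runs instantly and says nothing about any real model.

```python
import itertools
import math
import random

def validation_loss(learning_rate, batch_size):
    """Hypothetical stand-in for training a model and measuring validation loss.

    Invented for illustration: lowest near learning_rate=0.003 and only
    weakly dependent on batch_size.
    """
    lr_term = (math.log10(learning_rate) - math.log10(0.003)) ** 2
    batch_term = 0.01 * abs(math.log2(batch_size) - 6)
    return 0.2 + lr_term + batch_term

# Grid search: every combination of the listed values (3 x 3 = 9 trials).
learning_rates = [0.1, 0.01, 0.001]
batch_sizes = [32, 64, 128]
grid_trials = [
    (validation_loss(lr, bs), lr, bs)
    for lr, bs in itertools.product(learning_rates, batch_sizes)
]
best_grid = min(grid_trials)  # tuples compare by loss first

# Random search: the same budget of 9 trials, sampled from ranges.
rng = random.Random(0)
random_trials = []
for _ in range(len(grid_trials)):
    lr = 10 ** rng.uniform(-3, -1)   # log-uniform between 0.001 and 0.1
    bs = rng.choice(batch_sizes)
    random_trials.append((validation_loss(lr, bs), lr, bs))
best_random = min(random_trials)

print("grid:   distinct learning rates tried =", len({t[1] for t in grid_trials}))
print("random: distinct learning rates tried =", len({t[1] for t in random_trials}))
print("best grid trial   (loss, lr, batch):", best_grid)
print("best random trial (loss, lr, batch):", best_random)
```

Both searches spend nine trials, but the grid only ever tests three learning rates while the random search tests a fresh one each time. In a real workflow the winning setting would then be used to train the final model, and the test set would be touched once at the very end.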
Each candidate hyperparameter setting requires a full training run to evaluate. With slow models and large datasets, a grid search over 27 combinations can take days. Budget time for this step — it is not optional if you want a competitive model.
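A quick estimate helps when budgeting. The sketch below assumes three hyperparameters with three candidate values each (27 combinations) and a made-up figure of two hours per training run; substitute your own measurements.

```python
import math

def grid_search_cost(values_per_hyperparameter, hours_per_run):
    """Return (number of runs, total hours) for a full grid search run sequentially."""
    num_runs = math.prod(values_per_hyperparameter)
    return num_runs, num_runs * hours_per_run

# Assumed for illustration: 3 hyperparameters x 3 values each, 2 hours per run.
runs, hours = grid_search_cost([3, 3, 3], hours_per_run=2.0)
print(f"{runs} runs, about {hours:.0f} hours if run one after another")
```

Adding a fourth hyperparameter with three values would triple the total, which is why grids are usually kept small or replaced with random search.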
A practitioner sets the learning rate to 0.5 and observes the training loss spiking up and down wildly every epoch. What should they try next?
Why should hyperparameter search decisions be based on validation loss rather than test-set loss?
Manual Hyperparameter Search
- Step 1: Imagine you are training a model to predict whether a social media post will go viral. You have three hyperparameters to tune: learning rate (try 0.01, 0.001, 0.0001), batch size (try 32 or 64), and epochs (try 10 or 20).
- Step 2: List all possible combinations (this is a grid search). How many combinations are there in total?
- Step 3: Rank the three hyperparameters from 'most likely to have a big effect' to 'least likely' based on what you learned in this lesson. Explain your ranking.
- Step 4: If you could only try three combinations (not all of them), which three would you pick first and why?
- Step 5: Describe how you would use the validation set to decide which combination won.
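After working through the steps by hand, a short script can check your list from Step 2 and sketch the selection rule from Step 5. The helper `pick_winner` and the idea of storing results in a dictionary are assumptions made for illustration; the validation losses themselves have to come from your own training runs.

```python
import itertools

# Candidate values from Step 1.
learning_rates = [0.01, 0.001, 0.0001]
batch_sizes = [32, 64]
epoch_counts = [10, 20]

# Step 2: the full grid is the cross product of the three lists.
combinations = list(itertools.product(learning_rates, batch_sizes, epoch_counts))
for lr, bs, epochs in combinations:
    print(f"learning_rate={lr}, batch_size={bs}, epochs={epochs}")
print("total combinations:", len(combinations))

def pick_winner(validation_losses):
    """Step 5: return the combination with the lowest validation loss.

    validation_losses maps (learning_rate, batch_size, epochs) to the loss
    you measured on the validation set after training with that setting.
    """
    return min(validation_losses, key=validation_losses.get)
```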