Overfitting: Memorizing vs. Learning
Picture a student cramming for a math test by memorizing every single practice problem and its answer. When the test has the exact same questions, they ace it. But when the teacher changes the numbers slightly, the student is lost — they memorized answers, not methods. A machine learning model can fall into exactly this trap. It is called overfitting, and it is one of the most important failure modes in all of machine learning.
What Overfitting Looks Like
An overfit model has learned the training data so thoroughly, including its noise, quirks, and random accidents, that it has lost the ability to generalize to new examples.
Here is a concrete example. A model is trained to predict house prices from square footage, using training data for 50 houses. A simple model might learn a smooth upward line: more square footage, higher price. That line generalizes well, so a new 1,800 sq ft house gets a reasonable prediction.
An overfit model instead draws a wildly curvy line that passes through every training point exactly, achieving near-zero training loss. But given a new 1,800 sq ft house, the curve might output $4 million or $2,000, both obviously wrong. The model memorized the noise in the 50 training houses instead of the real relationship.
In terms of loss curves, an overfit model shows training loss falling smoothly toward zero while validation loss (loss on examples the model has not seen) falls at first and then starts rising. That divergence is the signature of overfitting.
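The house-price example can be sketched in a few lines of Python with NumPy. This is a scaled-down illustration, not the lesson's actual data: it assumes 6 training houses instead of 50, a made-up underlying relationship (`true_price`), and a made-up noise pattern in which houses are alternately overpriced and underpriced by $20,000. The validation houses are assumed to sit exactly on the underlying line, which is an idealization.

```python
import numpy as np
from numpy.polynomial import Polynomial


def true_price(sqft):
    # Hypothetical "real relationship": $100 per sq ft plus a $50,000 base.
    return 100.0 * sqft + 50_000.0


# Six training houses; the noise is a made-up alternating +/- $20,000.
train_sqft = np.array([1000.0, 1400.0, 1800.0, 2200.0, 2600.0, 3000.0])
noise = 20_000.0 * np.array([1, -1, 1, -1, 1, -1])
train_price = true_price(train_sqft) + noise

# Validation houses the models never see during fitting (noise-free here).
val_sqft = np.array([1200.0, 1600.0, 2000.0, 2400.0, 2800.0])
val_price = true_price(val_sqft)


def mse(model, sqft, price):
    # Mean squared error between the model's predictions and the prices.
    return float(np.mean((model(sqft) - price) ** 2))


# Simple model: a straight line (degree 1).
simple = Polynomial.fit(train_sqft, train_price, deg=1)
# Overfit model: degree 5 has enough freedom to hit all 6 points exactly.
wiggly = Polynomial.fit(train_sqft, train_price, deg=5)

for name, model in [("simple", simple), ("wiggly", wiggly)]:
    print(
        f"{name}: train MSE = {mse(model, train_sqft, train_price):.3g}, "
        f"validation MSE = {mse(model, val_sqft, val_price):.3g}"
    )
```

Under these assumptions the degree-5 curve passes through all six training points, so its training error is essentially zero, yet its validation error is much larger than the straight line's. The straight line shows the opposite pattern: a noticeable training error but a small validation error.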
Overfitting occurs when a model performs well on training data but poorly on new, unseen data. It has memorized the training set — including its noise — rather than learned the underlying pattern. The gap between training loss and validation loss is the diagnostic signal.
Why Does Overfitting Happen?
Three main causes:
1. Model too complex for the data. A model with millions of parameters trained on only a few hundred examples has more than enough capacity to memorize every example. It does not need to generalize.
2. Trained for too many epochs. Each extra epoch gives the model another chance to fit the noise in the training data. Past a certain point, more epochs only make things worse on new data.
3. Not enough training data. With only a handful of examples, any model powerful enough to be useful can also memorize the whole set.
Each cause has a matching fix: simplify the model, stop training earlier (early stopping), or gather more data. A further technique, regularization, adds a penalty to the loss function that discourages the model from making its weights too extreme. It is a common and powerful fix.
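Two of these fixes, regularization and early stopping, are simple enough to sketch in plain Python. The function names, the `lam` penalty strength, and the `patience` parameter below are illustrative assumptions, not part of any particular library. The penalty shown is the sum of squared weights (often called L2), which is one common choice; the lesson does not say which penalty it has in mind.

```python
def l2_penalized_loss(data_loss, weights, lam=0.01):
    # Regularized loss = ordinary loss + lam * (sum of squared weights).
    # Large weights raise the total, so training is nudged toward small ones.
    penalty = lam * sum(w * w for w in weights)
    return data_loss + penalty


def early_stopping_epoch(val_losses, patience=2):
    # Walk through per-epoch validation losses and stop once the loss has
    # failed to improve for `patience` epochs in a row.
    # Returns (epoch with the best validation loss, epoch where training stops).
    best_loss = float("inf")
    best_epoch = None
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    # Validation loss never stalled long enough: training ran to the end.
    return best_epoch, len(val_losses) - 1


# Made-up validation losses: they fall, then start rising.
val_losses = [1.0, 0.8, 0.6, 0.65, 0.7, 0.9]
best_epoch, stop_epoch = early_stopping_epoch(val_losses, patience=2)
print(best_epoch, stop_epoch)  # prints: 2 4
```

In a real training loop the model's weights from `best_epoch` would typically be saved and restored, since the epochs after it only made validation loss worse.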
Underfitting: The Opposite Problem
Overfitting has a mirror image: underfitting. An underfit model is too simple to capture the real pattern — it fails on both training data and new data. The student analogy: someone who did not study at all and guesses randomly. A good model lives between the two extremes: complex enough to learn the real signal, but not so complex that it memorizes noise. Finding that balance is one of the core challenges of machine learning, and you will practice it in Lessons 8 and 9.
A model with 0.001 training loss sounds incredible — but if validation loss is 2.5, the model is severely overfit and will fail in the real world. Always check both training and validation loss together.
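Checking both losses together can be written as a small helper. This is only a sketch: the function name and both thresholds (`high_loss`, `max_gap`) are hypothetical, and sensible values depend entirely on the loss function and the task.

```python
def diagnose(train_loss, val_loss, high_loss=1.0, max_gap=0.5):
    # high_loss and max_gap are assumed, task-dependent thresholds.
    if train_loss >= high_loss:
        # The model cannot even fit the data it was trained on.
        return "underfit"
    if val_loss - train_loss > max_gap:
        # Fits the training data well but does much worse on unseen data.
        return "overfit"
    return "reasonable fit"


print(diagnose(0.001, 2.5))  # prints: overfit
print(diagnose(1.8, 1.9))    # prints: underfit
print(diagnose(0.20, 0.25))  # prints: reasonable fit
```

The first call uses the numbers from the warning above: a training loss of 0.001 alone looks excellent, and only the comparison with the validation loss of 2.5 reveals the problem.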
Flashcards
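A model achieves 99% accuracy on its training set but only 61% on a held-out validation set. What is most likely happening?
Answer: Overfitting. The large gap between training and validation accuracy suggests the model memorized the training set rather than learning a pattern that generalizes.
Which technique halts training when validation loss starts to rise, preventing further overfitting?
Answer: Early stopping.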
The Memorization Test
- Step 1: Write 10 simple addition problems and their answers on a piece of paper (e.g., 3+7=10, 14+6=20).
- Step 2: Give a friend (or yourself) one minute to memorize all 10 answers — no reasoning allowed, just memorize.
- Step 3: Now give them 10 new problems of the same type that were not on the list. See how many they get right.
- Step 4: Give a second person the same test but tell them to learn the rule (addition) instead of memorizing answers. See how they do on the new problems.
- Step 5: Write two sentences connecting what you observed to the difference between overfitting and true generalization in machine learning.
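The same experiment can be mimicked in code. In this sketch, which uses made-up problems and hypothetical function names, the "memorizer" is a lookup table of the practice answers and the "rule learner" actually performs addition.

```python
# Practice problems (the "training set") and unseen problems of the same type.
practice = [(3, 7), (14, 6), (2, 2), (9, 5), (8, 1)]
new_problems = [(4, 7), (15, 6), (3, 3), (10, 5), (8, 2)]

# The memorizer stores each practice problem with its answer, nothing more.
memorized_answers = {(a, b): a + b for a, b in practice}


def memorizer(a, b):
    # Returns None for any problem that was not on the practice sheet.
    return memorized_answers.get((a, b))


def rule_learner(a, b):
    # Has learned the rule itself, so it works on any pair of numbers.
    return a + b


def accuracy(solver, problems):
    # Fraction of problems the solver answers correctly.
    correct = sum(solver(a, b) == a + b for a, b in problems)
    return correct / len(problems)


for name, solver in [("memorizer", memorizer), ("rule learner", rule_learner)]:
    print(
        f"{name}: practice = {accuracy(solver, practice):.0%}, "
        f"new = {accuracy(solver, new_problems):.0%}"
    )
```

The memorizer is perfect on the practice sheet and gets nothing right on the new problems, which is the pattern an overfit model shows between training and validation data. The rule learner scores the same on both.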