Checking the Machine's Work
Imagine you studied really hard for a spelling test. You practiced every word on the list. But on test day, your teacher gives you words that were NOT on the practice list, and now you have to spell them correctly using what you learned. That test is the real proof that you truly learned to spell and did not just memorize the list. Sorting machines face the same kind of test!
The Test Set
When machine-learning engineers build a sorting machine, they do something clever. Before teaching the machine, they take a portion of their labeled examples and set them aside in a secret envelope. The machine never sees these during training. After the machine has finished learning from the rest of the examples, the engineers open the secret envelope. They show the machine those hidden examples one by one and ask: 'Which group does this belong to?' The machine makes its prediction. The engineers compare that prediction to the real label in the envelope. Right or wrong? This hidden collection of examples is called the test set.
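Here is one way the secret envelope step could look in code. This is only a small sketch: the function name `split_examples`, the card names, and the choice to hide 3 examples are made up for illustration, not taken from any real sorting machine.

```python
import random

def split_examples(examples, test_count, seed=0):
    """Shuffle labeled examples and set some aside as the test set.

    `test_count` and `seed` are illustrative parameters (assumptions).
    """
    shuffled = list(examples)              # copy so the original list is untouched
    random.Random(seed).shuffle(shuffled)  # mix the examples up
    test_set = shuffled[:test_count]       # these go into the secret envelope
    training_set = shuffled[test_count:]   # the machine learns from these
    return training_set, test_set

# Ten made-up labeled examples: (thing, group it belongs to)
labeled_examples = [
    ("car", "vehicle"), ("bus", "vehicle"), ("bike", "vehicle"),
    ("train", "vehicle"), ("boat", "vehicle"),
    ("cat", "animal"), ("dog", "animal"), ("horse", "animal"),
    ("frog", "animal"), ("owl", "animal"),
]

training_set, test_set = split_examples(labeled_examples, test_count=3)
print(len(training_set), "training examples,", len(test_set), "test examples")
```

The examples are shuffled before splitting so the envelope gets a random mix rather than just the first few items on the list.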
A test set is a collection of labeled examples the machine has never seen. We use it to check how well the machine truly learned — not just how well it remembers.
Here is why the test set must be kept secret from the machine during training. If the machine saw those examples while learning, it might just memorize them. It could then score perfectly on the test but still fail on brand-new examples in the real world. Keeping the test set separate is like keeping the test answers hidden until after the test. It gives an honest picture of how well the machine has really learned.
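To see why memorizing is a problem, here is a tiny pretend machine that does nothing but memorize. It is an assumption made up for this example (real sorting machines learn patterns instead), and all the names in it are hypothetical.

```python
def train_memorizer(training_set):
    """'Learn' by storing every example and its label, with no real understanding."""
    return dict(training_set)

def predict(memory, item):
    """Look the item up; anything never seen before gets the answer 'unknown'."""
    return memory.get(item, "unknown")

def score(memory, examples):
    """Fraction of examples the machine labels correctly (between 0 and 1)."""
    correct = sum(1 for item, label in examples if predict(memory, item) == label)
    return correct / len(examples)

# Made-up examples for illustration
training_set = [("car", "vehicle"), ("bus", "vehicle"),
                ("cat", "animal"), ("dog", "animal")]
test_set = [("train", "vehicle"), ("owl", "animal")]  # never shown during training

memory = train_memorizer(training_set)
seen_score = score(memory, training_set)   # 1.0: perfect on examples it memorized
unseen_score = score(memory, test_set)     # 0.0: no idea about brand-new examples
print(seen_score, unseen_score)
```

If we had only checked this machine on examples it had already seen, it would look perfect. The hidden test set is what reveals that it learned nothing it can use on new things.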
After testing, the engineers calculate a score. If the machine correctly sorted 90 out of 100 test examples, its score is 90 percent. That means it got 9 out of every 10 right. Is 90 percent good? It depends on the job. For sorting cat photos it might be fine. For deciding if a medical test result is dangerous, you would want a much higher score!
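The score calculation above can be sketched in a few lines. The function name `score_percent` and the pretend predictions are assumptions made up for this example.

```python
def score_percent(predictions, true_labels):
    """Percent of predictions that match the real labels."""
    if len(predictions) != len(true_labels):
        raise ValueError("need exactly one prediction per test example")
    if not true_labels:
        raise ValueError("the test set is empty")
    correct = sum(1 for guess, real in zip(predictions, true_labels) if guess == real)
    return 100 * correct / len(true_labels)

# Pretend test: 100 cat photos, and the machine gets 90 of them right
true_labels = ["cat"] * 100
predictions = ["cat"] * 90 + ["dog"] * 10

print(score_percent(predictions, true_labels))  # 90.0
```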
A 90 percent score sounds great, but if your machine makes decisions where mistakes are risky, like whether food has gone bad, even a 10 percent mistake rate could cause problems. Always think about what the mistakes mean in real life.
What is a test set?
Why must the machine not see the test set during training?
The Secret Envelope Test
- With a partner, draw or write 10 examples on cards — 5 of one type and 5 of another (for example, 5 vehicles and 5 animals).
- Shuffle the cards, then set 3 aside face-down as your secret test set. Do not look at them yet.
- Study the other 7 cards with their labels for one minute.
- Now flip over the 3 secret cards one at a time. Try to label each one correctly from what you just learned.
- Check: how many did you get right out of 3?
- That is your test score. Talk about which ones were tricky and why.