Good Examples, Bad Examples
Imagine trying to learn what a pizza looks like if someone only shows you blurry photos taken in the dark. You would have a hard time! How good the examples are matters just as much as how many you have. A machine learning program is the same way. Good examples help it learn well. Bad examples teach it the wrong things, or nothing at all.
What Makes an Example Good?
A good example has three things going for it:
- Clear: The information is easy to read. A sharp, well-lit photo is clearer than a blurry, dark one. A neatly written word is clearer than a scribble.
- Correct: The label matches the example. If a photo shows a cat but the label says 'dog,' the machine learns something false.
- Varied: Good examples show the thing in many different ways. If you only show the machine golden retrievers, it might not recognize a tiny chihuahua as a dog. Variety helps the machine understand the full picture, literally!
Good examples are clear, correct, and varied. A machine trained on good examples learns to recognize things in all their different shapes and sizes. A machine trained on bad examples learns badly — or not at all.
Here is a story to show what happens with bad examples: A team wanted to teach a machine to spot unhealthy plants in photos. They collected hundreds of pictures. But most of the pictures were taken on sunny days. When the machine was tested on photos taken on cloudy days, it got confused — the colors looked different, and it did not recognize the sick plants anymore. The examples were not varied enough. The machine had only seen sunny-day plants, so it did not learn the full idea of 'unhealthy plant.' The team had to collect more examples from cloudy days and indoors too.
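The plant story can be tried out with a tiny Python sketch. This is not the team's actual method, which the story does not describe. It assumes each photo has been turned into one made-up number for how yellow the leaves look, and that cloudy photos come out darker, so their numbers are lower. The names learn_cutoff and accuracy are invented for this illustration.

```python
# Each photo is boiled down to one invented number: how yellow the leaves look
# (0 = deep green, 1 = very yellow). Cloudy photos are darker, so every number
# is lower. All values are made up for illustration.
sunny_train = [(0.20, "healthy"), (0.25, "healthy"), (0.22, "healthy"),
               (0.60, "sick"), (0.65, "sick"), (0.62, "sick")]
cloudy_train = [(0.05, "healthy"), (0.08, "healthy"), (0.06, "healthy"),
                (0.35, "sick"), (0.38, "sick"), (0.36, "sick")]
cloudy_test = [(0.07, "healthy"), (0.04, "healthy"),
               (0.37, "sick"), (0.34, "sick")]


def learn_cutoff(examples):
    """Pick a cutoff halfway between the average healthy and average sick value."""
    healthy = [value for value, label in examples if label == "healthy"]
    sick = [value for value, label in examples if label == "sick"]
    return (sum(healthy) / len(healthy) + sum(sick) / len(sick)) / 2


def accuracy(cutoff, examples):
    """Fraction of photos guessed correctly: above the cutoff means 'sick'."""
    correct = 0
    for value, label in examples:
        guess = "sick" if value > cutoff else "healthy"
        if guess == label:
            correct += 1
    return correct / len(examples)


sunny_only = learn_cutoff(sunny_train)                  # saw only sunny days
with_cloudy = learn_cutoff(sunny_train + cloudy_train)  # saw both kinds of day

print(accuracy(sunny_only, cloudy_test))
print(accuracy(with_cloudy, cloudy_test))
```

With these invented numbers, the first score is 0.5 and the second is 1.0. The machine that saw only sunny photos misses every sick plant on a cloudy day, and the one that also saw cloudy photos gets them all right.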
Making sure examples are good is one of the most important jobs when building a machine learning program. People called data scientists spend a lot of time cleaning data — fixing mistakes, removing blurry examples, and adding variety. Cleaning data is not glamorous, but it is absolutely necessary. Even a brilliant learning program cannot learn well from messy examples.
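Here is a small Python sketch of what cleaning data might look like. It is only an illustration under stated assumptions: the fields sharpness, lighting, and checked_label (a label that a person has double-checked) are hypothetical, and so are all the values. Real data scientists use many different tools and checks.

```python
from collections import Counter

# Made-up examples. "sharpness" (0 = blurry, 1 = sharp), "lighting", and
# "checked_label" (a label a person double-checked) are hypothetical fields.
examples = [
    {"label": "apple", "sharpness": 0.9, "lighting": "sunny"},
    {"label": "apple", "sharpness": 0.2, "lighting": "sunny"},    # too blurry
    {"label": "orange", "sharpness": 0.8, "lighting": "cloudy",
     "checked_label": "apple"},                                   # wrong label
    {"label": "apple", "sharpness": 0.7, "lighting": "sunny"},
]


def clean(examples, min_sharpness=0.5):
    """Drop blurry examples and fix labels that a person has corrected."""
    cleaned = []
    for example in examples:
        if example["sharpness"] < min_sharpness:
            continue  # remove blurry examples
        fixed = dict(example)  # copy so the original list is not changed
        if "checked_label" in fixed:
            fixed["label"] = fixed.pop("checked_label")  # fix the mistake
        cleaned.append(fixed)
    return cleaned


def count_by(examples, field):
    """Count how many examples share each value, to spot missing variety."""
    return Counter(example[field] for example in examples)


cleaned = clean(examples)
print(len(cleaned))                   # how many examples are left
print(count_by(cleaned, "label"))     # are the labels what we expect?
print(count_by(cleaned, "lighting"))  # is one kind of lighting missing?
```

The counts at the end do not add variety by themselves. They only show where it is missing, so that people know which new examples to go and collect.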
There is a saying among people who build learning machines: garbage in, garbage out. If you feed bad examples in, you get bad results out. The machine is only as good as the examples it learns from.
Which of these is a bad training example for a machine that is learning to recognize cats?
What does 'varied' mean when talking about training examples?
Sort the Examples
- Draw two columns on paper: 'Good Example' and 'Bad Example'.
- Now read each description below and write it in the correct column:
- - A sharp, well-lit photo of a red apple labeled 'apple'
- - A blurry photo where you cannot tell what the fruit is
- - A photo of a green apple labeled 'orange' by mistake
- - Photos of apples in many sizes — tiny crabapples, huge Granny Smiths, red ones, yellow ones
- - Twenty photos that all show the exact same apple from the exact same angle
- Talk about your answers with a friend or family member.
- Why does variety matter? Why does accuracy of the label matter?