Labeling Examples
Now you know what a label is. But how do people actually attach labels to examples? It turns out this is a job that takes patience — and a lot of careful looking. In this lesson, we will follow a team of kids labeling examples to teach a machine to sort fruit.
Building a Labeled Collection
Imagine a team of four friends: Ada, Benji, Carmen, and Diego. They want to teach a computer to tell apples from oranges. First, they collect a hundred photos — some apples, some oranges, all mixed together. Then they sit down and go through every single photo. Ada picks up a photo of a shiny red apple. She types 'apple' next to it. That is one labeled example. Benji picks up a photo of a round orange fruit. He types 'orange.' That is another. One by one, they label every photo. When they are done, they have one hundred labeled examples.
Labeling examples means going through a collection and giving each example its correct name. A big collection of labeled examples is what a machine uses to learn.
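For readers who want to see how a computer might store the friends' work, here is a minimal sketch in Python. It assumes each labeled example is kept as a pair: a photo's file name and its label. The file names and the `add_label` helper are made up for illustration.

```python
# A labeled example is a pair: (photo file name, label).
# The file names below are invented for this sketch.
labeled_examples = [
    ("photo_001.jpg", "apple"),   # Ada's shiny red apple
    ("photo_002.jpg", "orange"),  # Benji's round orange fruit
]


def add_label(collection, photo, label):
    """Attach a label to one photo and store the pair in the collection."""
    collection.append((photo, label))


# Label one more photo, just as the friends do one by one.
add_label(labeled_examples, "photo_003.jpg", "apple")

print(len(labeled_examples))  # 3
```

Each time a friend labels another photo, the collection grows by one pair.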
Labeling sounds easy, but it takes real care. What if a photo shows an apple with orange-colored skin? Is it still an apple? Yes! The label should say 'apple,' because that is what the fruit actually is. The friends must agree on the rules before they start. If Ada labels orange-colored apples as 'orange' by mistake and Benji labels them as 'apple,' the machine will get confused, because the same kind of photo now has two different labels. Consistency, which means using the same rule every time, is one of the most important parts of labeling.
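One way a team could spot inconsistent labels is to compare two labelers' answers for the same photos. The sketch below is only an illustration: the photo names, the two dictionaries, and the `find_disagreements` helper are all assumptions, not part of the lesson's story.

```python
# Each labeler's answers: photo file name -> label.
# Both dictionaries are invented for this sketch.
ada = {"photo_007.jpg": "orange", "photo_008.jpg": "apple"}
benji = {"photo_007.jpg": "apple", "photo_008.jpg": "apple"}


def find_disagreements(labels_a, labels_b):
    """Return the photos that both labelers labeled, but labeled differently."""
    return sorted(
        photo
        for photo in labels_a
        if photo in labels_b and labels_a[photo] != labels_b[photo]
    )


disagreements = find_disagreements(ada, benji)
print(disagreements)  # ['photo_007.jpg']
```

Any photo on the list is one the friends should look at together and settle with their shared rule.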
After the friends finish labeling, they have a dataset. A dataset is a collection of labeled examples all ready for a machine to study. The bigger and more carefully labeled the dataset, the better the machine can usually learn. If the dataset had only five photos, the machine would not see enough examples to really understand the difference. But with a hundred careful labels, it has a solid start.
If a labeler makes a mistake — calling a dog a cat, for example — the machine will learn that mistake. Always double-check your labels before you hand them to a machine.
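A computer can help with part of the double-checking. The sketch below, which assumes the only allowed labels are 'apple' and 'orange', finds labels that are misspelled or not on the list. It cannot catch a real label put on the wrong photo, so a person still needs to look. The names `ALLOWED_LABELS` and `find_invalid_labels` are made up for illustration.

```python
# The labels the team agreed to use (an assumption for this sketch).
ALLOWED_LABELS = {"apple", "orange"}


def find_invalid_labels(dataset):
    """Return every (photo, label) pair whose label is not on the allowed list."""
    return [(photo, label) for photo, label in dataset if label not in ALLOWED_LABELS]


# A tiny invented dataset with one typing mistake in it.
dataset = [
    ("photo_001.jpg", "apple"),
    ("photo_002.jpg", "orange"),
    ("photo_003.jpg", "aple"),  # typo: should be "apple"
]

problems = find_invalid_labels(dataset)
print(problems)  # [('photo_003.jpg', 'aple')]
```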
What is a dataset?
Why is it important for all labelers to use the same rules?
Label a Mini-Dataset
- Draw or cut out eight pictures — four of one animal and four of another (for example, cats and dogs).
- Mix all eight pictures together in a pile.
- Go through the pile one by one and write the correct label on the back of each picture.
- When you are done, look at each picture again and check that the label on its back matches the animal on the front.
- Count how many of each label you have. Do the numbers seem right?
- This is a small version of what real people do when building datasets for machines!
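The counting step in the activity can also be done by a computer. Here is a minimal sketch, assuming the eight labels were typed into a Python list in the order they came off the pile. The list contents are invented for illustration.

```python
from collections import Counter

# The eight labels from the activity, in the order they were written
# (this particular order is made up).
labels = ["cat", "dog", "cat", "dog", "dog", "cat", "cat", "dog"]

# Counter tallies how many times each label appears.
counts = Counter(labels)
print(counts["cat"], counts["dog"])  # 4 4

# The activity started with four of each animal, so both counts should be 4.
numbers_look_right = counts["cat"] == 4 and counts["dog"] == 4
print(numbers_look_right)  # True
```

If the counts do not match what you started with, at least one picture probably has the wrong label.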