Machine Learning & Deep Learning

⏱ About 10 min · 10 XP

Fair Examples for Everyone

You have learned that examples teach machines. But here is an important question: what if the examples leave some people out? A machine only knows what it has been shown. If certain people are missing from the examples, the machine does not know about them — and it might make unfair mistakes.

When Examples Leave People Out

Imagine a machine is trained to recognize human faces. The team collects ten thousand face photos. But they only collect photos from one city, and almost everyone in that city looks similar. Now the machine goes out into the world, where people have many different skin tones, hair types, eye shapes, and ages. The machine struggles. It makes lots of mistakes on the kinds of faces it was rarely or never shown.

This is called bias in the data. Bias means the examples were not balanced: some kinds of people, places, or situations were included a lot, while others were left out almost entirely. The machine did not mean to be unfair. It just learned from examples that were not fair to begin with.

The Big Idea

If training examples leave out certain people or groups, a machine can make unfair mistakes — not because it is mean, but because it was never shown those examples. Fair data means including everyone.

Here is another way to think about it: Imagine a voting machine that was only taught to read votes written in blue pen. Millions of people vote in black pen, red pen, or pencil. The machine cannot read those votes. That is not because those voters did anything wrong. It is because the machine was never taught about them.

That is unfair, and it happens with real machine learning systems too. Machines trained on biased data have made mistakes in medical checkups, job applications, and even voice recognition, failing more often for people who were not well represented in the training data.
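For older learners or grown-ups who want to see how people check for this, here is a small sketch in Python. It counts how often a machine makes mistakes for each group. The group names, the results list, and the function name `mistake_rate_by_group` are all made up for this example. They do not come from any real system.

```python
from collections import defaultdict

# Made-up results for illustration only.
# Each entry is (group, did_the_machine_get_it_right).
results = [
    ("group A", True), ("group A", True), ("group A", True), ("group A", False),
    ("group B", True), ("group B", False), ("group B", False), ("group B", False),
]


def mistake_rate_by_group(results):
    """Return the share of wrong answers for each group."""
    totals = defaultdict(int)
    mistakes = defaultdict(int)
    for group, correct in results:
        totals[group] += 1
        if not correct:
            mistakes[group] += 1
    return {group: mistakes[group] / totals[group] for group in totals}


rates = mistake_rate_by_group(results)
for group, rate in rates.items():
    print(f"{group}: {rate:.0%} mistakes")
```

If one group has a much higher mistake rate than another, that is a hint that the training examples may not have included enough people from that group.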


The good news is that people are working hard on this. Data scientists, researchers, and communities around the world are thinking carefully about fairness in data. They check: who is included in our examples? Who might be left out? How can we do better? Fairness in data is not a solved problem yet — but it is one of the most important things the AI field is working on.

Include Everyone

When collecting examples, always ask: who might we be leaving out? Children, older adults, people with disabilities, people who speak different languages — everyone who might use a machine should be represented in its training data.

What does 'bias in data' mean?

Why does it matter if a face-recognition machine was only trained on faces from one group of people?

Who Is in the Picture?

  1. Look through a magazine, book, or website with a grown-up's help.
  2. Count how many different kinds of people you see in the pictures over five pages.
  3. Are there people of different ages? Different skin tones? Different abilities? Different backgrounds?
  4. Tally up what you find on a piece of paper.
  5. Talk about it: If a machine was trained only on pictures from these pages, who would it know a lot about? Who might it know less about?
  6. What would you add to make the examples fairer?
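
The paper tally in this activity can also be done by a computer. Here is a small sketch in Python, assuming you wrote down one label for each person you saw. The labels in `people_seen` and the function name `tally_share` are invented for this example, so swap in your own tally.

```python
from collections import Counter

# Hypothetical tally from five pages: one label per person seen in a picture.
people_seen = [
    "adult", "adult", "adult", "child",
    "adult", "older adult", "adult", "child",
]


def tally_share(labels):
    """Return what fraction of all the people each label makes up."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


shares = tally_share(people_seen)

# The label with the smallest share is the group the machine would see least.
least_seen = min(shares, key=shares.get)

for label, share in shares.items():
    print(f"{label}: {share:.0%} of the pictures")
print("Seen the least:", least_seen)
```

The group that shows up the least is a good place to start when you think about which examples to add.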