AI Safety, Alignment & Ethics

⏱ About 15 min · 15 XP

How Bias Gets Into AI

Bias does not sneak into AI systems through some mysterious back door. It enters through ordinary, well-understood pathways: choices about what data to collect, how to label it, what problem to define, and what to optimize for. Understanding these pathways is the first step toward closing them. In this lesson, you will trace the two main routes bias takes into AI: training data and design choices.

Entry Point 1 — Training Data

AI systems that use machine learning do not follow rules written by a programmer. They learn patterns from examples — and those examples are the training data. Whatever patterns exist in the training data get absorbed into the model. If the training data reflects an unequal world, the model learns that inequality. A face recognition system trained mostly on photographs of light-skinned adults will perform best on light-skinned adults — not because its creators intended this, but because the data was skewed. The system faithfully learned what it was shown.

Training Data Is the Teacher

An AI model is only as fair as the data it learns from. Biased data produces a biased model — automatically, without anyone writing a prejudiced rule.
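One way to spot the kind of skew described above before any training happens is to count how often each group appears in the dataset. The sketch below is a minimal illustration, not a complete fairness audit. The group labels, the 80/20 split, the reference shares, and the `tolerance` threshold are all hypothetical values chosen for the example.

```python
from collections import Counter

def group_shares(labels):
    """Return each group's share of the dataset as a fraction of all rows."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}

def underrepresented(shares, reference, tolerance=0.5):
    """Flag groups whose dataset share is below tolerance * their reference share."""
    return sorted(
        group for group, ref in reference.items()
        if shares.get(group, 0.0) < tolerance * ref
    )

# Hypothetical group labels for a small face dataset (invented for illustration).
dataset_groups = ["lighter"] * 80 + ["darker"] * 20
# Hypothetical share of each group among the people the system will serve.
reference_shares = {"lighter": 0.5, "darker": 0.5}

shares = group_shares(dataset_groups)
flagged = underrepresented(shares, reference_shares)
print(shares)   # share of the dataset held by each group
print(flagged)  # groups that fall well short of their reference share
```

With these made-up numbers, the "darker" group holds 20% of the data against a 50% reference share, so it is flagged. A count like this only shows who is missing. It does not show how the model will perform for them, which needs a separate per-group evaluation.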

There are several specific ways training data can introduce bias.

Underrepresentation happens when some groups appear far less often in the data than others. A medical AI trained mostly on data from adult men may perform poorly on women or children, because it has not seen enough examples from those groups.

Historical bias happens when past decisions were themselves biased. If a company's old hiring records show a pattern of rejecting qualified women for technical roles, an AI trained on those records learns to treat being a woman as a signal for rejection, and it reproduces the old discrimination.

Measurement bias happens when the data records a stand-in for the thing you care about, and that stand-in is captured differently for different groups. Crime statistics, for example, reflect how policing was done, not the true rate of criminal behavior. If police patrol some neighborhoods more intensively, more arrests happen there, and an AI trained on those records will predict higher crime risk for people from those neighborhoods, even if crime rates are actually similar.
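The policing example can be worked through with simple arithmetic. The sketch below assumes two hypothetical neighborhoods with the same number of true offenses but different patrol intensity, modeled as how many of every 100 offenses police actually observe. All figures are invented for illustration.

```python
def recorded_arrests(true_offenses, observed_per_100):
    """Arrests on record: of every 100 true offenses, only observed_per_100 are seen."""
    return true_offenses * observed_per_100 // 100

# Hypothetical numbers. Both neighborhoods have the SAME count of true offenses.
neighborhoods = {
    "heavily_patrolled": {"true_offenses": 500, "observed_per_100": 40},
    "lightly_patrolled": {"true_offenses": 500, "observed_per_100": 10},
}

arrests = {name: recorded_arrests(**params) for name, params in neighborhoods.items()}
ratio = arrests["heavily_patrolled"] / arrests["lightly_patrolled"]

print(arrests)  # arrest counts that would end up in the training data
print(ratio)    # how much riskier the heavily patrolled area appears
```

Under these assumptions, the arrest records show 200 arrests against 50, a fourfold gap, even though the underlying behavior is identical. A model trained on the arrest counts would see only the gap, not the patrol pattern that produced it.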

Historical Bias Is Especially Sneaky

Data from the past looks like objective fact, but it records human decisions — and human decisions have historically been unfair to many groups. Training on past decisions can lock old injustices into new AI systems.
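To see how training on past decisions carries an old pattern forward, consider a deliberately crude stand-in for a learning algorithm: one that memorizes the historical hire rate for each gender. Real models are far more complex, but the mechanism is similar in spirit. The records below are hypothetical, and every candidate in them is marked as qualified.

```python
from collections import defaultdict

# Hypothetical past decisions: (gender, qualified, hired). All candidates are qualified.
history = (
    [("man", True, True)] * 60 + [("man", True, False)] * 40
    + [("woman", True, True)] * 30 + [("woman", True, False)] * 70
)

def fit_hire_rates(records):
    """'Train' by memorizing the historical hire rate for each gender."""
    hired = defaultdict(int)
    seen = defaultdict(int)
    for gender, _qualified, was_hired in records:
        seen[gender] += 1
        hired[gender] += int(was_hired)
    return {gender: hired[gender] / seen[gender] for gender in seen}

model = fit_hire_rates(history)
print(model)  # predicted chance of a hire decision, per gender
```

With these invented records, the "model" scores qualified men at 0.6 and equally qualified women at 0.3. Nobody wrote a rule about gender. The gap comes entirely from the decisions recorded in the data.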

Entry Point 2 — Design Choices

Bias can also enter through the decisions engineers and organizations make before a single line of code is written, or after.

Problem framing: How you define the problem shapes everything. If a company frames the goal as 'predict which employees will leave within a year,' an AI might learn to flag employees from certain demographic groups who historically had more turnover, without examining whether those groups left because of workplace discrimination they faced rather than personal unreliability.

Choice of features: Features are the variables the AI uses to make its prediction. If you include zip code as a feature in a credit-scoring model, and zip code correlates with race because of historical housing segregation, the AI effectively uses race as a factor, even if race itself is not in the dataset.

Optimization target: What you tell the AI to maximize matters. A content recommendation system told to maximize clicks and watch time can learn to favor content that provokes outrage and fear if that content keeps users engaged longer. This happens not because anyone wanted to spread outrage, but because that is what the optimization rewarded.
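The effect of the optimization target can be sketched with a toy ranking. The catalog, the watch-time figures, the `outrage` flag, and the penalty of 6 minutes in the second objective are all hypothetical, chosen only to show that the same items rank differently under different targets.

```python
# Hypothetical catalog; every number is invented for illustration.
videos = [
    {"title": "calm explainer", "watch_minutes": 4.0, "outrage": False},
    {"title": "angry rant", "watch_minutes": 9.0, "outrage": True},
    {"title": "how-to guide", "watch_minutes": 5.0, "outrage": False},
    {"title": "fear-bait clip", "watch_minutes": 8.0, "outrage": True},
]

def top_k(items, score, k=2):
    """Return the titles of the k items with the highest score."""
    ranked = sorted(items, key=score, reverse=True)
    return [item["title"] for item in ranked[:k]]

# Objective 1: maximize watch time and nothing else.
by_watch_time = top_k(videos, score=lambda v: v["watch_minutes"])

# Objective 2: watch time minus a hypothetical penalty for outrage-driven content.
by_adjusted = top_k(
    videos,
    score=lambda v: v["watch_minutes"] - (6.0 if v["outrage"] else 0.0),
)

print(by_watch_time)
print(by_adjusted)
```

Under the first objective, the two outrage-driven items take the top slots. Under the second, the how-to guide and the calm explainer do. The ranking code is identical in both cases, and only the target changed.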

Proxy Variables

A proxy variable is one that correlates strongly with a protected characteristic like race or gender. Including zip code, school name, or a person's name can give an AI an indirect way to use protected characteristics even when they are not explicitly in the data.
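A rough way to check whether a feature acts as a proxy is to ask how well the protected characteristic can be guessed from that feature alone. The sketch below is one simple heuristic under stated assumptions, not a standard test. The zip codes, group labels, and counts are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical applicants: (zip_code, group). Group is NOT given to the model.
applicants = (
    [("11111", "A")] * 45 + [("11111", "B")] * 5
    + [("22222", "A")] * 10 + [("22222", "B")] * 40
)

def proxy_strength(rows):
    """Share of rows whose group is guessed correctly from the feature value alone,
    guessing the most common group for each feature value."""
    by_value = defaultdict(Counter)
    for value, group in rows:
        by_value[value][group] += 1
    correct = sum(counts.most_common(1)[0][1] for counts in by_value.values())
    return correct / len(rows)

def baseline(rows):
    """Share guessed correctly by always predicting the overall most common group."""
    groups = Counter(group for _value, group in rows)
    return groups.most_common(1)[0][1] / len(rows)

strength = proxy_strength(applicants)
base = baseline(applicants)
print(strength, base)
```

With these invented counts, knowing only the zip code identifies the group correctly 85% of the time, against 55% for a blind guess. A gap that large suggests the feature carries much of the same information as the protected characteristic, so removing the characteristic itself would not remove its influence.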

Match each bias source to its description.

Terms

Underrepresentation
Historical bias
Measurement bias
Proxy variable
Optimization target

Definitions

The data records past human decisions that were themselves unfair
The data measures things differently across groups, distorting the picture
What the AI is told to maximize, which can reward harmful patterns
Some groups appear far less often in training data than their real-world share
A feature that indirectly encodes a protected characteristic like race


The Two Entry Points Together

In practice, training data bias and design bias often work together. A team might collect data without thinking carefully about who is represented, choose features that act as proxies for protected characteristics, and frame the optimization goal in a way that rewards the wrong outcomes. Each individual choice seems reasonable in isolation, but the combined effect is a system that treats people unfairly. This is why bias prevention cannot be an afterthought. It requires deliberate attention at every stage of building an AI system — not just a quick check at the end.

An AI is trained on a company's ten years of past hiring decisions. Those records show that women were hired at lower rates for engineering roles. What type of bias is most likely to affect this AI?

A credit-scoring AI does not include race as a feature, but it does include the applicant's zip code. Why might this still introduce racial bias?

Trace the Bias

  1. Read this scenario: A hospital wants to build an AI to identify patients at high risk of hospital readmission within 30 days. They train it on five years of patient records, using features that include age, diagnosis codes, number of previous hospitalizations, and total healthcare spending.
  2. Identify at least two ways bias might enter this system. Think about who is likely underrepresented in hospital records, what the features might correlate with, and what optimizing for 'readmission prediction' might accidentally reward or ignore.
  3. For each bias source you identified, write one specific change the team could make to reduce that bias.
  4. Share your analysis: would this AI be safe to deploy immediately? What additional checks would you want before it goes live?