Ways AI Can Go Wrong
No engineering system is perfect, and AI is no exception. But AI failures have some features that make them especially worth studying. AI errors can be invisible — the system gives confident wrong answers without any warning. They can be systematic — the same type of mistake repeated millions of times. And they can be amplified — a single flaw in a widely deployed system affects everyone who uses it. Understanding the main failure types is the first step toward spotting them.
Bias: When Training Data Carries Unfairness
AI systems learn from historical data, and history is full of human unfairness. If a hiring AI trains on records of who was hired in the past, and past hiring favored one demographic group, the AI learns to favor that group too. Nobody programmed discrimination in; the training data encoded it. Bias is not always dramatic. It can appear as a resume screener scoring applicants slightly lower when their names were rare among past hires, or a medical diagnostic tool being less accurate for patients who were underrepresented in the training dataset. That subtlety makes bias harder to spot and easier to overlook.
An AI does not need a programmer to write discriminatory rules to produce discriminatory outcomes. It only needs training data that reflects past unfairness. The algorithm faithfully learns whatever patterns are in the data — including harmful ones.
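One way to make this concrete is to measure how often each group was selected in the historical data. The sketch below uses invented records and hypothetical group labels "A" and "B"; it is an illustration of the idea, not a full fairness audit.

```python
from collections import defaultdict

# Hypothetical historical hiring records: (group, was_hired).
# The numbers are invented for illustration only.
records = (
    [("A", True)] * 60 + [("A", False)] * 40
    + [("B", True)] * 30 + [("B", False)] * 70
)

def selection_rates(rows):
    """Return the fraction of applicants hired in each group."""
    hired = defaultdict(int)
    total = defaultdict(int)
    for group, was_hired in rows:
        total[group] += 1
        hired[group] += int(was_hired)
    return {group: hired[group] / total[group] for group in total}

rates = selection_rates(records)

# Ratio of the two selection rates. A model that simply learned these
# historical rates would carry the same gap into its own decisions.
ratio = rates["B"] / rates["A"]
print(rates, round(ratio, 2))
```

A ratio well below 1 means one group was selected much less often than the other. That alone does not prove unfairness, but it is a signal that the data deserves a closer look before a model is trained on it.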
Brittleness: When the Unexpected Breaks the System
AI systems can be surprisingly fragile outside the conditions they trained on. A self-driving car trained almost entirely on sunny California roads might struggle with heavy snow. A spam filter trained on English-language email might fail entirely when spam arrives in a different language. This fragility is called brittleness. Its usual cause is distribution shift: the inputs the system meets in the real world drift away from the inputs it trained on. Brittleness is dangerous precisely because the system often does not know it is failing. It produces answers with the same apparent confidence whether it is operating in familiar territory or completely outside its training.
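Because a brittle system does not announce that it is out of its depth, one simple safeguard is a separate check that asks whether a new input looks anything like the training data. The sketch below assumes a single made-up numeric feature (road-surface brightness) and a hypothetical helper `is_out_of_distribution`; real systems use many features and more careful methods.

```python
import statistics

# Hypothetical training feature: road-surface brightness on sunny days.
# The values are invented for illustration.
train = [0.62, 0.58, 0.65, 0.60, 0.63, 0.59, 0.61, 0.64]
mean = statistics.mean(train)
stdev = statistics.stdev(train)

def is_out_of_distribution(x, threshold=3.0):
    """Flag inputs more than `threshold` standard deviations from the training mean."""
    return abs(x - mean) / stdev > threshold

# A typical sunny-day value versus a much brighter, snow-covered road.
print(is_out_of_distribution(0.61))
print(is_out_of_distribution(0.95))
```

A flagged input is not necessarily handled badly, but it tells the people overseeing the system that its answer should be trusted less.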
Hallucination: When AI Invents Facts Confidently
Large language models — the AI behind text chat tools — sometimes generate statements that are factually wrong but grammatically perfect and presented with complete confidence. This is called hallucination. The model does not have a fact-checker built in; it generates text that sounds correct based on statistical patterns, not verified truth. Hallucination is a particularly tricky failure because the wrong information looks just like the right information. A model might invent a book title, fabricate a statistic, or describe a person's biography with invented details, and nothing in the output signals that something is wrong. Careful verification of AI-generated claims is essential for this reason.
Hallucination is when an AI language model generates text that is factually incorrect but stated confidently and fluently. The model is not lying — it has no concept of truth — it is producing plausible-sounding text based on statistical patterns.
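Verification can be partly automated by checking a model's claims against a source you already trust. The sketch below assumes a tiny hand-made catalog (`TRUSTED_TITLES`) and a hypothetical helper `unverified_titles`; the second citation is a made-up title standing in for a hallucinated one.

```python
# Hypothetical trusted catalog. In practice this might be a library
# database or another source that has been checked by people.
TRUSTED_TITLES = {"on the origin of species", "silent spring"}

def unverified_titles(cited):
    """Return the cited titles that cannot be found in the trusted catalog."""
    return [title for title in cited if title.strip().lower() not in TRUSTED_TITLES]

# Titles a model might cite. The second one is invented for this example.
model_citations = ["Silent Spring", "The Quiet Atlas of Forgotten Rivers"]
print(unverified_titles(model_citations))
```

A title missing from the catalog is not proof of hallucination, since the catalog may be incomplete. It is a prompt for a person to check before the claim is repeated.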
Misalignment and Privacy Risks
Misalignment occurs when an AI system optimizes for a measurable goal in ways that undermine the actual human goal behind it. A famous thought experiment: an AI tasked with maximizing paperclip production might, if made powerful enough and given no other constraints, consume all available resources to make paperclips. It would be technically following its objective while catastrophically undermining everything humans actually care about. Everyday misalignment is less extreme but still harmful: an AI optimizing for user engagement on a social media platform might promote outrage content because outrage drives clicks, even though more outrage makes users less happy overall.
Privacy risks arise because AI systems often learn from or process personal data. A model trained on private messages could reproduce sensitive details in its outputs. A facial recognition system could track people without their consent. Handling personal data responsibly is an important dimension of AI safety.
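The engagement example of misalignment can be shown with a toy ranking. The sketch below uses invented posts and invented scores: `clicks` is the proxy the system can measure, and `wellbeing` stands in for the human goal it cannot see.

```python
# Hypothetical posts with made-up scores. "clicks" is the measurable
# proxy; "wellbeing" represents the real goal, hidden from the system.
posts = [
    {"title": "Outrage headline", "clicks": 0.9, "wellbeing": -0.6},
    {"title": "Friend's update", "clicks": 0.5, "wellbeing": 0.7},
    {"title": "How-to guide", "clicks": 0.4, "wellbeing": 0.5},
]

# What the system promotes when it optimizes only the proxy.
top_by_clicks = max(posts, key=lambda post: post["clicks"])

# What it would promote if it could optimize the real goal.
top_by_wellbeing = max(posts, key=lambda post: post["wellbeing"])

print(top_by_clicks["title"])
print(top_by_wellbeing["title"])
```

The system is doing exactly what it was asked to do. The problem lies in the gap between the number it was given to maximize and the outcome people actually wanted.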
A resume-screening AI was trained on hiring records from a company with historically male-dominated departments. What kind of failure is most likely to appear?
Why is hallucination a particularly dangerous AI failure type?
Failure Type Spotter
- Step 1: Read each scenario below and identify which failure type it most represents: bias, brittleness, hallucination, misalignment, or privacy risk.
- A) A medical AI trained mostly on data from adult patients gives much less accurate diagnoses for children.
- B) An AI assistant cites a scientific study that does not exist, with a convincing-sounding title and author names.
- C) A recommendation algorithm keeps pushing a user toward increasingly extreme content because extreme content gets more clicks.
- D) A voice assistant stores recordings of private family conversations without clearly telling users.
- E) A loan-approval AI denies applicants from a historically underfunded zip code at higher rates than their individual financial records justify.
- Step 2: For two of the scenarios, propose one concrete change — to the data, the system design, or the oversight process — that could reduce the risk.
- Step 3: Which failure type do you find most worrying, and why? Write three sentences.