AI Safety, Alignment & Ethics


Human Values Are Complicated

If someone asked you to write down everything you value, you might start easily: kindness, honesty, fairness, health, freedom, friendship. But very quickly you would run into problems. What happens when honesty hurts a friend's feelings? What happens when one person's freedom restricts another's? What does fairness mean when two people with different histories are competing for the same opportunity? Human values are not a tidy list. They are a web of priorities, trade-offs, and context-dependent judgments that humans navigate largely without thinking about them explicitly. Teaching that web to an AI is one of the hardest problems in alignment.

Values Change With Context

Consider honesty. You value it. But if a friend asks whether their terrible haircut looks good on the morning of a job interview, you might say it looks fine. If a doctor asks whether you have been taking your medication, honesty matters more. If your friend asks whether you are planning their surprise party, you might actively lie. The value itself has not changed, but how it applies depends entirely on context. This context-dependence is very hard to encode in a formal system. A rule like "always tell the truth" does not capture how humans actually use honesty. Neither does "tell the truth most of the time." The right answer depends on relationships, stakes, intentions, and dozens of other factors that shift from moment to moment.
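To make the gap concrete, here is a minimal Python sketch of a context-free honesty rule checked against the three situations above. Everything in it is a toy assumption for illustration: the `Situation` class, the `always_tell_the_truth` function, and the `person_tells_truth` labels, which simply restate what the paragraph says a person might do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Situation:
    description: str
    # Illustrative label taken from the paragraph above: would a person
    # plausibly tell the plain truth here?
    person_tells_truth: bool


def always_tell_the_truth(situation: Situation) -> bool:
    """A context-free rule: it ignores the situation and always answers truthfully."""
    return True


SITUATIONS = [
    Situation("friend asks about their haircut before a job interview", False),
    Situation("doctor asks whether you have been taking your medication", True),
    Situation("friend asks whether you are planning their surprise party", False),
]

# Count how often the rigid rule agrees with the human judgment in the text.
matches = sum(always_tell_the_truth(s) == s.person_tells_truth for s in SITUATIONS)

for s in SITUATIONS:
    agrees = always_tell_the_truth(s) == s.person_tells_truth
    print(f"{s.description}: rule agrees with human judgment = {agrees}")
print(f"rule agrees in {matches} of {len(SITUATIONS)} situations")
```

The rule never looks at the situation it is given, so it cannot tell the doctor's question apart from the surprise party. Adding more special cases would patch these three examples but not the dozens of other factors the paragraph mentions.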

Values Are Context-Dependent

The same value, applied to different situations, calls for different actions. Human moral judgment constantly weighs competing values against each other and adjusts based on circumstances. This flexibility is part of what makes human ethics sophisticated — and part of what makes it hard to formalize.

Or consider fairness. If two students take the same test and one scores higher, is it fair to give them the same grade? Most people say no. But if one student had a severe illness during exam season and the other did not, is it fair to compare their scores directly? Most people again say no, this time because treating the two students identically seems unfair. Fairness is not a fixed formula. It requires understanding history, context, power, and circumstance.

Values Conflict With Each Other

Many of the hardest decisions people face involve choosing between things they value, not between something good and something obviously bad.

Privacy vs. safety: Should a school install cameras in every hallway to prevent bullying, or is the loss of privacy too high a cost?
Efficiency vs. fairness: Should a hiring algorithm choose the candidate who is most likely to perform well on average, even if that systematically disadvantages historically excluded groups?
Individual freedom vs. collective well-being: Should people be free to make choices that harm only themselves? What about choices that have small but real costs for others?

AI systems that are given a single value to optimize for will inevitably shortchange the other values. An AI optimizing purely for safety might restrict freedom so severely that life becomes intolerable. An AI optimizing purely for individual preferences might be indifferent to effects on others. Real human wisdom involves balancing values, not maximizing just one.
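The safety example can be sketched as a toy optimization in Python. The option names and the numeric scores below are invented purely for illustration, and `best_option` is a hypothetical helper. The fixed-weight sum is itself a crude stand-in for judgment, not a proposed solution.

```python
# Invented scores on a 0-to-1 scale; higher means the option serves that value better.
OPTIONS = {
    "no restrictions": {"safety": 0.2, "freedom": 1.0},
    "moderate rules": {"safety": 0.7, "freedom": 0.7},
    "total lockdown": {"safety": 1.0, "freedom": 0.05},
}


def best_option(options: dict, weights: dict) -> str:
    """Return the option with the highest weighted sum of value scores.

    Values missing from `weights` count for nothing, which is exactly
    what happens when a system is told to optimize a single value.
    """
    def score(name: str) -> float:
        return sum(weights.get(value, 0.0) * s for value, s in options[name].items())

    return max(options, key=score)


# Optimizing safety alone ignores the freedom column entirely.
safety_only = best_option(OPTIONS, {"safety": 1.0})

# Weighing both values equally picks a different option.
balanced = best_option(OPTIONS, {"safety": 0.5, "freedom": 0.5})

print("safety only:", safety_only)
print("balanced:   ", balanced)
```

With these made-up numbers, the safety-only objective selects the option that nearly eliminates freedom, while the equally weighted objective selects the middle option. Even the balanced version depends on weights that someone has to choose, and a fixed set of weights cannot adjust to each specific situation.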

Value Trade-offs

Human values frequently conflict. Good judgment is not about knowing which value is always most important but about weighing competing values thoughtfully in each specific situation. Formalizing this into a rule an AI can follow without exception is, at present, not possible.

Values Differ Across People and Cultures

Even among people who broadly share values, the details differ. Two people who both value fairness might disagree sharply on what a fair tax system looks like. Two people who both value respect for elders might disagree on what that respect requires. Across cultures, societies organize some values quite differently. Individualism, community, religious obligation, personal honor, and social harmony have different relative weights in different traditions. An AI built on one cultural framework and deployed globally would encode that framework into decisions affecting people with very different values. Whose values should govern? This is not a question AI research alone can answer. It is a question that must involve broad, diverse human participation. This does not mean all values are equally valid or that we cannot make progress. It means value specification requires humility, ongoing dialogue, and mechanisms for many voices to shape what AI systems are designed to optimize.


Match each challenge in specifying human values to the correct description.

Terms

Values are context-dependent
Values conflict with each other
Values differ across cultures
Values include unspoken intuitions

Definitions

Much of what people care about has never been put into explicit rules
Maximizing one valued goal often means sacrificing another
Different communities assign different weights to the same principles
The same principle calls for different actions in different situations


Why is it insufficient to give an AI the instruction 'always be honest'?

Why does value pluralism across cultures matter for AI alignment?

Map a Value Conflict

Choose a real situation where two values you hold come into conflict. It can be personal, or one of these prompts:

  A. A good friend asks for your honest opinion of their creative project before they submit it. You think it is not very good.
  B. Your community wants to build a park on land owned by a family who does not want to sell.
  C. An employer wants to track employee computers to prevent wasted time, but employees feel their privacy is being violated.

  1. Name both values in conflict.
  2. Explain why each value genuinely matters in this situation.
  3. Describe how you would try to balance them.
  4. Now imagine programming an AI to make this decision. Write down the rule you would give it. Then find the flaw in your rule.