Safety Stakes Sort
You have now studied the frameworks. You know that stakes are determined by severity, reversibility, and scale. You know that higher stakes demand stronger safeguards. Now it is time to put that reasoning into practice. This lesson is built around active thinking — sorting, debating, and defending your reasoning. Being able to explain why an AI application is high or low stakes is a skill that matters in the real world, whether you become an engineer, a policymaker, a journalist, or a citizen trying to vote wisely on AI regulation.
Reviewing the Stakes Framework
Before you sort, let us sharpen the framework. Three questions assess stakes:
- Severity: What is the worst realistic outcome if this AI makes an error? Minor annoyance? Financial loss? Physical injury? Loss of life?
- Reversibility: Can the harm be undone? A misspelled autocorrect can be corrected in seconds. A wrongful criminal conviction takes years to reverse, if it is reversed at all.
- Scale: How many people are exposed to this AI's outputs? A tool used by one person is different from a tool embedded in a service used by a billion people.
None of these questions alone determines stakes; you need to weigh all three together. A low-severity harm at massive scale can be as significant as a high-severity harm affecting a few people.
Some AI applications look low-stakes on the surface but have high-stakes implications. A spell-checker seems trivial, but a spell-checker built into medical documentation software used by thousands of doctors could introduce systematic errors into patient records. Always ask about the context, not just the tool.
The Stakes Sort — Part 1: Solo Reasoning
- Work through each AI application below. For each one, give three ratings from 1 to 5: severity (5 = most severe), reversibility of harm (1 = easily undone, 5 = very hard to reverse), and scale (5 = the most people affected). Add the three ratings to get a total stakes score between 3 and 15. Then write one sentence justifying your rating.
- Application A: An AI that automatically captions photos for a personal social media account.
- Application B: An AI that recommends which neighborhoods receive more frequent police patrols based on historical crime data.
- Application C: An AI that detects wildfires from satellite images to trigger evacuation alerts.
- Application D: An AI that decides which foster children are matched with which foster families.
- Application E: An AI that generates personalized math practice problems for a student based on their quiz results.
- Application F: An AI that screens resumes for a large employer, deciding which applicants a human recruiter ever sees.
- Application G: An AI that controls the air traffic management system for a major international airport.
- Application H: An AI that suggests the next word when you are composing a text message.
- After scoring all eight, rank them from highest to lowest total stakes score.
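The scoring and ranking steps above can be sketched in a few lines of Python. This is only an illustration: the `Rating` class, the `rank_by_stakes` helper, and the two sample ratings are hypothetical, and the numbers are one reader's guesses rather than an answer key.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rating:
    """One person's 1-5 ratings for a single application (hypothetical helper)."""
    name: str
    severity: int
    irreversibility: int  # 5 = very hard to reverse
    scale: int

    def __post_init__(self):
        # Each factor must stay on the 1-5 scale used in the activity.
        for factor in ("severity", "irreversibility", "scale"):
            value = getattr(self, factor)
            if not 1 <= value <= 5:
                raise ValueError(f"{factor} must be between 1 and 5, got {value}")

    @property
    def total(self) -> int:
        # Total stakes score: the sum of the three factors (3 to 15).
        return self.severity + self.irreversibility + self.scale


def rank_by_stakes(ratings):
    """Return the ratings ordered from highest to lowest total score."""
    return sorted(ratings, key=lambda r: r.total, reverse=True)


# Illustrative ratings only; your own scores may reasonably differ.
example = [
    Rating("H: next-word suggestion", severity=1, irreversibility=1, scale=5),
    Rating("G: air traffic management", severity=5, irreversibility=5, scale=4),
]

for r in rank_by_stakes(example):
    print(f"{r.total:>2}/15  {r.name}")
```

With these sample numbers, Application G sorts above Application H. Notice that the simple sum treats the three factors as equally important, which is itself a choice you can question in the group debate.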
Thinking About Safeguards
Stakes scores are useful, but the goal is not just to label applications; it is to decide what level of safeguard is warranted. A rule of thumb used by many AI governance frameworks is this: the safeguard burden should match the stakes level.
- For low-stakes applications (scores roughly 3-6), basic quality testing and a user feedback channel may be sufficient.
- For medium-stakes applications (scores roughly 7-10), add independent testing, documented limitations, and human review for edge cases.
- For high-stakes applications (scores roughly 11-15), require rigorous independent auditing, mandatory human oversight on final decisions, clear accountability structures, and appeal processes for affected individuals.
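The score bands above can be written as a small lookup. This is a minimal sketch: the function names `stakes_tier` and `safeguards_for` are hypothetical, and the hard cutoffs at 6 and 10 are an assumption, since the lesson describes the bands only as rough ranges.

```python
# Safeguards listed in the lesson for each tier.
SAFEGUARDS = {
    "low": [
        "basic quality testing",
        "user feedback channel",
    ],
    "medium": [
        "independent testing",
        "documented limitations",
        "human review for edge cases",
    ],
    "high": [
        "rigorous independent auditing",
        "mandatory human oversight on final decisions",
        "clear accountability structures",
        "appeal processes for affected individuals",
    ],
}


def stakes_tier(total: int) -> str:
    """Map a total stakes score (3-15) to a tier name.

    Assumption: the 'roughly' bands are treated as exact cutoffs here.
    """
    if not 3 <= total <= 15:
        raise ValueError(f"total must be between 3 and 15, got {total}")
    if total <= 6:
        return "low"
    if total <= 10:
        return "medium"
    return "high"


def safeguards_for(total: int) -> list[str]:
    """Return the safeguards the lesson lists for this score's tier."""
    return SAFEGUARDS[stakes_tier(total)]


print(stakes_tier(12), safeguards_for(12))
```

A score near a boundary, such as 6 or 7, deserves discussion rather than a mechanical lookup; the tiers are a starting point for judgment, not a replacement for it.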
For high-stakes AI applications, a human-in-the-loop requirement means that a human being must review and approve any consequential decision — the AI assists, but a human remains responsible for the final outcome. This preserves accountability and allows human judgment to catch AI errors before they cause irreversible harm.
The Stakes Sort — Part 2: Group Debate
- Form a group of three to four students. Each person shares their rankings from Part 1.
- Step 1: Find the two applications where your group's rankings differ the most. These are your debate cases.
- Step 2: For each debate case, take turns making a case for your rating. Use the three-factor framework (severity, reversibility, scale) to support your argument. Explain which factor you weighted most heavily and why.
- Step 3: After hearing all arguments, vote on a group consensus rating. Write one paragraph explaining the group's final reasoning.
- Step 4: For the two applications your group rated highest (most dangerous), agree on three specific safeguards that should be required before the application is deployed. Be concrete: not just 'have human oversight' but 'a licensed social worker must review and approve every AI foster family match before it is finalized.'
- Step 5: Report your two highest-stakes applications and their required safeguards to the class. Compare across groups — did different groups agree on which applications are most dangerous?
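If your group wants a quick way to find its debate cases in Step 1, the sketch below shows one possibility. It assumes that "differ the most" means the largest gap between an application's highest and lowest position across members' rankings; other measures of disagreement are equally defensible. The function name `debate_cases`, the student names, and the shortened four-application rankings are all hypothetical.

```python
def debate_cases(rankings: dict[str, list[str]], n: int = 2) -> list[str]:
    """Return the n applications whose rank positions vary most across members.

    rankings maps each member's name to their list of application labels,
    ordered from highest to lowest stakes. Every list must contain the
    same labels.
    """
    apps = sorted(next(iter(rankings.values())))
    spread = {}
    for app in apps:
        # Position of this application in each member's ranking (0 = highest).
        positions = [order.index(app) for order in rankings.values()]
        spread[app] = max(positions) - min(positions)
    # Largest spread first; ties are broken alphabetically.
    return sorted(apps, key=lambda a: (-spread[a], a))[:n]


# Made-up rankings for three students over four applications.
group = {
    "Ana": ["G", "D", "B", "H"],
    "Ben": ["D", "G", "B", "H"],
    "Caz": ["B", "G", "D", "H"],
}

print(debate_cases(group))
```

In this made-up group, everyone agrees that H ranks last, so it is not worth debating, while B and D move the most between members.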
An AI system predicts student reading level from a brief assessment and places students into ability groups. A student is placed in the lowest group, which limits the books and projects they have access to for the whole year. How should this application's stakes be assessed?
Two AI applications each make ten errors per thousand interactions. Application X serves a restaurant menu recommendation feature. Application Y assists emergency dispatchers in routing ambulances. Which has higher stakes and why?