The Long-Term Future
At some point in thinking about AI's trajectory, the horizon extends far enough that ordinary reasoning becomes strained. What happens if AI capabilities continue advancing not for years but for decades? What if systems become genuinely superintelligent — capable not just of matching humans but of far exceeding human cognitive ability in every domain? What are the realistic ranges of outcomes, and how should we reason about possibilities that are genuinely uncertain, potentially extremely consequential, and difficult to evaluate? This lesson engages with long-term thinking carefully and honestly — neither dismissing these questions as science fiction nor treating speculative scenarios as established predictions.
Superintelligence: The Concept and the Debates
The concept of superintelligence, meaning an AI system whose cognitive capabilities significantly exceed the best human performance across virtually all domains, has been the subject of serious philosophical and technical analysis for decades. Nick Bostrom's 2014 book 'Superintelligence: Paths, Dangers, Strategies' brought the concept to broad academic and policy attention. Bostrom argued that a superintelligent system would have enormous strategic advantages and that ensuring its values are aligned with human values is among the most important problems humanity faces.
The core concern is a convergence argument: an agent pursuing almost any goal will, if sufficiently capable, develop subgoals that include self-preservation, resource acquisition, and resistance to modification, because these instrumental subgoals are useful for achieving almost any terminal goal. A sufficiently capable agent pursuing an innocuous objective could therefore still behave dangerously if its capabilities allow it to resist correction or modification. The 'paperclip maximizer' thought experiment illustrates the point: an AI tasked with manufacturing paperclips, if sufficiently capable, might convert all available matter into paperclips, including matter that humans would prefer to keep in other forms.
Important counterarguments exist. Many AI researchers argue that the path from current systems to genuine superintelligence is far less direct than the theoretical arguments assume, that intelligence is not a single dimension that scales uniformly, and that the convergence argument relies on assumptions about optimization pressure that may not apply to real systems. Others accept the concern in principle but argue that near-term safety and fairness issues deserve more immediate attention than long-run existential scenarios.
The convergence argument holds that almost any AI system with a fixed goal will, if capable enough, develop subgoals including self-preservation and resource acquisition — because these help achieve almost any goal. This is not a claim about AI having human-like desires; it is a claim about the structure of optimization under capability. Whether this applies to real AI systems is actively debated.
Existential Risk: Reasoning About Low-Probability, High-Impact Events
Some researchers argue that sufficiently capable misaligned AI poses an existential risk: a risk not just of large-scale harm but of permanently curtailing humanity's long-term potential. The concept of existential risk was developed rigorously by philosopher Nick Bostrom and has become the focus of a serious research community.
Reasoning about existential risk requires confronting a particular kind of difficulty. The most extreme outcomes, those that would matter most, are also the hardest to reason about empirically, because we have no historical precedent for a humanity-scale catastrophe caused by technological development, and therefore no examples of recovery from one. This creates a dilemma: because we cannot calibrate our estimates from history, they are highly uncertain; because the stakes are enormous, even very small probabilities may warrant attention.
The expected value argument runs as follows: if an event has even a very low probability (say, 0.1%) but an astronomically large negative consequence (the permanent foreclosure of humanity's long-run future), its expected cost, meaning probability multiplied by magnitude, may be very large, potentially larger than that of many events with high probability and moderate cost. This argument motivates work on AI safety even among people who believe near-term AGI is unlikely.
Critics raise several counterarguments. Some argue that expected value calculations break down at extreme scales: assigning probability numbers to unprecedented events is itself suspect, and reasoning under model uncertainty should make us skeptical of recommendations to prioritize low-probability extreme outcomes over higher-probability moderate harms. Others argue that the 'existential risk' framing, by focusing on long-run AI, can divert attention from immediate harms caused by current AI systems.
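The expected value argument and the critics' objection to it can both be seen in a small calculation. The sketch below is illustrative only: the 0.1% figure comes from the text, while the cost values, the "harm units", and the names expected_cost, MODERATE_COST, and EXTREME_COST are assumptions invented for this example, not estimates from any study.

```python
def expected_cost(probability: float, cost: float) -> float:
    """Return probability * cost, the expected cost of a single event."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return probability * cost


# Hypothetical figures in arbitrary "harm units" (assumptions, not data).
MODERATE_COST = 1_000.0          # a likely, moderate harm
EXTREME_COST = 1_000_000_000.0   # stand-in for a very large, irreversible harm

likely_moderate = expected_cost(0.5, MODERATE_COST)
rare_extreme = expected_cost(0.001, EXTREME_COST)  # 0.1%, as in the text

# Probability at which the two expected costs are equal.
break_even = likely_moderate / EXTREME_COST

print(f"likely, moderate harm:  {likely_moderate:,.0f}")
print(f"rare, extreme harm:     {rare_extreme:,.0f}")
print(f"break-even probability: {break_even:.1e}")

# Sensitivity check: the ranking depends on a probability nobody can calibrate.
for p in (1e-3, 1e-5, 1e-7, 1e-9):
    extreme = expected_cost(p, EXTREME_COST)
    verdict = "exceeds" if extreme > likely_moderate else "is below"
    print(f"p = {p:.0e}: extreme-event expected cost {verdict} the moderate one")
```

With these made-up numbers the rare event dominates at 0.1%, but the ranking reverses once the probability estimate drops below the break-even value. Moving the estimate by a few orders of magnitude, which is well within the uncertainty for an unprecedented event, changes the conclusion. That sensitivity is one way to read the critics' point about model uncertainty.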
Alignment Research as a Response
AI alignment research is the field that studies how to ensure AI systems reliably pursue the goals their designers intend, even as those systems become more capable. The alignment problem is not simply a technical problem of implementation. It is a deep problem about how to specify human values precisely, how to verify that a system has the intended values, and how to maintain human oversight of systems that may become more capable than their designers.
Current alignment research includes approaches such as reinforcement learning from human feedback (RLHF), which trains AI systems using human evaluations of outputs; interpretability research, which tries to understand what is happening inside AI systems by analyzing their internal representations; and scalable oversight, which studies how to maintain meaningful human oversight of AI systems that may exceed human ability in certain domains.
The alignment challenge is not universally accepted as the central problem. Many AI researchers prioritize near-term safety issues (bias, reliability, privacy, misuse) over long-run alignment concerns. The field has ongoing debates about research priorities, methodologies, and how much weight to give to various risk scenarios. A responsible student of AI understands that these debates exist and is skeptical of anyone who presents one view as the obvious consensus.
Thinking Carefully About the Far Horizon
Long-term thinking about AI requires intellectual virtues that are difficult to maintain in practice.
- Calibrated uncertainty at extreme ranges: it is tempting to be either dismissive ('superintelligence is science fiction') or alarmist ('AI will definitely destroy civilization within twenty years'). Both are failures of calibration. The intellectually honest position acknowledges that the uncertainties are genuine and large, that serious researchers hold a range of views for serious reasons, and that your own estimate should reflect your evidence rather than your emotional reaction.
- Distinguishing philosophical thought experiments from policy recommendations: the 'paperclip maximizer' and similar thought experiments are useful for illuminating conceptual points, not for making precise predictions about real systems. When philosophical arguments are used to justify specific near-term policy positions, the connection between the argument and the policy should be examined critically.
- Avoiding galaxy-brained reasoning: when chains of plausible-seeming reasoning lead to conclusions that strike most thoughtful people as extreme, this should raise a flag. The fact that an argument seems valid does not guarantee that its premises are accurate or that its conclusion follows without assumptions that deserve scrutiny. Long-horizon AI reasoning is particularly prone to this failure mode.
- Attending to near-term harms: long-run thinking must not displace attention from harms happening right now, such as AI systems perpetuating bias in hiring and lending, AI-generated disinformation destabilizing elections, and surveillance infrastructure being used against marginalized communities. A complete perspective on AI's future attends to both the immediate and the long-run horizon.
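One way to see why long chains of plausible-seeming reasoning deserve extra scrutiny is to multiply out the confidence in each step. The sketch below rests on a strong simplifying assumption that is not from the lesson itself: the conclusion needs every step to hold, and the steps are treated as independent. The function name chain_probability and the 90% per-step figure are hypothetical and chosen only for illustration.

```python
import math


def chain_probability(step_probabilities):
    """Probability that every step holds, ASSUMING the steps are independent."""
    steps = list(step_probabilities)
    for p in steps:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each step probability must be between 0 and 1")
    return math.prod(steps)


# Hypothetical argument: every step looks quite plausible (90%) on its own.
three_steps = chain_probability([0.9] * 3)
ten_steps = chain_probability([0.9] * 10)
twenty_steps = chain_probability([0.9] * 20)

print(f"3 steps:  {three_steps:.2f}")
print(f"10 steps: {ten_steps:.2f}")
print(f"20 steps: {twenty_steps:.2f}")
```

Under these assumptions, a ten-step argument whose steps each seem 90% likely supports its conclusion with a probability of only about one in three. Real arguments rarely have independent steps, so the exact numbers should not be taken literally. The qualitative lesson is that confidence in each link does not transfer to confidence in the whole chain.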
The convergence argument holds that a capable AI system pursuing almost any goal will develop subgoals including self-preservation. Which of the following best states the strongest objection to this argument?
A researcher argues that working on long-run AI existential risk is the most important priority because even a 0.1% chance of catastrophe has enormous expected cost. Which is the strongest critique of this expected-value argument?
Long-Horizon Reasoning Exercise
- This activity develops the intellectual virtues needed for careful long-run thinking.
- Step 1: Choose one long-run AI scenario: (a) a superintelligence that becomes misaligned and difficult to correct, (b) a world where AI enables a small group to gain irreversible global dominance, or (c) a world where AI and humans flourish together with widely shared benefits.
- Step 2: Write the most compelling case for why this scenario is possible, grounded in specific mechanisms rather than vague claims.
- Step 3: Write the most compelling case for why this scenario is unlikely or avoidable, again grounded in specific mechanisms.
- Step 4: Identify the two most important empirical questions whose answers would most change your assessment of the scenario's probability.
- Step 5: Assign a probability estimate to the scenario occurring within the next 100 years. Write one sentence explaining what most drives your estimate.
- Reflect: Compare your Step 5 estimate with the intuition you held before writing Steps 2 and 3. Did your estimate change? If so, which argument moved you most, and why?