The Current AI Paradigm
If you were to open a major AI research paper today from Google DeepMind, OpenAI, Meta AI, or Anthropic, you would find the same fundamental approach behind nearly every system: a large neural network trained on an enormous dataset using gradient descent, then refined with human feedback. This is no coincidence. It is a paradigm: a shared set of methods, assumptions, and tools that the field has converged on because it works dramatically better than the alternatives tried so far. Understanding this paradigm is not about memorizing a recipe. It is about understanding why this specific combination of ideas succeeded where decades of other approaches did not, and which assumptions it rests on, so you can reason about its limits.
What Is a Paradigm?
The philosopher of science Thomas Kuhn popularized the term paradigm, in The Structure of Scientific Revolutions (1962), to describe a framework of shared assumptions, methods, and tools that defines normal scientific work within a field. A paradigm is not merely a popular technique. It is the lens through which researchers see problems. When a paradigm dominates, most researchers stop asking whether to use its approach and start asking how to apply it.
The current AI paradigm can be summarized in four components. First, the architecture: deep neural networks, specifically the transformer introduced in 2017, provide the computational structure. The transformer processes sequences of tokens and captures long-range dependencies through a mechanism called attention, which lets each position in a sequence weigh information from other positions. Second, the pretraining objective: models are trained on vast quantities of unlabeled data using a self-supervised objective, in which the training targets come from the data itself. For language models, this typically means predicting the next token in a sequence. The model learns rich representations of language, and indirectly of the world it describes, without anyone labeling the data. Third, the scaling hypothesis: more parameters, more data, and more compute consistently lower the model's loss, and the improvement follows predictable, approximately power-law relationships. These empirical relationships, called scaling laws, turn much of AI development into an engineering problem with forecastable returns on investment. Fourth, alignment training: a pretrained model is shaped toward usefulness and safety through fine-tuning on carefully curated examples and through reinforcement learning from human feedback (RLHF), which adjusts the model to produce outputs that human raters prefer.
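The self-supervised pretraining objective can be illustrated with a toy example. This sketch turns raw, unlabeled text into (context, next token) training pairs and scores a model by the average cross-entropy of the true next token. Two simplifications are assumptions made purely for illustration: tokens are whitespace-separated words rather than the subword tokens real models use, and the "model" is a smoothed bigram counter (it conditions only on the previous token) standing in for a transformer trained by gradient descent. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def make_training_pairs(tokens, context_size):
    """Turn one unlabeled token sequence into (context, next_token) pairs.

    The target at each position is simply the token that follows the
    context, so the text supervises itself: no human labels are needed.
    """
    pairs = []
    for i in range(1, len(tokens)):
        context = tuple(tokens[max(0, i - context_size):i])
        pairs.append((context, tokens[i]))
    return pairs


def fit_bigram(tokens):
    """Count how often each token follows each previous token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def next_token_prob(counts, vocab, context, target):
    """P(target | last token of context), with add-one smoothing.

    Stand-in for a transformer, which would condition on the whole context.
    """
    follow = counts[context[-1]]
    return (follow[target] + 1) / (sum(follow.values()) + len(vocab))


def cross_entropy(counts, vocab, pairs):
    """Average negative log-likelihood of the true next token, in nats."""
    total = 0.0
    for context, target in pairs:
        total -= math.log(next_token_prob(counts, vocab, context, target))
    return total / len(pairs)


if __name__ == "__main__":
    tokens = "the cat sat on the mat the cat ate".split()
    pairs = make_training_pairs(tokens, context_size=3)
    vocab = set(tokens)
    counts = fit_bigram(tokens)
    print(pairs[:3])
    print(cross_entropy(counts, vocab, pairs), math.log(len(vocab)))
```

On this tiny text the bigram model's average loss (about 1.28 nats) is below log(6), about 1.79 nats, which is what a model guessing uniformly over the six-word vocabulary would score. Pretraining a real model is the same idea at vastly larger scale: gradient descent adjusts the parameters to push this number down across trillions of tokens.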
The current AI paradigm rests on four interlocking ideas: the transformer architecture, self-supervised pretraining on internet-scale data, scaling laws that make improvement predictable, and alignment training to shape behavior. Every major frontier model, including GPT-4, Claude, Gemini, and Llama, is built on this foundation.
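The attention mechanism at the heart of the transformer architecture can be sketched in a few lines. This is a minimal, single-head version of the scaled dot-product formula from the 2017 transformer paper, softmax(Q K^T / sqrt(d)) V, written in plain Python for readability. It assumes the query, key, and value vectors are already given; in a real transformer they come from learned projections of token embeddings, are computed with batched matrix operations on GPUs, and are split across many heads. The function names are illustrative, not taken from any library.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention(queries, keys, values, causal=True):
    """Single-head scaled dot-product attention over one sequence.

    queries, keys, values: lists of float vectors, one per token position.
    With causal=True, position i attends only to positions 0..i, the
    masking used when training on next-token prediction.
    """
    d = len(keys[0])
    outputs = []
    for i, q in enumerate(queries):
        visible = range(i + 1) if causal else range(len(keys))
        # Similarity of this query to each visible key, scaled by sqrt(d).
        scores = [dot(q, keys[j]) / math.sqrt(d) for j in visible]
        weights = softmax(scores)
        # Output is the attention-weighted average of the visible values.
        out = [0.0] * len(values[0])
        for w, j in zip(weights, visible):
            for k in range(len(out)):
                out[k] += w * values[j][k]
        outputs.append(out)
    return outputs
```

Because every position can draw directly on any earlier position in a single step, distant tokens are no harder to reach than adjacent ones, which is how attention captures long-range dependencies. Under causal masking the first position can only attend to itself, so its output is exactly its own value vector.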
Before this paradigm crystallized, AI researchers pursued diverse approaches. Expert systems encoded human knowledge as explicit rules but were brittle and expensive to maintain. Symbolic AI reasoned over formal logic but could not handle the ambiguity of real-world language and perception. Earlier neural networks were held back by insufficient data, insufficient compute, and architectural limitations that prevented them from capturing long-range structure. Three developments removed these barriers in the 2010s. Internet-scale datasets provided the data. GPUs, originally designed for video game graphics, provided the compute. And deep learning research produced architectures, first deep convolutional networks and then transformers, that could use that data and compute effectively. The convergence of these three factors unlocked a rapid series of capability improvements that has continued into the present. Understanding this history matters because it tells you what the paradigm depends on: abundant data, access to compute, and the specific inductive biases of the transformer architecture. Anything that threatens these dependencies is a potential limit on the paradigm's continued success.
Match each component of the current AI paradigm to what it contributes.
What the Paradigm Assumes
Every paradigm rests on assumptions, and understanding those assumptions separates a practitioner who can reason about failures from one who is simply surprised by them.
Assumption one: the data distribution at training time approximates the distribution of inputs the model will face at deployment. A language model trained primarily on English text will represent English well and other languages unevenly. A vision model trained on photos from high-income countries may not generalize to visual contexts in low-income countries. Distribution mismatch is among the paradigm's most common practical failure modes.
Assumption two: scale is the primary lever. The paradigm assumes that making the model larger, the dataset larger, and the training longer produces meaningful capability improvements. This has held across many orders of magnitude of compute. Whether it continues to hold is an open empirical question.
Assumption three: the training objective is a good proxy for the real goal. Predicting the next token in text is not the same as understanding meaning, reasoning correctly, or being helpful and honest. The paradigm works because next-token prediction is a remarkably rich proxy task, but the proxy is imperfect, which is why alignment training is necessary.
Assumption four: human feedback provides a reliable signal. Reinforcement learning from human feedback assumes that human raters can reliably judge which model outputs are better. When the task is subtle, such as evaluating the correctness of a mathematical proof or the safety implications of a chemical synthesis, human raters may not be reliable. The quality of alignment training depends on the quality of the human signal.
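The reliance on human judgment can be made concrete with the pairwise-comparison loss commonly used to train RLHF reward models. It treats each rater judgment as a Bradley-Terry comparison: the probability that output A is preferred over output B is sigmoid(r_A - r_B), where r is the reward model's score. The sketch below is a minimal illustration under that assumption, not any particular lab's pipeline; the reward scores are hypothetical fixed numbers standing in for a learned reward model, and the function names are invented for illustration.

```python
import math


def preference_prob(reward_a, reward_b):
    """Bradley-Terry probability that a rater prefers output A over output B."""
    margin = reward_a - reward_b
    if margin >= 0:
        return 1.0 / (1.0 + math.exp(-margin))
    e = math.exp(margin)  # equivalent form that avoids overflow for large negative margins
    return e / (1.0 + e)


def preference_loss(reward_chosen, reward_rejected):
    """-log P(chosen preferred over rejected) = log(1 + exp(-margin)), computed stably."""
    margin = reward_chosen - reward_rejected
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def average_loss(rewards, comparisons):
    """Mean loss over (chosen_id, rejected_id) pairs collected from raters."""
    return sum(preference_loss(rewards[c], rewards[r]) for c, r in comparisons) / len(comparisons)


if __name__ == "__main__":
    rewards = {"A": 2.0, "B": 0.5}  # hypothetical reward-model scores
    consistent = [("A", "B"), ("A", "B")]     # raters agree that A is better
    contradictory = [("A", "B"), ("B", "A")]  # raters split on the same pair
    print(average_loss(rewards, consistent))
    print(average_loss(rewards, contradictory))
    print(average_loss({"A": 1.0, "B": 1.0}, contradictory))  # log(2)
```

With consistent labels, the loss falls as the reward model widens the margin between A and B. With contradictory labels, any margin in either direction pushes the average loss above log 2, so the best the reward model can do is score the two outputs equally: the preference signal cancels out. Unreliable raters on subtle tasks produce exactly this kind of weak or misleading signal.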
The current paradigm succeeded because specific historical conditions aligned: internet-scale data, cheap GPU compute, and architectural insight. It assumes those conditions continue and that scale keeps working. Neither assumption is guaranteed. Researchers actively investigate alternative paradigms because the current paradigm has known limits.
A company builds a language model using the current paradigm, training it on English-language legal documents. They then deploy it globally, including in countries where legal systems are based on civil law rather than common law traditions. Which assumption of the paradigm is most directly violated?
Why was predicting the next token chosen as the pretraining objective for large language models, rather than a task with explicit human-provided labels?
Map the Current AI Paradigm
- Work individually, then compare with a partner.
- Step 1: Draw a diagram with four boxes labeled Architecture, Pretraining, Scaling, and Alignment. For each box, write two to three sentences explaining what it is and why it matters.
- Step 2: Draw arrows between the boxes to show dependencies. Which components depend on others? For example, does alignment training depend on having a pretrained model first? Does scaling depend on the architecture?
- Step 3: Add a fifth box labeled Assumptions. List the assumptions from the lesson that the paradigm rests on. Draw arrows from each assumption to the component it most directly affects.
- Step 4: Write a one-paragraph answer to this question: If internet-scale text data suddenly became unavailable, which parts of the paradigm would be most severely disrupted, and why?
- Goal: understand the paradigm as an interdependent system, not a bag of independent techniques.