Frontier & Future AI


What Makes AI More Capable?

For most of computing history, making software better meant writing better code. A programmer would study the problem, think of a smarter algorithm, and update the program. AI systems can improve this way too, but over the past decade researchers found that capability in AI often grows along other dimensions as well: more data, more computation, and better algorithms and training methods. Understanding those dimensions is the first step toward understanding why today's AI systems do things that seemed impossible just a few years ago.

Dimension 1 — More Data

AI systems that learn from examples generally improve as they see more examples. An image classifier trained on one thousand photos of cats will typically make more mistakes than one trained on one hundred million. A language model trained on a few books will have a much narrower picture of the world than one trained on hundreds of billions of words from the internet, books, scientific papers, and code.

More data helps in two ways. First, it exposes the model to a wider variety of situations, so it generalizes better. Second, it reduces the chance that the model overfits, that is, memorizes the quirks of a small dataset instead of learning the underlying pattern. The internet produced a historic windfall of data for AI: text, images, audio, video, and code at a scale no previous generation of researchers had access to.
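To make the effect of dataset size concrete, here is a minimal toy sketch, not a real training pipeline. It assumes a made-up task (a number in [0, 1] counts as positive when it is at least a hidden cutoff) and a hypothetical learner that guesses the cutoff from evenly spaced labeled examples. The names TRUE_CUTOFF, make_examples, learn_cutoff, and cutoff_error are all invented for this illustration.

```python
# Toy illustration only: the task, TRUE_CUTOFF, and learn_cutoff are
# hypothetical and chosen so the effect of dataset size is easy to see.
TRUE_CUTOFF = 0.6180  # hidden rule: x is "positive" when x >= TRUE_CUTOFF


def make_examples(n):
    """Return n evenly spaced points in [0, 1], each with its true label."""
    xs = [i / (n - 1) for i in range(n)]
    return [(x, x >= TRUE_CUTOFF) for x in xs]


def learn_cutoff(examples):
    """Guess the cutoff as the midpoint between the largest negative
    example and the smallest positive example."""
    largest_negative = max(x for x, label in examples if not label)
    smallest_positive = min(x for x, label in examples if label)
    return (largest_negative + smallest_positive) / 2


def cutoff_error(n):
    """Distance between the learned cutoff and the true one."""
    return abs(learn_cutoff(make_examples(n)) - TRUE_CUTOFF)


if __name__ == "__main__":
    for n in (5, 50, 500, 5000):
        print(f"{n:>5} examples -> cutoff error {cutoff_error(n):.5f}")
```

In this toy setting the learned cutoff can never be further from the truth than half the gap between neighboring examples, so the worst-case error shrinks as examples are added. Real models and real data are far messier, but the intuition is the same: more examples leave less room for a wrong guess.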

Generalization

Generalization is a model's ability to perform well on new examples it was never trained on. More diverse training data usually improves generalization.
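The difference between memorizing and generalizing can be sketched with a deliberately tiny example. Everything here is hypothetical: the even/odd task, the Memorizer and LastDigitRule classes, and the train/test split are invented for illustration and are not how real models are built. The point is only that accuracy on training examples and accuracy on unseen examples can differ sharply.

```python
# Toy illustration only: both "models" and the even/odd task are hypothetical.

def is_even(n):
    return n % 2 == 0


train = [(n, is_even(n)) for n in range(0, 10)]   # seen during training
test = [(n, is_even(n)) for n in range(10, 20)]   # never seen in training


class Memorizer:
    """Stores every training example; answers False for anything unseen."""

    def fit(self, examples):
        self.table = dict(examples)

    def predict(self, n):
        return self.table.get(n, False)


class LastDigitRule:
    """Learns which last digits go with the label True."""

    def fit(self, examples):
        self.true_digits = {n % 10 for n, label in examples if label}

    def predict(self, n):
        return n % 10 in self.true_digits


def accuracy(model, examples):
    """Fraction of examples the model labels correctly."""
    correct = sum(model.predict(n) == label for n, label in examples)
    return correct / len(examples)


if __name__ == "__main__":
    for model in (Memorizer(), LastDigitRule()):
        model.fit(train)
        name = type(model).__name__
        print(name, "train:", accuracy(model, train), "test:", accuracy(model, test))
```

Both models score 1.0 on the training examples. On the unseen numbers 10 to 19, the memorizer drops to 0.5 while the rule-based model stays at 1.0, because only the second one captured a pattern that carries over to new inputs.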

Dimension 2 — More Computation

Training a large AI model is extraordinarily computation-intensive. The algorithm must process millions or billions of examples, adjust billions of numerical parameters, and repeat this process thousands of times. Every adjustment requires multiplying and adding enormous arrays of numbers.

Specialized chips called GPUs (graphics processing units) and TPUs (tensor processing units) made this practical. GPUs were originally designed for rendering video game graphics and turned out to be ideal for the matrix mathematics at the heart of neural network training, while TPUs were built specifically for machine learning workloads. A training run that would take a single computer decades can now be completed in days by a cluster of thousands of GPUs working in parallel. As chips improved and became more available, researchers could train bigger models faster, and bigger models tend to be more capable.
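The "multiplying and adding enormous arrays of numbers" described above is mostly matrix multiplication. The following is a minimal plain-Python sketch, written for clarity and not for speed, that multiplies two small matrices and counts the arithmetic operations involved. The function names matmul_with_op_count and matmul_ops are invented for this example; real training code would use an optimized library running on a GPU or TPU.

```python
def matmul_with_op_count(a, b):
    """Multiply matrix a (rows x inner) by matrix b (inner x cols).

    Returns the product and the number of arithmetic operations used,
    counting one multiply and one add per innermost step.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    result = [[0.0] * cols for _ in range(rows)]
    ops = 0
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                result[i][j] += a[i][k] * b[k][j]
                ops += 2  # one multiply and one add
    return result, ops


def matmul_ops(rows, inner, cols):
    """Operation count for the same multiplication, without doing it."""
    return 2 * rows * inner * cols


if __name__ == "__main__":
    product, ops = matmul_with_op_count([[1, 2], [3, 4]], [[5, 6], [7, 8]])
    print(product, ops)
    # Two 4096 x 4096 matrices already need about 137 billion operations.
    print(matmul_ops(4096, 4096, 4096))
```

Every output cell is computed independently of the others, which is why hardware that can run many of these multiply-and-add steps in parallel speeds up training so much.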

Compute as a Resource

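Compute, the total number of mathematical operations performed during training, is measured in FLOPs (floating-point operations). The similar-looking term FLOPS, with a capital S, means floating-point operations per second; it describes how fast a chip is, not how much work a training run required. Training runs for today's largest models consume vastly more compute than those of earlier generations of AI systems.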
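A rough back-of-the-envelope calculation helps separate total compute (a count of operations) from chip speed (operations per second). The sketch below rests on two loudly labeled assumptions: a commonly cited rule of thumb of about 6 floating-point operations per parameter per training token for dense transformer models, and entirely hypothetical hardware numbers. The function names are invented for this example, and the results are order-of-magnitude estimates at best.

```python
def training_flops(num_parameters, num_tokens, flops_per_param_per_token=6):
    """Estimate total training compute as a count of operations.

    The default of 6 is a widely used approximation for dense transformers,
    not an exact figure.
    """
    return flops_per_param_per_token * num_parameters * num_tokens


def training_days(total_flops, num_chips, peak_flops_per_second, utilization):
    """Convert a total operation count into wall-clock days.

    peak_flops_per_second is a per-chip rate; utilization is the fraction
    of that peak actually achieved (always below 1 in practice).
    """
    effective_rate = num_chips * peak_flops_per_second * utilization
    seconds = total_flops / effective_rate
    return seconds / 86_400  # seconds in a day


if __name__ == "__main__":
    # Hypothetical model: 1 billion parameters, 20 billion training tokens.
    total = training_flops(1e9, 20e9)
    # Hypothetical cluster: 8 chips, 1e14 operations per second each, 40% utilization.
    days = training_days(total, num_chips=8, peak_flops_per_second=1e14, utilization=0.4)
    print(f"total compute: {total:.2e} operations")
    print(f"estimated time: {days:.1f} days")
```

With these made-up inputs the estimate is 1.2e20 operations and a little over four days. Doubling the number of chips halves the time but leaves the total compute unchanged, which is the distinction between the two quantities.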

Dimension 3 — Better Algorithms and Architectures

Even with the same data and compute, a smarter algorithm can produce a far more capable model. The transformer architecture, introduced in 2017, greatly improved language modeling by letting the model pay selective attention to different parts of its input, a mechanism called self-attention. Before transformers, language models struggled to link words across long passages. With transformers, they could relate words and ideas across much longer stretches of text.

Beyond architecture, researchers developed smarter training methods: better ways to initialize the billions of parameters, more effective loss functions, and techniques like reinforcement learning from human feedback (RLHF) that steer a model toward responses people judge helpful and accurate rather than just grammatically fluent.
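The idea of paying selective attention can be sketched in a few lines. The following is a simplified, plain-Python version of scaled dot-product attention. It is an illustration under stated assumptions, not a transformer: real models first pass each token through learned projection matrices to obtain separate query, key, and value vectors, use many attention heads, and run on optimized hardware. Here the same toy vectors are reused for all three roles, and the function names are invented for this example.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def attention(queries, keys, values):
    """For each query, weight every value by how well its key matches."""
    scale = math.sqrt(len(keys[0]))
    outputs, all_weights = [], []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys])
        mixed = [
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


if __name__ == "__main__":
    # Three toy "token" vectors, reused as queries, keys, and values.
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    outputs, weights = attention(tokens, tokens, tokens)
    for row in weights:
        print([round(w, 3) for w in row])
```

Each row of weights shows how much one token draws on every token in the input, including itself. The first token, for instance, gives more weight to the first and third tokens (which point in a similar direction) than to the second.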

When Capability Jumps Unexpectedly

One of the most surprising discoveries in recent AI research is that capability does not always grow smoothly. Sometimes a model reaches a certain size and suddenly develops an ability it did not seem to have before — performing multi-step arithmetic, writing working code, or translating languages it was barely exposed to during training. Researchers call these emergent abilities. Emergent abilities are not fully understood. They suggest that sheer scale can unlock qualitatively new behaviors, not just better performance on existing tasks. This makes forecasting AI capability difficult: we cannot always predict in advance which new skill a bigger model will develop.

Emergent Abilities Are Not Magic

The word emergent sometimes implies a mysterious leap. In reality, emergent abilities most likely arise because the model has learned enough underlying sub-skills to combine them in new ways. The behavior is surprising, but it is still rooted in pattern learning, not genuine understanding or consciousness.
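One way to see how steadily improving sub-skills could look like a sudden jump is a small piece of arithmetic. The sketch below is a toy model, not an established explanation of emergent abilities. It assumes a task that requires several steps in a row, that every step succeeds with the same probability, and that steps succeed or fail independently. The function name chain_success is invented for this example.

```python
def chain_success(step_accuracy, num_steps):
    """Probability of getting every step right, assuming independent steps."""
    return step_accuracy ** num_steps


if __name__ == "__main__":
    steps = 10  # hypothetical task that needs 10 correct steps in a row
    for step_accuracy in (0.5, 0.8, 0.9, 0.95, 0.99):
        whole_task = chain_success(step_accuracy, steps)
        print(f"per-step accuracy {step_accuracy:.2f} -> whole task {whole_task:.3f}")
```

Under these assumptions, a model that gets each step right 90% of the time completes the ten-step task only about 35% of the time, while one at 99% per step completes it about 90% of the time. A gradual gain in each sub-skill shows up as a steep gain on the combined task, which is one reason a new ability can appear to arrive all at once.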

Match each capability dimension to what it describes.

Terms

More training data
More compute
Transformer architecture
Emergent ability
Generalization

Definitions

Performing well on new examples that were not in the training set
A skill a model suddenly shows after crossing a certain size threshold
Lets a model pay selective attention to different parts of its input
Allows training of larger models with more parameters in less time
Exposes the model to more variation so it generalizes to new examples


Which of the following best explains why more training data usually makes an AI model more capable?

A language model trained at a very large scale suddenly begins solving algebra problems, even though algebra was not a focus of its training data. What term describes this phenomenon?

Map the Dimensions of Capability

  1. Think of a human skill that improves with practice, for example playing a musical instrument, speaking a second language, or solving math problems.
  2. Identify what the human equivalent of 'more data' would be for that skill.
  3. Identify what the human equivalent of 'more compute' would be.
  4. Identify a technique change (like a better practice method or a new strategy) that might be the equivalent of a better algorithm.
  5. Write two sentences comparing how humans develop capability with how AI systems develop capability. Note one important similarity and one important difference.