Frontier & Future AI

⏱ About 15 min · 15 XP

How Fast Is AI Moving?

Ask ten people how fast AI is advancing and you will get ten different answers. A venture capitalist might say we are living through the most rapid technological acceleration in history. A skeptical academic might say the big ideas dried up years ago and we are just throwing more compute at old algorithms. A journalist might say AI will replace every job in five years. A working AI researcher might say the hard problems are harder than they look and progress is slower than the headlines imply. All of them are looking at the same field. So why do they disagree so sharply? Because AI progress is real but uneven: rapid in some dimensions, much slower in others, and easy to misread if you look at only one kind of evidence.

What the Numbers Actually Show

On the benchmarks researchers use to measure AI capability, progress has been astonishing. In 2011, the best systems on the ImageNet image recognition benchmark made errors about 26 percent of the time. By 2017, the best systems were outperforming the average human on the same benchmark. In language understanding, a benchmark called GLUE, introduced in 2018, was considered a difficult challenge for AI. Systems reached human-level performance on it within two years, and researchers had to create harder benchmarks just to have room to measure further progress. The compute used to train leading AI models has roughly doubled every six to twelve months, far faster than Moore's Law, under which the number of transistors on a chip doubles roughly every two years. Costs to perform standard AI tasks have fallen by orders of magnitude. Tasks that cost millions of dollars to automate in 2016 cost thousands in 2023 and less than a hundred dollars in 2025.
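To see why the gap between doubling times matters so much, here is a small Python sketch of the compounding arithmetic. It assumes a clean, constant doubling time, which real trends only approximate; the ten-year horizon and the helper name `growth_factor` are illustrative choices, not figures from this lesson.

```python
def growth_factor(years: float, doubling_time_years: float) -> float:
    """Total multiplicative growth after `years` if a quantity doubles every `doubling_time_years`."""
    return 2 ** (years / doubling_time_years)


# Illustrative ten-year horizon (an assumption, not a measured period).
horizon = 10

# Compare the two ends of the compute-doubling range with a ~2-year Moore's Law doubling.
for label, doubling_time in [
    ("training compute, 6-month doubling", 0.5),
    ("training compute, 12-month doubling", 1.0),
    ("Moore's Law, ~2-year doubling", 2.0),
]:
    factor = growth_factor(horizon, doubling_time)
    print(f"{label}: x{factor:,.0f} over {horizon} years")
```

Over ten years, a six-month doubling time compounds to about a millionfold increase (2^20), a twelve-month doubling time to about a thousandfold increase (2^10), and a two-year doubling time to a 32-fold increase (2^5).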

When Benchmarks Run Out

When AI systems start consistently outperforming humans on a benchmark, researchers retire it and create a harder one. This is genuinely good news — it means progress is real. But it also means headline comparisons between years are tricky: the bar keeps moving.

The Gap Between Benchmark and Reality

Benchmark performance and real-world usefulness do not always move together. An AI model might achieve 95 percent accuracy on a carefully curated test set and then fall apart when encountering the messy, ambiguous, contradictory reality of actual human tasks. This is called the benchmark-reality gap. Consider autonomous driving. AI-powered vehicles achieved impressive performance on controlled test tracks years ago. Deploying them on actual roads — with unexpected construction, unusual weather, cyclists behaving erratically, and situations their training data never anticipated — proved far harder. The gap between test performance and reliable real-world behavior has kept full autonomous driving from widespread deployment for over a decade despite constant progress. This does not mean benchmark progress is meaningless. It means benchmark progress is a leading indicator — it signals that something is getting better in a useful direction, but it does not guarantee the real-world problem is solved.

S-Curves: How Technological Progress Really Works

Most technologies follow an S-curve of development. Progress starts slowly as researchers work out the basic approach. Then it accelerates, sometimes dramatically, as the approach matures and investment and talent pour in. Then it decelerates as the approach runs into fundamental limits, and a new S-curve begins with a new approach. AI has followed multiple S-curves: symbolic AI and expert systems, then neural networks, deep learning, and large language models each followed this pattern. Large language models are widely seen as being on a steep portion of their S-curve, but whether that curve is still accelerating, approaching a plateau, or about to give way to a new curve is genuinely contested among experts. Understanding the S-curve concept helps you read AI news more accurately. A slowdown in progress does not mean AI is failing; it may mean one approach is maturing while a new one waits to emerge.
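S-curves are often modeled with a logistic function. The Python sketch below assumes that model, with made-up, unfitted parameters (`ceiling`, `midpoint`, and `rate` are hypothetical and not estimated from any AI data), purely to show the shape: yearly gains are tiny at the start, largest near the midpoint, and tiny again near the plateau.

```python
import math


def logistic(t: float, ceiling: float = 1.0, midpoint: float = 0.0, rate: float = 1.0) -> float:
    """Logistic S-curve: slow start, rapid middle, plateau approaching `ceiling`."""
    return ceiling / (1.0 + math.exp(-rate * (t - midpoint)))


def yearly_gain(t: float, **kwargs) -> float:
    """Improvement from time t to t + 1 on the same hypothetical curve."""
    return logistic(t + 1, **kwargs) - logistic(t, **kwargs)


# Walk along the curve in steps of 2 time units around the (hypothetical) midpoint at 0.
for year in range(-6, 7, 2):
    print(f"t={year:+d}  level={logistic(year):.3f}  gain next step={yearly_gain(year):.3f}")
```

The printed gains rise toward the midpoint and then fall, even though the underlying level keeps increasing the whole time. That is why a slowdown in gains, on its own, is consistent with an approach maturing rather than failing.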

What Humans Remain Better At

Despite rapid benchmark gains, AI systems in 2025 still fall short of human performance in important ways. Humans can learn a new task from one or two examples; AI models typically need thousands. Humans can apply knowledge from one domain to a completely novel situation; AI often fails when inputs deviate from its training distribution. Humans can explain their reasoning and update their beliefs when corrected; AI systems can confabulate, generating confident-sounding explanations that are factually wrong. Humans also have agency and intention: we form our own goals and pursue them flexibly over time. Current AI systems do not have goals in this sense; they respond to inputs based on learned patterns. This is a fundamental architectural difference, not just a matter of more scale or better data.

Match each concept to the correct description of what it reveals about AI progress.

Terms

Benchmark saturation
Benchmark-reality gap
S-curve
Confabulation
Few-shot learning gap

Definitions

When AI reaches human-level performance on a test, researchers retire it and create a harder one to keep measuring progress
High test-set accuracy does not guarantee strong performance on the messy, unpredictable real world
Humans can learn from one or two examples while AI typically needs thousands, revealing a real difference in generalization
The pattern where a technology accelerates rapidly then slows as it approaches limits, before a new approach takes over
When an AI system generates a confident-sounding explanation that is factually incorrect


What is the 'benchmark-reality gap' in AI?

According to the S-curve model of technological progress, what typically happens after a period of rapid AI advancement?

Progress or Hype? Evaluate a Claim

  1. Find one recent AI headline (from a news site, a social media post, or something a friend told you), or use one of these examples: 'AI beats humans at coding interview,' 'AI will replace lawyers within three years,' 'New AI model achieves 90% accuracy on medical diagnosis.'
  2. Write out the claim clearly.
  3. Ask: Is this a benchmark result or a real-world deployment result? How do you know?
  4. Ask: Does this claim tell you about a specific narrow task or about general AI ability?
  5. Ask: Who made this claim, and what incentive might they have to exaggerate or understate it?
  6. Write a one-paragraph assessment of whether the claim represents a genuine breakthrough, a genuine improvement, or an overstatement, and explain your reasoning.