Reasoning Models
When you work through a hard math problem, you do not just stare at it and instantly blurt an answer. You write down steps, check intermediate results, notice mistakes, backtrack, and try again. For a long time, AI language models did the opposite: they produced answers in one rapid pass, with no visible deliberation. Reasoning models are a newer class of AI designed to slow down and think — generating intermediate steps before committing to a final answer. The results have been striking.
What Makes a Reasoning Model Different
A standard large language model generates text token by token: each token (a word or piece of a word) is chosen based on everything that came before it, using one forward pass through the network per token. This works remarkably well for many tasks, but it can fail on problems that require multiple logical steps, because a model that answers immediately has no scratchpad on which to work through the logic. Reasoning models are trained, often using reinforcement learning, to produce a chain of thought: a sequence of intermediate reasoning steps written out before the final answer. The model might think: "Let me identify the known quantities. Next, I will apply the formula. Now I can solve for x." Because each new token is conditioned on the steps already written, this chain acts like a scratchpad, giving the model a way to catch errors and build on partial results.
Chain-of-thought reasoning is the practice of having a model write out intermediate steps before reaching a conclusion. It can substantially improve performance on multi-step problems in math, logic, science, and code, which are tasks that require sequential reasoning rather than one-shot pattern matching.
How Reasoning Models Are Trained
Training a reasoning model involves more than just scaling up a standard language model. Researchers use reinforcement learning: the model generates a chain of thought and a final answer, and then receives a reward signal based on whether the answer is correct. Over many training examples, the model learns that careful step-by-step reasoning earns more reward than jumping straight to an answer. Rewarding only the final answer is often called outcome supervision. A complementary technique, process supervision, uses a process reward model to score the intermediate steps as well, so that the quality of the reasoning matters and not just the output. Models such as OpenAI's o1 and o3 and Anthropic's Claude with extended thinking are reported to have been developed using variations of reinforcement learning, although the full training details have not been published.
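The difference between rewarding the final answer and rewarding the intermediate steps can be made concrete with a small sketch. Everything below is illustrative: the `Attempt` class, both reward functions, and `toy_step_checker` are hypothetical names invented for this example, and the checker only understands steps written as "a op b = c". A real training system would use a learned reward model rather than a hand-written rule.

```python
import operator
from dataclasses import dataclass
from typing import Callable, List

# Arithmetic operators the toy checker understands (an assumption of this sketch).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


@dataclass
class Attempt:
    steps: List[str]  # the chain of thought, one string per step
    answer: str       # the final answer


def outcome_reward(attempt: Attempt, correct_answer: str) -> float:
    """Reward 1.0 if the final answer matches, else 0.0. The steps are ignored."""
    return 1.0 if attempt.answer.strip() == correct_answer.strip() else 0.0


def process_reward(attempt: Attempt, step_is_valid: Callable[[str], bool]) -> float:
    """Reward the fraction of steps that the step checker judges valid."""
    if not attempt.steps:
        return 0.0
    valid = sum(1 for step in attempt.steps if step_is_valid(step))
    return valid / len(attempt.steps)


def toy_step_checker(step: str) -> bool:
    """Toy stand-in for a process reward model: verifies steps like '12 * 4 = 48'."""
    try:
        left, right = step.split("=")
        a, op, b = left.split()
        return abs(OPS[op](float(a), float(b)) - float(right)) < 1e-9
    except (ValueError, KeyError, ZeroDivisionError):
        return False


if __name__ == "__main__":
    # Two correct steps followed by an arithmetic slip in the last step.
    attempt = Attempt(steps=["12 * 4 = 48", "48 + 10 = 58", "58 / 2 = 30"], answer="30")
    print("outcome reward:", outcome_reward(attempt, "29"))
    print("process reward:", process_reward(attempt, toy_step_checker))
```

In this sketch the outcome reward gives no credit for the attempt, while the process reward still credits the two valid steps and points at the one that failed.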
Where Reasoning Models Excel
Reasoning models show their biggest advantage on tasks with clear right answers that require sequential logic: competition-level mathematics, complex programming challenges, formal logic puzzles, and multi-step scientific calculations. On benchmarks built from such tasks, reasoning models substantially outperform standard models of similar size. Because they externalize their thinking, reasoning models are also somewhat easier to audit. A teacher or engineer can read the chain of thought and spot where the reasoning went wrong, which is not possible with a model that gives only a final answer.
A reasoning model that produces correct step-by-step solutions to calculus problems is not the same as a student who understands calculus. The model is finding patterns that match the form of correct reasoning. On genuinely novel problems outside its training distribution, it may produce plausible-looking but incorrect chains of thought. Checking the work of a reasoning model remains essential.
The Cost of Thinking Longer
Generating hundreds of reasoning steps before an answer takes time and compute. On simple tasks, reasoning models are significantly slower and more expensive to run than standard models, with little gain in accuracy. This creates a practical design trade-off, and a common answer is to use a fast standard model for easy questions and reserve the reasoning model for hard ones. Some AI products switch between model types based on the complexity of the request, a process called model routing.
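Model routing can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a description of how any particular product works: the cue list, the word-count cutoff, the threshold, and the labels "fast-model" and "reasoning-model" are all hypothetical. A production router would more likely rely on a trained classifier than on keyword matching.

```python
# Hypothetical phrases that hint a request needs multi-step reasoning.
REASONING_CUES = ("prove", "step by step", "derive", "debug", "how many", "solve")


def estimate_complexity(prompt: str) -> int:
    """Crude complexity score: one point per cue found, plus one for long prompts."""
    text = prompt.lower()
    score = sum(1 for cue in REASONING_CUES if cue in text)
    if len(text.split()) > 60:  # arbitrary cutoff chosen for this sketch
        score += 1
    return score


def route(prompt: str, threshold: int = 1) -> str:
    """Send the prompt to the reasoning model only if it looks hard enough."""
    if estimate_complexity(prompt) >= threshold:
        return "reasoning-model"
    return "fast-model"


if __name__ == "__main__":
    for prompt in (
        "What is the capital of France?",
        "Solve for x and prove the result step by step.",
    ):
        print(f"{route(prompt):16} <- {prompt}")
```

Raising `threshold` sends fewer requests to the slower model, which lowers cost at the risk of routing some hard questions to the fast one.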
What is the main advantage of having a model generate a chain of thought before its final answer?
Why might a reasoning model be slower and more expensive than a standard language model for simple tasks?
Think Like a Reasoning Model
- Step 1: Work through the following problem by writing every step — do not skip to the answer.
- Problem: A school has 360 students. Three-eighths of them play a sport. Of those who play a sport, one-third also play an instrument. How many students play both a sport and an instrument?
- Step 2: After you finish, review your chain of thought. Did you catch any arithmetic slip? Where could you have made an error?
- Step 3: Write two sentences explaining why writing out every step — rather than trying to do it in your head — reduced your chance of making a mistake.
- Step 4: Now describe a task where writing out steps would NOT help much — a task where the answer comes from intuition or pattern recognition rather than sequential logic.
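Once you have finished the activity on paper, one way to check your work is to let a short program write out the same steps. The sketch below is only an illustration: the function name `solve_with_steps` and the wording of each step are invented for this example. It uses Python's `fractions.Fraction` so the arithmetic stays exact and no rounding slips in.

```python
from fractions import Fraction
from typing import List, Tuple


def solve_with_steps(
    total: int, sport_fraction: Fraction, instrument_fraction: Fraction
) -> Tuple[List[str], Fraction]:
    """Solve the two-stage fraction problem, recording every intermediate step."""
    steps = []

    # First stage: how many students play a sport?
    sport_players = total * sport_fraction
    steps.append(f"Sport players: {total} * {sport_fraction} = {sport_players}")

    # Second stage: how many of those also play an instrument?
    both = sport_players * instrument_fraction
    steps.append(f"Sport and instrument: {sport_players} * {instrument_fraction} = {both}")

    return steps, both


if __name__ == "__main__":
    steps, answer = solve_with_steps(360, Fraction(3, 8), Fraction(1, 3))
    for line in steps:
        print(line)
    print("Answer:", answer)
```

Compare each printed step with your own. If an intermediate value differs, that line is where to look for the slip.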