Reasoning Out Loud
In school, a math teacher does not just write the answer on the board. They work through the problem step by step, showing every calculation and explaining each decision. This is not only for the students' benefit: working problems out loud also helps the teacher catch their own mistakes. AI agents benefit from the same discipline. When they reason step by step before acting, they tend to produce better answers, and the mistakes they do make are easier to find and fix.
Why Step-by-Step Reasoning Works
When a language model jumps directly from a question to an answer, it compresses a complex process into a single move. Any errors in that compressed process are hidden, because there are no intermediate steps to check. When the same model is asked to reason step by step, each intermediate conclusion is written down explicitly. This has two benefits. First, each step can be evaluated on its own: is this intermediate conclusion correct? Second, each step constrains the next step. A correct Step 2 is easier to reach when Step 1 is already verified. The chain of reasoning builds on itself, so errors are less likely to accumulate unnoticed than when all steps are invisible. Researchers have found that prompting language models to 'think step by step', or using structured prompts that elicit intermediate reasoning, can substantially improve performance on multi-step problems such as math, logic, and planning.
Writing intermediate reasoning steps forces a model to work through a problem sequentially. Each step is checkable, each step constrains the next, and errors become visible rather than hidden inside a single compressed answer.
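The difference between the two styles of prompting can be sketched in a few lines of Python. This is only an illustration: the function names, the prompt wording, and the 'Answer:' convention are assumptions made for this example, and no real model is called. The last function shows why written steps help, since each one can be pulled out and checked separately.

```python
from __future__ import annotations


def direct_prompt(question: str) -> str:
    """Ask for the final result only, so no intermediate steps are visible."""
    return f"Question: {question}\nAnswer with the final result only."


def step_by_step_prompt(question: str) -> str:
    """Ask the model to write numbered steps before giving its answer."""
    return (
        f"Question: {question}\n"
        "Think step by step. Number each step, show every calculation, "
        "and give the final answer on its own line starting with 'Answer:'."
    )


def split_steps(response: str) -> tuple[list[str], str | None]:
    """Separate the written steps from the final answer line.

    Assumes the response follows the hypothetical 'Answer:' convention
    requested by step_by_step_prompt.
    """
    steps: list[str] = []
    answer: str | None = None
    for line in response.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("Answer:"):
            answer = line[len("Answer:"):].strip()
        else:
            steps.append(line)
    return steps, answer


# A hand-written response standing in for model output.
example_response = "1. 2 + 3 = 5\n2. 5 * 2 = 10\nAnswer: 10"
steps, answer = split_steps(example_response)
```

With the steps separated, a reviewer (or another program) can check '2 + 3 = 5' on its own before trusting the final line.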
Reading a Chain-of-Thought Trace
A chain-of-thought trace is the written record of an agent's step-by-step reasoning before it acts. Here is an example of an agent reasoning through a task:
'The user wants to know which country has the most UNESCO World Heritage Sites. Let me think through this. I know France and China are often mentioned as top countries for heritage sites, but I should not rely on my memory alone for a specific count, because these numbers change as new sites are added. I will search for the current UNESCO list and count the top entries. My action: run a web search for current UNESCO World Heritage Sites count by country.'
Notice the structure: the agent states what it knows, acknowledges what it does not know, decides what information it needs, and then chooses an action. This is very different from the agent simply outputting 'I will search UNESCO World Heritage Sites' with no explanation.
A chain-of-thought trace is the step-by-step written reasoning an agent produces before selecting an action. Reading these traces lets designers and users understand why the agent made a specific decision.
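The four-part structure of the UNESCO trace (what is known, what is unknown, what is needed, and the chosen action) can be captured in a small data structure. The sketch below is one possible way to do it; the class name `ReasoningTrace` and its field names are invented for this example and do not come from any particular agent framework.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ReasoningTrace:
    """A hypothetical record of an agent's reasoning before it acts."""

    task: str
    known: list[str] = field(default_factory=list)    # what the agent already believes
    unknown: list[str] = field(default_factory=list)  # what it admits it does not know
    needed: list[str] = field(default_factory=list)   # information it decides to gather
    action: str = ""                                  # the action it finally selects

    def render(self) -> str:
        """Return the trace as labeled lines, one statement per line."""
        lines = [f"Task: {self.task}"]
        for label, items in (
            ("Known", self.known),
            ("Unknown", self.unknown),
            ("Needed", self.needed),
        ):
            for item in items:
                lines.append(f"{label}: {item}")
        lines.append(f"Action: {self.action}")
        return "\n".join(lines)


# The UNESCO example from the text, restated in this structure.
unesco_trace = ReasoningTrace(
    task="Which country has the most UNESCO World Heritage Sites?",
    known=["France and China are often mentioned as top countries."],
    unknown=["The current count per country, which changes as sites are added."],
    needed=["The current UNESCO list, counted by country."],
    action="Run a web search for current UNESCO World Heritage Sites count by country.",
)
print(unesco_trace.render())
```

Keeping the parts labeled makes it easy to see at a glance whether a trace skipped one of them, for example an action with nothing listed under 'Unknown'.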
What Makes Reasoning Trustworthy?
Not every chain-of-thought trace is trustworthy. A model can generate a reasoning chain that sounds rigorous but contains a subtle error, a false assumption, or a logical leap that does not hold. Three qualities distinguish trustworthy reasoning from plausible-sounding but flawed reasoning.
- Groundedness: does each step follow from the previous one and from actual evidence, rather than from assumption? A step that says 'I assume X without checking' is not grounded.
- Calibrated uncertainty: does the agent acknowledge when it does not know something, rather than asserting everything with equal confidence? Good reasoning flags its own uncertain parts.
- Verifiability: can the reasoning steps be checked against external facts or logical rules? Steps that are vague or circular are harder to verify and therefore less trustworthy.
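When reviewing a trace against these three qualities, it can help to record a judgment for each step and then list the steps that fail. The sketch below is a simple bookkeeping aid under that assumption: the judgments are entered by a human reviewer, the code does not assess the reasoning itself, and the names `StepReview` and `flag_steps` are made up for this example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class StepReview:
    """A reviewer's judgment of one reasoning step against the three qualities."""

    step: str
    grounded: bool
    calibrated: bool
    verifiable: bool

    def failures(self) -> list[str]:
        """Names of the qualities this step fails, in a fixed order."""
        checks = {
            "groundedness": self.grounded,
            "calibrated uncertainty": self.calibrated,
            "verifiability": self.verifiable,
        }
        return [name for name, ok in checks.items() if not ok]


def flag_steps(reviews: list[StepReview]) -> list[tuple[int, list[str]]]:
    """Return (step number, failed qualities) for every step that fails a check."""
    flagged = []
    for number, review in enumerate(reviews, start=1):
        failed = review.failures()
        if failed:
            flagged.append((number, failed))
    return flagged


# Example judgments; the second step is the ungrounded pattern quoted in the text.
reviews = [
    StepReview("The user asked for a current count, so I need a current source.",
               grounded=True, calibrated=True, verifiable=True),
    StepReview("I assume X without checking.",
               grounded=False, calibrated=True, verifiable=False),
]
print(flag_steps(reviews))  # [(2, ['groundedness', 'verifiability'])]
```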
Flashcards
Why does reasoning step by step improve an agent's accuracy compared to answering immediately?
An agent reasons: 'France has the most UNESCO World Heritage Sites, so I will write that in my answer.' What quality of good reasoning does this violate?
Write a Reasoning Trace
- Step 1: An agent receives this task: 'I have 47 dollars. I want to buy the most books possible from an online store where paperbacks cost $8.99 each and shipping is $3.50 for the whole order. How many books can I buy?'
- Step 2: Write a full chain-of-thought reasoning trace for the agent — at least five explicit steps. Show every intermediate calculation or logical conclusion. Do NOT jump to the answer in the first step.
- Step 3: At the end of your trace, write the action the agent should take (in this case, produce the final answer).
- Step 4: Swap your trace with a partner (or review it yourself) and check it for grounded reasoning, calibrated uncertainty, and verifiability. Mark any step that fails one of these qualities.
- Step 5: What would the trace look like if the agent had not reasoned step by step — just jumped to an answer? Write that version too and compare.
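Once your trace is written, you may want to check its arithmetic independently. The sketch below is one way to do that, assuming no sales tax and a single flat shipping charge as the task describes. It works in whole cents to avoid floating-point rounding, and the function names are made up for this example. Run it only after finishing your own trace, then compare.

```python
def total_cost_cents(count: int, price_cents: int, shipping_cents: int) -> int:
    """Cost of an order: item price times count, plus one flat shipping fee."""
    return count * price_cents + shipping_cents


def max_items(budget_cents: int, price_cents: int, shipping_cents: int) -> int:
    """Largest count whose total cost fits the budget (0 if shipping alone does not fit)."""
    remaining = budget_cents - shipping_cents  # money left for the items themselves
    if remaining < 0:
        return 0
    return remaining // price_cents  # whole items only, so round down


# Values from the task: $47.00 budget, $8.99 per paperback, $3.50 shipping.
books = max_items(4700, 899, 350)
spent = total_cost_cents(books, 899, 350)
print(f"Books: {books}, total spent: ${spent / 100:.2f}")
```

A good trace should also confirm the result from the other side: the cost of one more book than the answer should exceed the budget.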