Why Agents Need Memory
Every time you send a message to a language model, the model starts from scratch. It has no recollection of the conversation you had yesterday, no awareness of what it computed ten seconds ago in a different API call, and no running sense of where a task stands. This is not a bug — it is a deliberate architectural property of transformer-based models. The same stateless design that makes models cheap to scale and easy to replicate also makes them fundamentally amnesiac at the API boundary.
A single-turn interaction — ask a question, receive an answer, done — works fine in this world. But an AI agent is something more ambitious: a system that pursues a goal over many steps, using tools, accumulating information, and revising its plan as results come back. A coding agent that debugs a file must remember what errors it already tried to fix. A research agent summarizing twenty papers must track which papers it has read and what the emerging argument is. A customer-service agent handling a multi-day return must remember what the customer said on Monday before responding on Wednesday. Without memory, none of these agents can function.
A language model processes one context window per API call and retains nothing afterward. All apparent continuity in a chatbot or agent comes from code outside the model — either replaying the conversation history in each prompt, or storing and retrieving information from an external system.
What Goes Wrong Without Memory
Consider a coding agent whose job is to refactor a large Python codebase. It identifies a problem in file A, fixes it, then calls the model again to handle file B. If the second call does not include what was done in file A, the model will not know to maintain consistency with those changes. It may even reintroduce the same bug. Then in file C it might contradict both prior decisions. The agent is taking actions in the world but cannot reason about its own history — it is flying blind. This failure mode is sometimes called context amnesia. It is one of the most common reasons early agent prototypes produce incoherent results despite each individual model response looking reasonable in isolation. The model is never confused — but the agent as a whole is, because the agent's state is scattered and untracked.
When an agent calls a model multiple times without carefully managing what information is passed in each call, the model appears to 'forget' earlier work. The model itself is blameless — it never had that information. The failure belongs to the agent's memory architecture.
Three Kinds of Information Agents Need to Track
Thinking clearly about agent memory begins with classifying what actually needs to be remembered. Agents typically need to track three distinct categories of information. First, conversational or task history: what the user asked, what steps the agent has already taken, and what results those steps produced. Without this, the agent cannot build on its own prior work. Second, world state: facts about the environment that the agent has discovered — which files exist, what the database schema looks like, what a web page currently says. This information was retrieved at some point and must persist so the agent does not re-fetch it or act on stale data. Third, intermediate computations: partial results from complex multi-step reasoning, for example the parsed structure of a document partway through analysis, or a running score while evaluating candidates. Losing this mid-task forces the agent to restart.
Match each memory failure to the category of information the agent lost track of.
Terms
Definitions
Drag terms onto their definitions, or click a term then click a definition to match.
Memory as Infrastructure
Professional agent frameworks treat memory as explicit infrastructure, not an afterthought. LangChain, LlamaIndex, AutoGen, and similar systems all include memory modules as first-class components alongside tool calling and planning. When engineers design an agent, they ask: what information must persist, for how long, in what format, and how will the model retrieve it at the right moment? These are engineering decisions with real tradeoffs. Storing everything in the context window is simple but hits size limits quickly. Storing everything in a database gives unlimited capacity but requires retrieval logic. Summarizing history saves space but may lose detail. The rest of this module explores each of these strategies in depth. But the starting point is this lesson's key insight: the model cannot remember for the agent. The agent — the code wrapping the model — must build and maintain memory deliberately.
A travel-booking agent is handling a multi-step task: find flights, check hotel availability, compare prices, and present options. After calling the model to find flights, it calls the model again to check hotels. The second call returns results that contradict the first. What is the most likely cause?
Which of the following agent designs would NOT suffer from context amnesia?
Flashcards — click each card to reveal the answer
Map a Failing Agent
- Scenario: An AI research agent is tasked with writing a literature review on climate change adaptation. It makes five model calls: (1) search for papers, (2) summarize paper A, (3) summarize paper B, (4) identify common themes, (5) draft the introduction.
- Step 1: Draw a diagram showing these five calls as boxes. Draw arrows showing what information each call passes to the next.
- Step 2: Now assume the agent passes NO information between calls — each call starts fresh. Mark every point in the diagram where a failure occurs and describe what goes wrong in one sentence.
- Step 3: Redesign the information flow. For each call, specify exactly what prior information must be included in its prompt to prevent the failures you identified.
- Step 4: Reflect: how much larger does the prompt grow by the fifth call? What does that suggest about the cost of naive memory (replaying everything)?