Part 7 · The Core of AI Agents

Chapter 34 · Memory: Short-Term, Long-Term, and Episodic


An agent that forgets everything the moment it finishes a step cannot pursue a goal, hold a conversation, or learn from experience. Memory is what gives an agent continuity — the ability to carry context across steps and across sessions. This chapter builds the main kinds of agent memory from the ground up, explains when each is needed, and reveals a satisfying connection: long-term memory for agents turns out to be the retrieval you already learned, applied to the agent's own past. By the end you will know how to give an agent a working memory and a lasting one.

Why Agents Need Memory

Two facts from earlier chapters make memory essential. First, from Chapter 30, the model is stateless — it remembers nothing between calls; everything it knows in a given moment must be in the input you send. Second, from Chapter 12, the context window is finite — there is a hard limit on how much can be in that input at once. Together these mean an agent has no built-in memory and only a small space to hold context. Memory is the set of techniques that work around both limits, letting an agent remember more than fits in the window and more than survives a single call.

Short-Term Memory: The Working Context

Short-term memory is simply what is in the context window right now: the current goal, the recent conversation, and the latest actions and observations from the agent loop. It is the agent's working memory — fast, immediate, and directly available to the model's reasoning. It is also temporary and bounded: when the current task ends or the window fills, it is gone. The history variable from the Chapter 31 agent loop is short-term memory in its rawest form.
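Here is a minimal sketch of short-term memory. It assumes the history is a plain list of role/content messages. `add_turn` and `build_prompt` are hypothetical helpers, not the Chapter 31 code. Because the model keeps nothing between calls, the working context is rebuilt from this list on every call.

```python
# Hypothetical minimal agent state: short-term memory is just this list.
history = []  # lives only as long as the current task

def add_turn(role, content):
    """Append one exchange (user message, Thought, tool result, ...) to the history."""
    history.append({"role": role, "content": content})

def build_prompt(goal):
    """The model is stateless, so every call must resend the goal plus the history."""
    lines = [f"Goal: {goal}"]
    lines += [f"{m['role']}: {m['content']}" for m in history]
    return "\n".join(lines)

add_turn("user", "My name is Ada.")
add_turn("assistant", "Nice to meet you, Ada.")
add_turn("user", "What is my name?")
print(build_prompt("Answer the user's questions"))
```

The model can answer the last question only because the earlier turn is still inside the prompt it receives.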

The Problem: Short-Term Memory Fills Up

Recall the desk analogy from Chapter 12. As an agent works through a long task or conversation, its short-term memory grows — every Thought, every tool result, every exchange — until it overflows the context window. When that happens, the oldest material falls off the desk and the agent begins to forget. This is the central pressure of agent building, and it forces two responses: managing short-term memory carefully, and adding a longer-term store.
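The overflow can be sketched as a sliding window. This assumes a crude word count in place of a real tokenizer, and `fit_to_window` is a hypothetical helper. It keeps the newest messages that fit the budget and lets the oldest fall off.

```python
def count_tokens(text):
    # Crude stand-in for a real tokenizer: one "token" per whitespace-separated word.
    return len(text.split())

def fit_to_window(messages, budget):
    """Keep the most recent messages whose total size fits the budget."""
    kept, used = [], 0
    for msg in reversed(messages):        # walk from newest to oldest
        cost = count_tokens(msg)
        if used + cost > budget:
            break                         # everything older falls off the desk
        kept.append(msg)
        used += cost
    return list(reversed(kept))           # restore chronological order

conversation = [
    "user: my order number is 4417",
    "assistant: thanks, looking it up",
    "tool: order 4417 shipped on Monday",
    "user: when will it arrive?",
]
print(fit_to_window(conversation, budget=12))
# ['tool: order 4417 shipped on Monday', 'user: when will it arrive?']
```

With a budget of 12, the user's opening message no longer fits and is silently dropped. This is the forgetting that the next two sections work around.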

Long-Term Memory: Persisting Beyond the Window

Long-term memory stores information outside the context window and brings it back only when relevant. It survives across sessions — close the conversation, return tomorrow, and the agent still knows what it learned. The mechanism should sound familiar: store pieces of information as embeddings (Chapter 9), and when something relevant comes up, retrieve the closest ones and place them into the context. That is exactly RAG (Chapter 36) applied to the agent's own memory — the same store-and-retrieve machinery, pointed at what the agent has learned rather than at a document library.
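To survive across sessions, the store has to live outside the running process. The sketch below assumes a local JSON file. The filename and the three-number vector are made up for illustration. A production agent would more likely keep this in a vector database (Chapter 37).

```python
import json
import numpy as np

def save_memory(memory, path):
    # numpy arrays are not JSON-serializable, so store vectors as plain lists
    data = [{"text": m["text"], "vector": m["vector"].tolist()} for m in memory]
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f)

def load_memory(path):
    try:
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
    except FileNotFoundError:
        return []                         # first session: nothing remembered yet
    return [{"text": d["text"], "vector": np.array(d["vector"])} for d in data]

# Session 1: store a memory (placeholder vector) and shut down.
memory = [{"text": "The user prefers concise answers.", "vector": np.array([0.1, 0.9, 0.0])}]
save_memory(memory, "agent_memory.json")

# Session 2 (e.g. tomorrow): a fresh process reloads what was learned.
restored = load_memory("agent_memory.json")
print(restored[0]["text"])
```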

Episodic Memory: Remembering Experiences

A particular and valuable flavor of long-term memory is episodic memory — the memory of specific past events and interactions. "Last time this user asked about refunds, I gave them the policy and they were satisfied." "When I tried that search query before, it returned nothing useful." Episodic memory lets an agent learn from its own history: repeating what worked, avoiding what failed, and personalizing to a user over time. It is the difference between an agent that starts fresh every time and one that genuinely improves with experience.
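One way to hold episodes is as structured records of what happened and how it turned out. The `Episode` fields and the `lessons` helper are hypothetical. Matching here uses an exact user and topic to keep things short, whereas a real agent would usually retrieve episodes by similarity, as described in this chapter.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Episode:
    user: str
    topic: str          # what the interaction was about
    action: str         # what the agent did
    outcome: str        # "success" or "failure"
    when: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

episodes: list[Episode] = []   # the agent's episodic memory

def record(user, topic, action, outcome):
    episodes.append(Episode(user, topic, action, outcome))

def lessons(user, topic):
    """Return (what worked, what failed) for this user and topic, newest first."""
    relevant = [e for e in episodes if e.user == user and e.topic == topic]
    relevant.sort(key=lambda e: e.when, reverse=True)
    worked = [e.action for e in relevant if e.outcome == "success"]
    failed = [e.action for e in relevant if e.outcome == "failure"]
    return worked, failed

record("user-42", "refunds", "searched the FAQ for 'refund'", "failure")
record("user-42", "refunds", "quoted the refund policy", "success")
print(lessons("user-42", "refunds"))
```

Putting both lists into the context lets the model repeat the approach that worked and skip the one that failed.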

Semantic vs. Episodic Memory

It helps to distinguish two kinds of long-term memory by analogy to human memory. Semantic memory is general facts and knowledge — "Paris is the capital of France," "our return window is 30 days." Episodic memory is specific personal experiences — "this user contacted us last week about a delayed order." An agent often needs both: semantic memory for what is true in general, episodic memory for what has happened in particular. Both are stored and retrieved the same way; they differ in what they hold.
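Because both kinds are stored and retrieved the same way, a single store with a `kind` tag is enough to tell them apart. This is a sketch under stated assumptions: `embed` is a toy hashed bag-of-words stand-in for a real embedding model, and the `kind` filter is an illustrative design choice.

```python
import numpy as np

def embed(text, dim=64):
    # Toy stand-in for a real embedding model: hashed bag of words.
    v = np.zeros(dim)
    for w in text.lower().split():
        v[hash(w.strip(".,?!")) % dim] += 1.0
    return v

store = []  # one store for both kinds; only the "kind" tag differs

def remember(text, kind):
    store.append({"text": text, "kind": kind, "vector": embed(text)})

def recall(query, kind=None, k=2):
    q = embed(query)
    candidates = [m for m in store if kind is None or m["kind"] == kind]

    def score(m):  # cosine similarity between the query and a stored memory
        denom = np.linalg.norm(q) * np.linalg.norm(m["vector"])
        return float(q @ m["vector"] / denom) if denom else 0.0

    return [m["text"] for m in sorted(candidates, key=score, reverse=True)[:k]]

remember("Our return window is 30 days.", kind="semantic")
remember("This user contacted us last week about a delayed order.", kind="episodic")
print(recall("delayed order", kind="episodic"))
```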

How Memory Works in Practice

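The practical pattern for long-term memory has three moves: store important information (usually as embeddings), retrieve the relevant pieces when needed, and forget what is stale or no longer useful. Here is a minimal memory store built on the same tools as Chapter 36's RAG. So that it runs on its own, `embed` below is a toy stand-in (a hashed bag of words) for a real embedding model, and `cosine_similarity` is defined inline.

```python
import numpy as np

def embed(text, dim=256):
    # Toy stand-in for a real embedding model (Chapter 9): hash each word
    # into a fixed-size count vector. Swap in a real model for meaningful similarity.
    vec = np.zeros(dim)
    for word in text.lower().split():
        vec[hash(word.strip(".,?!")) % dim] += 1.0
    return vec

def cosine_similarity(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

memory = []     # each item: {"text": ..., "vector": ...}

def remember(text):                                  # STORE
    memory.append({"text": text, "vector": embed(text)})

def recall(query, k=3):                              # RETRIEVE
    q = embed(query)
    scored = [(cosine_similarity(q, m["vector"]), m["text"]) for m in memory]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # highest similarity first
    return [text for _, text in scored[:k]]          # most relevant memories

remember("The user prefers concise answers.")
remember("The user is working on a Python project about agents.")
print(recall("How should I respond to this user?"))  # surfaces the relevant memories
```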

When the agent is about to respond, it calls recall to pull the relevant memories into its short-term context, so the model can reason with them. Forgetting can be as simple as dropping the oldest or least-used items so the store does not grow without bound.
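The "least-used" variant of forgetting can be sketched with a recency counter. This is an illustration under stated assumptions: `MAX_ITEMS` and `touch` are hypothetical names, vectors are omitted for brevity, and least-recently-used is only one possible pruning policy.

```python
import itertools

MAX_ITEMS = 3                   # hypothetical capacity limit
clock = itertools.count()       # monotonically increasing "time" for recency
memory = []                     # each item: {"text": ..., "last_used": ...}

def remember(text):
    memory.append({"text": text, "last_used": next(clock)})
    forget()

def touch(text):
    # Call this whenever recall returns an item, so useful memories stay fresh.
    for m in memory:
        if m["text"] == text:
            m["last_used"] = next(clock)

def forget():
    # Drop least-recently-used items until the store is back under its cap.
    memory.sort(key=lambda m: m["last_used"])
    while len(memory) > MAX_ITEMS:
        memory.pop(0)

remember("prefers concise answers")
remember("uses Python")
remember("asked about refunds once")
touch("prefers concise answers")      # recently recalled, so it is protected
remember("is building an agent")      # over capacity: "uses Python" is dropped
print([m["text"] for m in memory])
# ['asked about refunds once', 'prefers concise answers', 'is building an agent']
```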

Managing Short-Term Memory

Alongside a long-term store, you must actively manage the short-term context so it does not overflow. The strategies are the ones from Chapter 12: summarize older parts of the conversation into a compact note, trim material that is no longer relevant, and keep a running summary that preserves the gist while freeing space. A common pattern is to replace a long early conversation with a short summary the moment the context starts getting full — the desk stays usable while the essence is kept.
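A minimal sketch of the running-summary pattern follows. It assumes history length is measured in messages rather than tokens, and `summarize` is a hypothetical stand-in for what would normally be a call asking the model to write the summary.

```python
def summarize(messages):
    # Hypothetical stand-in: a real agent would ask the model to write this summary.
    return "Summary of earlier conversation: " + " | ".join(m[:30] for m in messages)

def compact(history, max_messages=6, keep_recent=3):
    """When the history grows too long, fold all but the most recent messages
    into one summary message; recent turns are kept verbatim."""
    if len(history) <= max_messages:
        return history
    older, recent = history[:-keep_recent], history[-keep_recent:]
    return [summarize(older)] + recent

history = [f"turn {i}" for i in range(8)]
compacted = compact(history)
print(compacted)
```

Keeping the most recent turns verbatim is a deliberate choice. Those turns are the ones the model is most likely to need word for word, while older material survives only as gist.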

Memory Is Mostly Retrieval

Here is the unifying insight of this chapter. An agent's long-term memory is, at heart, retrieval over its own stored information: the very RAG machinery of Chapter 36, with the storage handled by a vector database (Chapter 37). Semantic memory, episodic memory, and recall of a user's preferences all come down to storing embeddings and retrieving by similarity. This is why retrieval and vector databases sit at the core of Part VII: they are the foundation not just of answering from documents, but of an agent remembering anything at all.

Designing an Agent's Memory

Designing memory comes down to four practical decisions. What to remember: not everything, only what will plausibly be useful later. When to store it: at the end of a task, when a user states a preference, or when something notable happens. How to retrieve it: by similarity to the current situation, usually keeping only matches above a relevance threshold. And when to forget: pruning the stale and the redundant. Thoughtful answers to these four questions are what separate an agent that remembers usefully from one that drowns in its own history.
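The first three decisions can be written as small, testable rules. Both functions are hypothetical sketches. The event types, the `notable` flag, and the `min_score` threshold are assumptions you would tune for your own agent, and the two-number vectors are made up for illustration.

```python
import numpy as np

def should_store(event):
    """WHAT and WHEN: keep stated preferences, task results, and anything
    flagged notable; skip routine chatter."""
    return event["type"] in {"preference", "task_result"} or event.get("notable", False)

def recall_filtered(query_vec, memory, k=3, min_score=0.3):
    """HOW: rank by cosine similarity, but drop weak matches so irrelevant
    memories never reach the context."""
    results = []
    for m in memory:
        denom = np.linalg.norm(query_vec) * np.linalg.norm(m["vector"])
        score = float(query_vec @ m["vector"] / denom) if denom else 0.0
        if score >= min_score:
            results.append((score, m["text"]))
    results.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in results[:k]]

memory = [
    {"text": "prefers concise answers", "vector": np.array([1.0, 0.0])},
    {"text": "owns a cat", "vector": np.array([0.0, 1.0])},
]
print(should_store({"type": "preference"}))           # True
print(should_store({"type": "small_talk"}))           # False
print(recall_filtered(np.array([1.0, 0.1]), memory))  # ['prefers concise answers']
```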

Summary

Because the model is stateless and the context window is finite, agents need memory. Short-term memory is the working context in the window right now — fast but bounded and temporary, and it overflows on long tasks. Long-term memory stores information outside the window and retrieves it when relevant, surviving across sessions, and it is essentially RAG applied to the agent's own knowledge. Episodic memory, a kind of long-term memory, records specific experiences so the agent learns from its history, complementing semantic memory of general facts. In practice you store (as embeddings), retrieve (by similarity), and forget (prune the stale), while managing short-term memory by summarizing and trimming. Long-term memory is mostly retrieval — which is why vector databases sit at the core of agent building, and which leads directly to the next chapters.

Memory lets an agent remember; planning lets it tackle goals too big for a single step. Chapter 35 covers planning and task decomposition — how an agent breaks a large goal into achievable pieces.

Practice

Exercises

1. Explain why an agent needs memory at all, referring to both the stateless model (Chapter 30) and the finite context window (Chapter 12).
2. Add short-term conversation memory to a simple agent so it can refer back to earlier turns. Confirm it can answer a follow-up question that depends on something said earlier.
3. Implement the `remember` and `recall` long-term memory functions from this chapter. Store several facts, then retrieve the ones relevant to a query and confirm the most relevant come back.
4. Describe a task that specifically requires episodic memory, where the agent must remember a particular past event or interaction, and explain why semantic memory alone would not suffice.
5. Explain the claim that 'long-term memory is mostly retrieval.' How does it connect agent memory to RAG (Chapter 36) and vector databases (Chapter 37)?
6. For an agent of your choice, answer the four design questions: what it should remember, when it should store, how it should retrieve, and when it should forget. Note any privacy concerns.
