Part 7 · The Core of AI Agents

Chapter 32 · The ReAct Pattern: Reasoning + Acting


In the last chapter we built the agent loop. Now we sharpen the reasoning-and-acting cycle into a specific, powerful pattern called **ReAct** — short for Reasoning + Acting — which was the breakthrough that first made tool-using agents genuinely reliable. The idea is simple to state and surprisingly deep: instead of thinking everything through up front or acting blindly, the agent alternates between thinking and acting, one step at a time. We will implement it from scratch and trace it in detail, so you understand not just how it works but why it works so well.

The Problem ReAct Solves

Early attempts at tool-using agents fell into two traps. Some had the model reason everything up front — make a complete plan — and then execute it blindly. This is brittle: the moment reality differs from the plan, the agent is lost, because it cannot adapt to what it actually finds. Others had the model act without reasoning, calling tools reflexively with no thought, which is just as bad — confident, fast, and frequently wrong. Neither worked reliably. ReAct fixes this by interleaving reasoning and acting, so each informs the other.

What ReAct Means

ReAct has the agent cycle through three things, over and over: a Thought (reasoning about what to do next), an Action (calling a tool), and an Observation (the result of that tool). Thought leads to Action, Action produces Observation, Observation prompts the next Thought, and so on, until the agent reasons its way to a final answer. The model is essentially talking itself through the problem, taking one concrete action at a time and reflecting on each result before deciding the next move.

Figure 32.1 — The ReAct cycle: Thought (reason) leads to Action (use a tool), which yields an Observation (the result), which prompts the next Thought, repeating until a final answer.
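One way to make the cycle concrete is to treat each turn as a small record. The sketch below is illustrative only: `Step` and `render` are hypothetical names chosen for this example, not part of any library or of the chapter's own code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Step:
    """One turn of the cycle (hypothetical structure, for illustration)."""
    thought: str
    action: str                        # e.g. 'search("capital of France")'
    observation: Optional[str] = None  # filled in after the tool has run


def render(steps):
    """Format a list of steps as a Thought / Action / Observation transcript."""
    lines = []
    for step in steps:
        lines.append(f"Thought: {step.thought}")
        lines.append(f"Action: {step.action}")
        if step.observation is not None:
            lines.append(f"Observation: {step.observation}")
    return "\n".join(lines)


trace = [
    Step("I need the capital of France.",
         'search("capital of France")',
         "Paris is the capital of France."),
]
print(render(trace))
```

Keeping the Observation optional reflects the order of events: the Thought and Action exist as soon as the model responds, while the Observation only arrives once your code has run the tool.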

A Worked Trace

Nothing makes ReAct clearer than watching it run. Suppose the goal is: "How many years passed between the founding of the city where the Eiffel Tower stands and the tower's completion?" Here is how a ReAct agent might think and act its way through it.

text

Goal: Years between the founding of the Eiffel Tower's city and the tower's completion.

Thought: The Eiffel Tower is in Paris. I need Paris's founding year.
Action: search("year Paris was founded")
Observation: Paris was founded around 250 BC.

Thought: Now I need the year the Eiffel Tower was completed.
Action: search("Eiffel Tower completion year")
Observation: The Eiffel Tower was completed in 1889.

Thought: From ~250 BC to 1889 AD is about 250 + 1889 - 1 = 2138 years (there is no year 0).
Action: finish("About 2,138 years passed.")

Follow the rhythm: each Thought decides what is needed, each Action gathers it, each Observation informs the next Thought. The agent did not need to know everything in advance — it discovered the pieces one at a time and reasoned over them as they arrived. That adaptability is the whole point.

Why Interleaving Beats Planning-All-Upfront

The power of ReAct is that the agent reasons after each observation, so it adapts to what it actually finds rather than what it guessed it would find. If a search comes back empty, it can try a different query. If a result is surprising, it can change course. If it hits a dead end, it can back up. Planning everything up front assumes you already know what you will discover — but the entire reason to act is that you do not. By weaving thinking and acting together, ReAct lets the agent stay grounded in reality at every step instead of marching off a cliff because the plan said so.
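The difference can be sketched with a toy search tool that only answers one phrasing of a question. Everything here is a stand-in invented for illustration: `fake_search`, `run_fixed_plan`, `run_interleaved`, and `rephrase` are hypothetical, and `rephrase` plays the role of the model's next Thought after an empty result.

```python
def fake_search(query):
    """Hypothetical search tool: only one phrasing of the question gets a hit."""
    index = {"Eiffel Tower completion year": "The Eiffel Tower was completed in 1889."}
    return index.get(query, "")   # an empty string means "no results"


def run_fixed_plan(plan):
    """Plan-all-upfront: run every planned query, never looking at the results."""
    return [fake_search(query) for query in plan]


def run_interleaved(first_query, rephrase, max_steps=3):
    """Interleaved: inspect each observation before choosing the next query."""
    query = first_query
    for _ in range(max_steps):
        observation = fake_search(query)
        if observation:             # got something useful, so stop here
            return observation
        query = rephrase(query)     # adapt: try a different phrasing
    return None


def rephrase(query):
    """Stand-in for the model's next Thought after an empty result."""
    return "Eiffel Tower completion year"


print(run_fixed_plan(["when did they finish the Eiffel Tower"]))   # ['']
print(run_interleaved("when did they finish the Eiffel Tower", rephrase))
```

The fixed plan finishes with an empty result and no way to recover, while the interleaved version notices the empty Observation and changes its query on the next step.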

ReAct in Code

Implementing ReAct means taking the Chapter 31 loop and giving it a prompt that asks the model to produce a Thought and then either an Action or a final answer. You parse what it produces, run any requested tool, append the Observation, and repeat.

python

SYSTEM = (
    "Solve the goal step by step. Each step, write a Thought, then either "
    "an Action of the form  tool(args)  or a Final Answer. "
    "After an Action you will be shown the Observation."
)

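def react(goal, tools, max_steps=10):
    # model_respond and run_tool are not defined in this snippet; they are
    # assumed to exist as in the Chapter 31 loop. Assumption: run_tool looks
    # up the named tool in `tools` and calls it with the given args.
    history = [{"role": "system", "content": SYSTEM},
               {"role": "user", "content": goal}]

    for _ in range(max_steps):
        step = model_respond(history)          # REASON: the model writes a Thought + Action
        if step.has_final_answer:
            return step.final_answer
        observation = run_tool(tools, step.action, step.args)   # ACT (your code runs it)
        # OBSERVE: record the model's step, then feed the result back to it
        history.append({"role": "assistant", "content": step.text})
        history.append({"role": "user", "content": f"Observation: {observation}"})

    return "Stopped: step limit reached."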

This is deliberately close to the bare loop of Chapter 31 — because ReAct is that loop, with a prompt that elicits explicit reasoning before each action. The structure you already know is doing the work.
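To see the loop run end to end without a real model, the sketch below fills in the missing pieces with stand-ins. All of it is assumption rather than the chapter's actual helpers: `parse_step` is a hypothetical parser that only handles Actions with a single double-quoted argument, `scripted_model` replaces the LLM call with canned replies, and this `react` takes the model as a parameter so the stub can be swapped in.

```python
import re

# Matches lines such as: Action: search("Eiffel Tower completion year")
ACTION_RE = re.compile(r'^Action:\s*(\w+)\("(.*)"\)\s*$', re.MULTILINE)
FINAL_RE = re.compile(r'^Final Answer:\s*(.*)$', re.MULTILINE)


def parse_step(text):
    """Return ("final", answer) or ("action", tool_name, argument)."""
    final = FINAL_RE.search(text)
    if final:
        return ("final", final.group(1).strip())
    action = ACTION_RE.search(text)
    if action:
        return ("action", action.group(1), action.group(2))
    raise ValueError(f"Could not parse model output: {text!r}")


def react(goal, tools, model, max_steps=10):
    """A ReAct-style loop with the model passed in, so a stub can replace it."""
    history = [{"role": "user", "content": goal}]
    for _ in range(max_steps):
        text = model(history)                      # Thought + Action (or Final Answer)
        parsed = parse_step(text)
        if parsed[0] == "final":
            return parsed[1]
        _, name, arg = parsed
        observation = tools[name](arg)             # run the requested tool
        history.append({"role": "assistant", "content": text})
        history.append({"role": "user", "content": f"Observation: {observation}"})
    return "Stopped: step limit reached."


def scripted_model(history):
    """Stub for an LLM call: picks a canned reply by counting Observations so far."""
    seen = sum(1 for m in history if m["content"].startswith("Observation:"))
    script = [
        'Thought: I need the completion year.\nAction: search("Eiffel Tower completion year")',
        "Thought: I have what I need.\nFinal Answer: 1889",
    ]
    return script[seen]


tools = {"search": lambda query: "The Eiffel Tower was completed in 1889."}
print(react("When was the Eiffel Tower completed?", tools, scripted_model))   # 1889
```

Raising an error on unparseable output is one possible design choice; a more forgiving loop might instead feed the parse failure back to the model as an Observation and let it try again.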

The Connection to Chain-of-Thought

If the "Thought" steps feel familiar, they should. ReAct is essentially chain-of-thought (Chapter 28) combined with tools (Chapter 29). Chain-of-thought taught the model to reason step by step before answering; tool calling let the model act in the world. ReAct marries them: the Thought is chain-of-thought reasoning, the Action is a tool call, and interleaving the two produces an agent that both thinks carefully and checks its thinking against reality. Two ideas you already know, joined into something more powerful than either alone.

Why ReAct Was a Breakthrough

Before ReAct, tool-using agents were unreliable curiosities. ReAct made them genuinely useful, and the reason is a kind of mutual grounding. The reasoning grounds the actions — the agent acts deliberately, for a stated reason, rather than reflexively. And the actions ground the reasoning — instead of imagining facts (and hallucinating), the agent checks reality by using tools and reasons over what it actually observes. Thinking keeps acting purposeful; acting keeps thinking honest. That two-way grounding is why ReAct works.

Limitations and Failure Modes

ReAct is powerful but not magic, and recognizing its failure modes will save you frustration.

- Getting stuck in loops: an agent can repeat the same Thought and Action over and over, especially when a tool keeps returning unhelpful results. Step limits (Chapter 31) and loop detection guard against this.
- Reasoning poorly: if the model reasons wrongly, it acts wrongly; bad Thoughts lead to bad Actions.
- Running too long: complex goals can take many steps, raising cost and latency, so bounds and budgets matter.
- Cascading errors: a wrong observation early can mislead every later step, so robust tools and validation (Chapter 33) are important.
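Loop detection can be as simple as counting how often the agent requests the same tool call. The following is a minimal sketch under that assumption; `LoopGuard` is a hypothetical helper written for this example, not something defined earlier in the book.

```python
from collections import Counter


class LoopGuard:
    """Hypothetical guard that flags an agent repeating the same action too often."""

    def __init__(self, max_repeats=2):
        self.max_repeats = max_repeats
        self.counts = Counter()

    def check(self, action, args):
        """Record one call; return True if it should still be allowed to run."""
        key = (action, tuple(args))
        self.counts[key] += 1
        return self.counts[key] <= self.max_repeats


guard = LoopGuard(max_repeats=2)
calls = [("search", ["Paris founding year"])] * 3
print([guard.check(action, args) for action, args in calls])   # [True, True, False]
```

When `check` returns False, the loop could stop outright, or it could skip the tool and append an Observation telling the model that this call has already been tried. Which response works better likely depends on the task.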

Variations and Beyond

ReAct is the foundation, and many modern agent patterns build on or refine it by adding explicit planning (Chapter 35), reflection on past mistakes, or multiple agents working together (Chapter 41). The frameworks of Part 8 largely implement ReAct-style loops under the hood, so understanding ReAct deeply means understanding what those frameworks are really doing. Master this pattern and the rest of agent building becomes far less mysterious.

Summary

ReAct (Reasoning + Acting) solves the brittleness of planning everything up front and the blindness of acting without thought by interleaving the two. The agent cycles through Thought, Action, and Observation, reasoning after each result so it adapts to what it actually finds. In code it is simply the Chapter 31 loop with a prompt that elicits a Thought before each Action. ReAct is chain-of-thought combined with tool calling, and it works through mutual grounding: reasoning keeps actions purposeful while actions keep reasoning honest. Its failure modes (loops, poor reasoning, running too long, and cascading errors) are guarded against with step limits, budgets, and validation, and it is the foundation that many modern agent patterns and frameworks build upon.

ReAct relies entirely on tools to act. Chapter 33 dives into the craft of those tools — how to design, build, and safely execute the capabilities that give an agent its hands.

Practice

Exercises

1. Trace, by hand, the Thought–Action–Observation steps a ReAct agent would take to answer a multi-part question of your own (something requiring two or three lookups). Write out each step as in the worked example.
2. Implement a simple ReAct loop that can use a single search tool (real or simulated). Confirm it interleaves reasoning and actions and stops at a final answer.
3. Explain, in your own words, why interleaving reasoning with actions outperforms doing all the reasoning before any action. Give a concrete example where up-front planning would fail.
4. Describe how ReAct combines chain-of-thought (Chapter 28) and tool calling (Chapter 29). What does each contribute to the pattern?
5. Explain the idea of 'mutual grounding': how reasoning grounds actions and actions ground reasoning. Why does this make ReAct more reliable?
6. Describe one way a ReAct agent can get stuck in a loop, and two guardrails you would add to prevent it.
