Part 3 · 5 chapters

Inside Large Language Models

Now we open the engine that powers every agent. You will learn how transformers work, how text becomes tokens, what attention does, and how these models are pretrained — explained simply, with no hand-waving.

Chapter 10Inside Large Language Models

The Transformer Architecture, Explained Simply

The transformer is the single most important invention behind modern AI. Every model you have heard of is built on it. Yet its core idea is surprisingly graspable, and you do not need any equations to truly understand it. In this chapter we take the transformer apart piece by piece, using pictures and analogies, until the word *attention* stops being jargon and starts being obvious. By the end you will understand not just how a transformer works, but *why* this particular design unlocked everything that followed.

⏱ 10 min✏️ 6 exercises🖼 Figures

→

Chapter 11Inside Large Language Models

Tokenization: How Text Becomes Tokens

In Chapter 10 we said a model first splits text into *tokens*, then waved our hands and moved on. It is time to lift the lid on that very first step, because it is far more consequential than it appears. Tokenization quietly determines how much you pay, how much text a model can handle at once, and why models sometimes fail at tasks that look trivially easy to a human. This is a longer chapter than its humble subject suggests, and deliberately so: understanding tokenization will save you money, prevent confusion, and demystify a whole category of strange model behavior. We take it slowly, with plenty of examples, and assume no prior knowledge.

⏱ 10 min✏️ 6 exercises🖼 Figures

→

Chapter 12Inside Large Language Models

Attention and the Context Window

We met attention in Chapter 10 as the idea that every word looks at every other word. Here we deepen that picture and then follow it to one of its most important practical consequences: the **context window**, the fixed amount of text a model can hold in mind at once. The context window shapes how much a model can read, how much it costs, what it forgets, and why techniques like retrieval exist at all. For anyone building agents — which accumulate long histories of steps and observations — managing this limit well is not a side detail; it is a central skill. We take our time, with analogies and concrete strategies throughout.

⏱ 9 min✏️ 6 exercises🖼 Figures

→

Chapter 13Inside Large Language Models

How LLMs Are Pretrained

We have built up, piece by piece, all the machinery of learning: neural networks, loss, gradients, the training loop. Now we put it to work at a scale that is genuinely hard to imagine, and watch how it produces a large language model. The astonishing part of this chapter is how *simple* the core idea is. An LLM learns from one deceptively humble task — predicting the next token — repeated over a quantity of text no human could read in a thousand lifetimes. Understanding pretraining clarifies the whole pipeline you set out to learn: where a model's knowledge comes from, why it sometimes invents things, and what still has to happen before a raw model becomes the helpful assistant you talk to.

⏱ 9 min✏️ 6 exercises🖼 Figures

→

Chapter 14Inside Large Language Models

Open vs. Closed Models and the Modern Landscape

You now understand how language models are built. This final chapter of Part III turns practical: with so many models available, how do you choose one for your own project? We will map the landscape — the big split between models you access over the internet and models you run yourself — and lay out a clear framework for deciding. One warning up front, and it matters: the specific models, their names, and their prices change every few months, so this chapter deliberately avoids naming today's leaders. Instead we teach the *durable distinctions and questions* that will still guide you long after this year's leaderboard is forgotten.

⏱ 8 min✏️ 6 exercises

→

← Part 2 Part 4 →