Part 2 · Machine Learning Essentials

Chapter 9Embeddings: Turning Meaning into Numbers

⏱ 7 min read·✏️ 5 exercises·🖼 1 figure·Machine Learning Essentials

We have reached one of the most quietly important ideas in all of modern AI. Computers work only with numbers, yet we want them to work with the meaning of words. **Embeddings** are the bridge: they turn words, sentences, and whole documents into lists of numbers arranged so that *meaning becomes geometry*. This single idea powers search, memory, and the retrieval that grounds agents — including the RAG system you already built in Chapter 36. By the end of this chapter, the magic behind that system will be fully demystified.

The Core Problem: Computers Don't Understand Words

A computer cannot do anything directly with the word "cat." It manipulates numbers, and only numbers. So if we want a machine to compare meanings, search by topic, or remember what was said, we first need a way to represent words as numbers. The challenge is not just turning words into any numbers, but into numbers that capture meaning — so that related words end up with related numbers.

A Naive First Try (and Why It Fails)

The obvious idea is to assign each word an ID: cat = 1, dog = 2, car = 3, and so on. It is simple, but it fails badly. These numbers carry no meaning. The fact that cat (1) and dog (2) sit next to each other is a meaningless accident of alphabetical order, while cat (1) and car (3) look further apart even though, as words, none of this reflects how related they actually are. The numbers tell the machine nothing about meaning, which is the only thing we care about.

We need representations where the distance between the numbers reflects the distance in meaning. That is precisely what embeddings deliver.

The Big Idea: Meaning as Position

An embedding represents each word as a vector — a list of numbers, exactly the kind you met in Chapter 5 — positioned in space so that words with similar meanings sit close together. "Cat" and "kitten" land near each other; "cat" and "car" land far apart. Meaning becomes location, and similarity becomes distance.

How could a machine ever learn such an arrangement? Through a beautifully simple principle: a word is known by the company it keeps. Words that appear in similar contexts tend to have similar meanings. "Cat" and "kitten" show up around words like purr, fur, and pet, so a model that learns from oceans of text places them close together. The embedding is learned, not hand-designed — and it learns meaning purely from how words are used.

Figure 9.1 — Words plotted by their embeddings: animals cluster in one region, colors in another, vehicles in a third. Closeness reflects similarity of meaning.

The Famous Example: Word Arithmetic

Embeddings have a property so striking it feels like a trick: you can do arithmetic with meaning. The classic example is that the embedding of "king," minus "man," plus "woman," lands very close to the embedding of "queen." The space has organized itself so that directions encode relationships.

There is a consistent "male to female" direction, and another for "singular to plural," and another for verb tense. Moving along the gender direction from "king" takes you to "queen," just as it takes you from "actor" to "actress." Nobody programmed these directions; they emerged from learning how words are used. This is the clearest possible sign that embeddings capture genuine structure in meaning, not arbitrary numbers.

Where Embeddings Come From

Embeddings are learned by neural networks using exactly the training process from Chapter 8. A model is given the task of predicting words from their surrounding context, and to do that well it is forced to discover useful representations of meaning — which become the embeddings. In fact, the language models at the heart of this book learn rich embeddings as a natural by-product of learning to predict text.

In practice you rarely train your own. You call an embedding model — through an API or a local library — and it hands you the vector for any text you give it. That is the embed() helper you used in the RAG chapter.

From Words to Sentences and Documents

Early embeddings represented single words. Modern embedding models go much further: they embed whole sentences, paragraphs, or documents into a single vector that captures the overall meaning of the passage. This is what makes retrieval possible — you can embed an entire chunk of a document, and an entire question, and compare them directly. The leap from word embeddings to passage embeddings is what turned this idea from a research curiosity into the engine of practical search and RAG.

Embeddings in Code

Let us make it concrete with the same tools from Chapter 36. We embed a few words and measure their similarity, and watch meaning show up as numbers.

python

import numpy as np

def cosine_similarity(a, b):
    a, b = np.array(a), np.array(b)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

words = ["cat", "kitten", "car"]
vectors = [embed(w) for w in words]     # embed() returns each word's vector

print(cosine_similarity(vectors[0], vectors[1]))   # cat vs kitten -> high
print(cosine_similarity(vectors[0], vectors[2]))   # cat vs car    -> low

The first comparison scores high because cat and kitten are close in meaning; the second scores low because cat and car, despite sharing letters, are far apart in meaning. The numbers have learned what the letters never could.

Why Embeddings Power Everything

Once meaning is a position in space, a whole world of capabilities opens up, and you will use all of them when building agents.

Semantic search — find documents by meaning rather than exact keywords, so a search for "car" can surface a page about "automobiles."
Retrieval-Augmented Generation — embed your documents and your question, then retrieve the closest chunks, exactly as in Chapter 36.
Memory — store an agent's past experiences as embeddings and recall the relevant ones by meaning when a similar situation arises.
Clustering and organization — group similar items automatically, the unsupervised learning from Chapter 6 in action.

Summary

Embeddings turn text into vectors arranged so that similar meanings sit close together, solving the core problem that computers understand only numbers. Naive ID numbers carry no meaning; embeddings carry it by placing words by the company they keep, learned from vast text through the training process of Chapter 8. The result is a space where distance means dissimilarity and even directions encode relationships, as the king-queen example shows. Modern embedding models handle whole passages, not just words, which is what makes semantic search, RAG, and agent memory possible — though they also inherit the biases of their training data.

This completes Part II. You now understand how machines learn, what neural networks are, how they are trained, and how they represent meaning. In Part III we turn all of this toward the specific kind of model that powers agents: the large language model, beginning with the transformer architecture in Chapter 10.

Practice

Exercises

1Explain, in your own words, why assigning each word a simple ID number (cat = 1, dog = 2) fails to capture meaning. What property do embeddings have that ID numbers lack?
2Generate embeddings for ten words of your choosing (mixing a few categories like animals, colors, and vehicles), then compute the cosine similarity for several pairs. Which pairs score highest, and does it match your intuition?
3Explain why "king − man + woman ≈ queen" works, referring to the idea that directions in embedding space encode relationships. Suggest another word relationship that might have its own consistent direction.
4Build a tiny semantic search: embed five short sentences, then embed a query and return the sentence with the highest cosine similarity. Confirm it finds the most relevant sentence even when it shares no exact words with the query.
5The chapter notes that embeddings inherit biases from their training data. Describe one concrete way this could cause a problem in a real application, and why awareness of it matters for someone building agents.

View detailed solutions for all chapters →