Running Inference: Local and in the Cloud
We have spent five parts understanding how models are built. Now the book pivots to the half you will spend most of your time in: *using* them. This part is about putting a finished model to work, and it begins with the most basic act of all: running the model to get an answer, which is called inference. We will see the two places you can run a model (locally on your own machine, or in the cloud), what actually happens inside when it generates text, the settings that shape its output, and how to think about speed and cost. Everything here is practical and beginner-friendly, and it is the ground floor for building agents.
