The Physical Stack of AI · Chips and accelerators

GPUs as the dominant accelerator

You can explain at a kitchen-table level why a GPU is the right shape of chip for AI, and read the four numbers that actually determine how good one is.

A modern frontier model is, mathematically, a very large pile of matrix multiplications. To train and run one fast, you don't want a clever chip. You want a chip that does the same simple multiply-and-add operation in parallel, trillions of times per second. That is exactly what a GPU is built to do.
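To make "a pile of multiply-and-adds" concrete, here is a minimal pure-Python sketch of a matrix multiplication. It is illustrative only: the function name `matmul` and the tiny 2x2 inputs are made up for this example, and real frameworks hand this work to tuned GPU kernels rather than Python loops.

```python
def matmul(a, b):
    """Multiply an m x k matrix by a k x n matrix using only multiply-adds.

    Returns the product and the number of multiply-add steps performed.
    """
    m, k, n = len(a), len(b), len(b[0])
    out = [[0.0] * n for _ in range(m)]
    multiply_adds = 0
    for i in range(m):
        for j in range(n):
            # Each (i, j) output cell depends only on row i of a and
            # column j of b, so all m * n cells could be computed at once.
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]  # one multiply-add
                multiply_adds += 1
            out[i][j] = acc
    return out, multiply_adds


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
product, ops = matmul(a, b)
print(product)  # [[19.0, 22.0], [43.0, 50.0]]
print(ops)      # 8, i.e. m * k * n = 2 * 2 * 2
```

The two outer loops are the part a GPU parallelizes: no output cell waits on any other, so thousands of simple cores can each take a cell and run the same inner loop.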

The CPU in your laptop is built the opposite way: a small number of very smart cores, each great at handling unpredictable, branching work like running an operating system. A GPU has thousands of much simpler cores arranged for parallel arithmetic. When NVIDIA's Vera Rubin platform packs 50 PFLOPS of FP4 compute and 288 GB of HBM4 memory onto one accelerator, the entire design is in service of feeding those parallel units a steady diet of matrix math.
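As a rough way to read a figure like 50 PFLOPS, the sketch below counts the floating-point operations in one matrix multiplication (2 per multiply-add) and divides by the peak rate. The 8192-sized matrices and the helper names are assumptions chosen for illustration, and the result is a best-case floor: real kernels run below peak, often because memory cannot deliver data fast enough.

```python
PEAK_FP4_FLOPS = 50e15  # 50 PFLOPS, the FP4 figure quoted above


def matmul_flops(m, k, n):
    """FLOPs for an (m x k) @ (k x n) product: one multiply plus one add
    for each of the m * k * n multiply-add steps."""
    return 2 * m * k * n


def ideal_seconds(flops, peak=PEAK_FP4_FLOPS):
    """Lower bound on runtime if the chip sustained its peak rate."""
    return flops / peak


# Hypothetical layer-sized multiplication, chosen only for illustration.
flops = matmul_flops(8192, 8192, 8192)
print(f"{flops:.3e} FLOPs")
print(f"{ideal_seconds(flops) * 1e6:.1f} microseconds at peak")
```

Under these assumptions a roughly trillion-operation multiplication takes on the order of tens of microseconds at peak, which is why keeping the arithmetic units fed becomes the hard part.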

This chapter zooms in on four ideas: what a GPU is, what "FLOPS" really measures, why memory bandwidth has overtaken raw FLOPS as the binding constraint, and why a single chip is no longer the unit anyone buys. By the end you'll be able to read an NVIDIA, AMD, or Google spec sheet and know which numbers matter.

Type: multi-choice

Prompt: Why is a GPU, not a CPU, the right chip for training a frontier AI model?

Chapter contains 4 lessons.