The Physical Stack of AI · Fab and packaging supply chain

HBM — the binding memory constraint

You can explain what High Bandwidth Memory is, why memory bandwidth gates AI accelerator performance, and who the three HBM suppliers are.

A GPU's compute units can do tens of petaflops of matrix math. The bottleneck is feeding them. If the memory cannot hand new numbers to the cores fast enough, the cores sit idle and the published FLOPS figure becomes a marketing number rather than a working one. For large-language-model inference and training in 2026, memory bandwidth is what the chips are starving for.
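The gap between a published FLOPS figure and a working one can be sketched with a simple roofline bound: attainable throughput is the smaller of peak compute and memory bandwidth multiplied by the number of operations performed per byte fetched. The sketch below is illustrative only. The peak compute and bandwidth values are round hypothetical numbers, not the specification of any real chip, and the function name is made up for this example.

```python
def attainable_flops(peak_flops, mem_bandwidth, flops_per_byte):
    """Roofline bound: compute is capped by how fast memory can feed it."""
    return min(peak_flops, mem_bandwidth * flops_per_byte)

# Hypothetical round numbers, chosen only for illustration.
PEAK_FLOPS = 20e15   # 20 PFLOP/s of matrix math
MEM_BW = 20e12       # 20 TB/s of memory bandwidth

# Operations per byte needed before the cores stop waiting on memory.
ridge_point = PEAK_FLOPS / MEM_BW

for flops_per_byte in (1, 100, 1000, 4000):
    achieved = attainable_flops(PEAK_FLOPS, MEM_BW, flops_per_byte)
    utilization = achieved / PEAK_FLOPS
    print(f"{flops_per_byte:>5} FLOP/byte -> "
          f"{achieved / 1e15:6.2f} PFLOP/s ({utilization:.1%} of peak)")
```

Under these assumed numbers, a workload has to perform about 1,000 operations for every byte it reads before the compute units become the limit. Single-request LLM decoding with 16-bit weights performs roughly one operation per byte of weights read, which is why it tends to sit far down the bandwidth-limited side of this bound.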

High Bandwidth Memory (HBM) is the fix. Instead of laying memory chips flat next to the GPU on a circuit board, HBM stacks DRAM dies vertically, often 12 or 16 layers high, and wires each stack directly to the GPU through a silicon interposer. The result is a very wide memory bus whose bandwidth is measured in terabytes per second rather than gigabytes per second. NVIDIA's Rubin GPU lists 288 GB of HBM4 and up to 22 TB/s of bandwidth. AMD's MI400 preview lists up to 432 GB of HBM4 at 19.6 TB/s. Google's Ironwood TPU raises per-chip HBM capacity to 192 GB and bandwidth to 7.37 TB/s. Without the HBM stack, the accelerator cannot come close to its rated performance.
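One way to read these capacity and bandwidth figures together is to ask how long a chip needs to read its entire HBM once. A rough sketch, using only the numbers quoted above and assuming decimal units (1 TB = 1000 GB) and sustained peak bandwidth, which real workloads rarely reach:

```python
# (capacity in GB, bandwidth in TB/s), as quoted in the text above.
ACCELERATORS = {
    "NVIDIA Rubin": (288, 22.0),
    "AMD MI400 (preview)": (432, 19.6),
    "Google Ironwood": (192, 7.37),
}

def full_sweep_ms(capacity_gb, bandwidth_tb_s):
    """Milliseconds to read every byte of HBM once at peak bandwidth.

    GB divided by TB/s gives thousandths of a second, i.e. milliseconds,
    under the decimal-unit assumption (1 TB = 1000 GB).
    """
    return capacity_gb / bandwidth_tb_s

for name, (capacity_gb, bandwidth_tb_s) in ACCELERATORS.items():
    sweep = full_sweep_ms(capacity_gb, bandwidth_tb_s)
    # If model weights filled the HBM and every decode step had to read
    # all of them, this caps the number of steps per second.
    max_steps_per_s = 1000.0 / sweep
    print(f"{name}: {sweep:.1f} ms per full sweep, "
          f"at most {max_steps_per_s:.0f} full-memory passes per second")
```

The passes-per-second figure is an upper bound under the stated assumptions, not a benchmark. It illustrates why adding capacity without adding bandwidth makes each full pass over memory slower.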
