Specialist inference chips

You can name three specialist inference chip makers in 2026 (Cerebras, Groq, Etched), explain the architectural bet each is making, and place them against general-purpose GPUs.

There is a category of accelerator that doesn't try to be a GPU at all. These are specialist inference chips, built on the bet that a chip purpose-tuned for one shape of workload can beat a general-purpose GPU on cost, speed, or both, even though GPUs have the more mature software stack and a much larger ecosystem of supporting tools.

Three companies illustrate three different bets. Cerebras builds a chip the size of a dinner plate, a single wafer-scale processor, to remove the cost of chip-to-chip interconnect. The company IPO'd in May 2026 at $5.55 billion, after a $20 billion contract with OpenAI for inference capacity. Groq builds a deterministic chip with no HBM at all, betting that output-token speed (latency per token, not throughput per dollar) calls for on-chip SRAM and a different memory architecture entirely; Nvidia bought Groq for $20 billion in December 2025. Etched bakes the transformer architecture itself directly into silicon: no flexibility, just enormous speed on one model family.
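To see why memory architecture matters so much for output-token speed, here is a rough back-of-envelope sketch. It assumes single-stream decoding is limited purely by memory bandwidth, so every generated token requires reading all model weights once. The function name and every number below are hypothetical illustrations, not specifications of any vendor's chip.

```python
def decode_tokens_per_second(param_count, bytes_per_param, mem_bandwidth_bytes_per_s):
    """Upper bound on single-stream decode speed, assuming each generated
    token requires one full read of the model weights from memory."""
    weight_bytes = param_count * bytes_per_param
    return mem_bandwidth_bytes_per_s / weight_bytes


# Hypothetical model: 70 billion parameters stored as 8-bit weights.
PARAMS = 70e9
BYTES_PER_PARAM = 1

# Hypothetical bandwidth figures, chosen only to contrast the two
# memory styles (assumptions, not measured values for any product).
HBM_LIKE_BANDWIDTH = 3e12     # about 3 TB/s, off-chip stacked DRAM
SRAM_LIKE_BANDWIDTH = 80e12   # about 80 TB/s, aggregate on-chip SRAM

hbm_like = decode_tokens_per_second(PARAMS, BYTES_PER_PARAM, HBM_LIKE_BANDWIDTH)
sram_like = decode_tokens_per_second(PARAMS, BYTES_PER_PARAM, SRAM_LIKE_BANDWIDTH)

print(f"HBM-like bound:  {hbm_like:.0f} tokens/s")
print(f"SRAM-like bound: {sram_like:.0f} tokens/s")
```

Under this simplified model the ceiling on tokens per second scales directly with memory bandwidth, which is the intuition behind trading HBM for SRAM. The sketch ignores batching, compute limits, KV-cache traffic, and the fact that SRAM capacity per chip is small, so a large model has to be spread across many chips.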
