World models are the new frontier

AIAcademy · 2026-05-16

DeepMind — Genie 3, a new frontier for world models

DeepMind has been quietly running a different bet from everyone else. While OpenAI and Anthropic pushed text-and-tools agents up the time-horizon curve, Genie 3 (August 2025) turned a text prompt into a navigable, largely self-consistent 3D world you can explore in real time at 24 fps for a few minutes at a stretch. Genie 4 extended that to hours of consistent simulation with agent embodiment. AlphaEvolve, announced in May 2025, applied a related search-against-an-evaluator idea to code: an LLM-driven evolutionary agent that produced new results on open mathematical problems and improved the matrix-multiplication kernels used in DeepMind's own training stack.

The thread connecting them is not "video generation." It is learned simulators — models that internalize the dynamics of an environment well enough to roll out plausible futures inside it. Text-and-tools agents call out to the world. World models contain a draft of the world they can search and plan against.

Why this is structurally different from the OpenAI bet. A text-agent loop is bounded by the latency and cost of every external tool call. A world-model loop is bounded by the cost of sampling inside a learned representation, which can be orders of magnitude cheaper per step and, when the model is a differentiable network, can sit directly inside the optimization loop. If the world model is accurate enough, planning becomes a search over imagined rollouts rather than trial and error in the real environment. This is the path many robotics efforts (Physical Intelligence, Figure, 1X, Tesla's Optimus program) are now betting on, and it is why "physical AI" is the phrase the field has converged on for 2027.
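To make "planning becomes a search problem" concrete, here is a minimal sketch of random-shooting planning inside a model. Everything in it is an illustrative assumption: `model_step` is a hand-written 1D point mass standing in for a learned dynamics network, and the names `rollout_cost`, `plan`, and `run_mpc` are hypothetical, not taken from Genie or any DeepMind system. The point is only the shape of the loop: candidate futures are imagined and scored inside the model, and no external call happens until an action is chosen.

```python
import random

def model_step(state, action, dt=0.1):
    """Stand-in for a learned dynamics model (assumption: 1D point mass)."""
    pos, vel = state
    vel = vel + action * dt
    pos = pos + vel * dt
    return (pos, vel)

def rollout_cost(state, actions, goal):
    """Imagine one future inside the model and score it."""
    cost = 0.0
    for a in actions:
        state = model_step(state, a)
        # Penalize distance to the goal plus a small control effort term.
        cost += (state[0] - goal) ** 2 + 0.01 * a * a
    return cost

def plan(state, goal, horizon=15, n_candidates=256, rng=None):
    """Random-shooting search: sample action sequences, keep the cheapest."""
    rng = rng or random.Random(0)
    best_cost, best_actions = float("inf"), None
    for _ in range(n_candidates):
        actions = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        cost = rollout_cost(state, actions, goal)
        if cost < best_cost:
            best_cost, best_actions = cost, actions
    return best_actions, best_cost

def run_mpc(start=(0.0, 0.0), goal=1.0, steps=60, seed=0):
    """Replan at every step and execute only the first planned action."""
    rng = random.Random(seed)
    state = start
    for _ in range(steps):
        actions, _ = plan(state, goal, rng=rng)
        # In this toy, the "real world" is the same function as the model.
        state = model_step(state, actions[0])
    return state

if __name__ == "__main__":
    final_pos, final_vel = run_mpc()
    print(f"final position {final_pos:.3f}, velocity {final_vel:.3f}")
```

Each call to `plan` evaluates 256 imagined futures of 15 steps without touching anything outside the process, which is the cost structure the paragraph above describes. Random shooting is the simplest such search; with a differentiable model, the sampling loop could in principle be replaced by gradient-based optimization of the action sequence.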

The honest caveat. Genie-class models hallucinate physics under distribution shift. They cannot yet faithfully simulate groups of humans interacting, for example at the scale of a meeting. The leap from "minutes of walkable 3D" to "hours of simulation a robot can actually plan against" is the open research problem of the next two years.
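One reason the jump from minutes to hours is hard can be illustrated with a toy calculation. The sketch below is not a model of Genie. It assumes a 2D rotation as the "true" dynamics and a model whose per-step rotation angle is off by 0.1%, both chosen only for illustration. It shows how a small per-frame error accumulates when a rollout is never corrected by real observations. It does not capture distribution shift itself, only open-loop drift.

```python
import math

def step(state, theta):
    """Rotate a 2D state by theta radians: one frame of toy dynamics."""
    x, y = state
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y)

def rollout_error(n_steps, true_theta=0.05, rel_model_error=1e-3):
    """Distance between the true trajectory and an open-loop model rollout."""
    # Assumption: the model's only flaw is a slightly wrong rotation angle.
    model_theta = true_theta * (1.0 + rel_model_error)
    true_state = model_state = (1.0, 0.0)
    for _ in range(n_steps):
        true_state = step(true_state, true_theta)
        # The model state is never re-synchronized with the true state.
        model_state = step(model_state, model_theta)
    return math.dist(true_state, model_state)

FPS = 24  # frame rate quoted for Genie 3 in the text
one_second = rollout_error(FPS)
one_minute = rollout_error(FPS * 60)
one_hour = rollout_error(FPS * 3600)

if __name__ == "__main__":
    print(f"error after 1 s:   {one_second:.4f}")
    print(f"error after 1 min: {one_minute:.4f}")
    print(f"error after 1 h:   {one_hour:.4f}")
```

On a unit circle the largest possible error is 2. In this toy the rollout is still close to the truth after a minute but has drifted to an unrelated point after an hour, even though the per-step model error never changed.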