Frontiers · World models and embodied AI

Genie 3 to Genie 4

You can explain why the Genie line — Genie 2, Genie 3, and the rumored Genie 4 — is the single clearest example of a generative interactive world model, and what each version added.

Genie 2 (December 2024) was the first time DeepMind framed an interactive video model as a world model in the strict sense. You gave it a single still image. It gave you back a small, playable environment: typically 10 to 20 seconds of coherent navigation, up to about a minute, conditioned on your keyboard and mouse input. The output was a response to your actions, not a pre-rendered clip.
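To make the "response, not a clip" distinction concrete, here is a minimal toy sketch in Python. It is not Genie's interface: `ToyWorldModel`, `MOVES`, and `play` are hypothetical names, and a "frame" is reduced to an (x, y) position so the example stays runnable. The only point it illustrates is that each next frame depends on both the current state and the key pressed.

```python
from dataclasses import dataclass

# Hypothetical toy stand-in: a "frame" is just an (x, y) position,
# not an image. A real world model would predict pixels.
Frame = tuple[int, int]

# Assumed key bindings for illustration only.
MOVES = {"w": (0, -1), "s": (0, 1), "a": (-1, 0), "d": (1, 0)}


@dataclass
class ToyWorldModel:
    frame: Frame  # plays the role of the prompt image

    def step(self, key: str) -> Frame:
        """Return the next frame, conditioned on the current frame and the key."""
        dx, dy = MOVES.get(key, (0, 0))  # unknown keys leave the world unchanged
        x, y = self.frame
        self.frame = (x + dx, y + dy)
        return self.frame


def play(start: Frame, keys: str) -> list[Frame]:
    """Roll the world forward one frame per key press."""
    world = ToyWorldModel(start)
    return [world.step(k) for k in keys]
```

Calling `play((0, 0), "dd")` and `play((0, 0), "ds")` from the same starting frame yields different rollouts, which is what separates an interactive world model from a fixed video.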

Genie 3 (August 2025) extended the coherent-play horizon to several minutes, sharpened object permanence, and added physical interaction: pushing objects, opening doors, and modifying the scene in ways the model preserved. Project Genie rolled out to Google AI Ultra subscribers in January 2026, running at 24 fps and 720p.

Genie 4, reportedly signaled for late 2026 or 2027 in context-setting remarks at Google I/O, is being framed less as a consumer product and more as training infrastructure. The Genie + SIMA training loop, in which Genie generates worlds and SIMA learns to act in them, is reportedly load-bearing in Gemini 4's embodied-agent training. That moves world models from "look what we can render" to "this is how we get training data for systems that have to operate in physical environments."
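The shape of that loop (one component generates worlds, another learns to act in them) can be sketched with toy stand-ins. Everything below is an assumption for illustration: `generate_world` produces a random 1-D corridor rather than a rendered environment, and the agent uses a simple reward-averaging update over a two-entry observation, not any method DeepMind has described for Genie or SIMA.

```python
import random


def generate_world(rng):
    """Hypothetical world generator: a 1-D corridor with random start and goal."""
    size = rng.randint(4, 8)
    return {"size": size, "start": rng.randrange(size), "goal": rng.randrange(size)}


def train(episodes=500, seed=0, epsilon=0.1, lr=0.5):
    """Train one agent across many generated worlds and return its value table."""
    rng = random.Random(seed)
    # Values keyed by (direction to goal, action); each is -1 (left) or +1 (right).
    q = {(d, a): 0.0 for d in (-1, 1) for a in (-1, 1)}
    for _ in range(episodes):
        world = generate_world(rng)  # a fresh world every episode
        pos = world["start"]
        for _ in range(2 * world["size"]):  # step budget per episode
            if pos == world["goal"]:
                break
            d = 1 if world["goal"] > pos else -1
            if rng.random() < epsilon:
                a = rng.choice((-1, 1))  # explore
            else:
                a = max((-1, 1), key=lambda act: q[(d, act)])  # exploit
            new_pos = min(max(pos + a, 0), world["size"] - 1)
            # Reward +1 for getting closer to the goal, -1 otherwise.
            closer = abs(world["goal"] - new_pos) < abs(world["goal"] - pos)
            reward = 1.0 if closer else -1.0
            q[(d, a)] += lr * (reward - q[(d, a)])
            pos = new_pos
    return q
```

Because the agent never sees the same corridor twice in any guaranteed way, whatever it learns has to transfer across worlds. That is the property that makes a world generator useful as training infrastructure rather than as a demo.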

This chapter walks the four-lesson arc: what Genie is, what shipped, what's signaled, and how the training loop changes the curriculum for agents.

Type: multi-choice

Chapter contains 4 lessons.