Join an ambitious project to build generative models of the 3D world. World models power numerous domains, such as creative applications, visual reasoning, simulation, planning for embodied agents, and real-time interactive experiences. The team is tightly integrated with Gemini, Genie, and Veo, and builds on those models while additionally exploring new, spatial modalities beyond images and videos.
Key responsibilities:
Conduct research to build generative multimodal models of the 3D world. Solve essential problems to train world models at massive scale, develop metrics for spatial intelligence, curate and annotate training data, enable real-time interactive experiences, explore downstream applications, and study integration of spatial modalities with multimodal language models. Build and maintain large model systems and infrastructure to support research exploration. Embrace the bitter lesson and seek simple, effective methods that scale.
Areas of focus:
We seek individuals who are passionate about the intersection of large-scale generative models and spatial or 3D signals, and who believe that learning large-scale spatial information is a necessary part of the path to intelligence. We strive for simple methods that scale and look for candidates excited to improve models through infrastructure, data, evals, and compute.
In order to set you up for success as a Research Scientist/Engineer at Google DeepMind, we look for the following skills and experience:
In addition, the following would be an advantage: