Google’s Project Genie Makes Real-time Explorable Virtual Worlds, Offering a Peek Into VR’s Future


DeepMind, Google’s AI research lab, announced Genie 3 last August, showing off an AI system capable of generating interactive virtual environments in real-time. Now, Google has released an experimental prototype that Google AI subscribers can try today. Granted, you can’t generate VR worlds on the fly just yet, but we’re getting tantalizingly close.

The News

Project Genie is what Google calls an “experimental research prototype,” so it isn’t exactly the ‘AI game machine’ of your dreams just yet. Essentially, it lets users create, explore, and modify interactive virtual environments through a web interface.

The system is a lot like previous image and video generators, which require inputting a text prompt and/or uploading reference images, although Project Genie takes this a few steps further.

Instead of one, Project Genie has two main prompt boxes—one for the environment and one for the character. A third prompt box also lets you modify the initial look before fully generating the environment (e.g. make the sword bigger, change the trees to fall foliage).

As an early research system, Project Genie has limitations, Google says in a blog post.  Generated environments may not closely match real-world physics or prompts, character control can be inconsistent, sessions are limited to 60 seconds, and some previously announced features are not yet included.

And for now, the only thing you can output is a video of the experience, although you can explore and remix other ‘worlds’ available in the gallery.

Project Genie is now rolling out to Google AI Ultra subscribers in the US, aged 18 and over, with broader availability planned for the future. You can find out more here.


My Take

There are a lot of hurdles to get over before we can see anything like Project Genie running on a VR headset.

One of the most important hurdles is undoubtedly cloud streaming. Cloud gaming already exists on VR headsets, but it’s not great right now, since latency varies widely depending on how close you are to your service’s data center. What’s more, the big names in cloud gaming today (i.e. NVIDIA GeForce Now, Xbox Cloud Gaming) are generally geared towards flatscreen games, where the bar for render and input latency is much lower than on VR headsets, which generally require a maximum of 20ms motion-to-photon latency to avoid user discomfort.

And that’s not taking into account that Project Genie would also somehow need to render the world with stereoscopy in mind—which may present its own problems, since the system would technically need two distinct points of view that resolve into a single, solid 3D picture.

As far as I understand, world models created in Project Genie are probabilistic, i.e. objects can behave slightly differently each time, which is part of the reason Genie 3 can only support a maximum of a few minutes of continuous interaction at a time. Genie 3 world generation also has a tendency to drift from prompts, which can produce undesired results.

So while it’s unlikely we’ll see a VR version of this in the very near future, I’m excited to see the baby steps leading to where it could eventually go. The thought of being able to casually order up a world on the fly, Holodeck-style, that I can explore—be it past, present, or any fiction of my choosing—feels so much more interesting to me from a learning perspective. One of my most-used VR apps to date is Google Earth VR, and I can only imagine a more detailed and vibrant version of that helping me learn foreign languages, time travel, and tour the world virtually.

Before we even get that far though, there’s a distinct possibility that the Internet will be overrun by ‘game slop’, which feels like asset flipping taken to the extreme. It will also likely expose game developers to the same struggles that other digital artists are facing right now when it comes to AI sampling and recreating copyrighted works—albeit on a whole new level (GTA VI anyone?).

That, and I can’t shake the feeling that the future is shaping up to be a very odd, but hopefully also very interesting and not entirely terrible place. I can imagine a future wherein photorealistic, AI-driven environments go hand-in-hand with brain-computer interfaces (BCI)—two subjects Valve has been researching for years—serving up The Virtual Reality I’m actually waiting for.

This article may contain affiliate links. If you click an affiliate link and buy a product we may receive a small commission which helps support the publication. See here for more information.

Well before the first modern XR products hit the market, Scott recognized the potential of the technology and set out to understand and document its growth. He has been professionally reporting on the space for nearly a decade as Editor at Road to VR, authoring more than 4,000 articles on the topic. Scott brings that seasoned insight to his reporting from major industry events across the globe.
  • Looks very interesting… still early stages, but interesting

  • Stephen Bard

    "Marble" already generates pretty good virtual worlds that you can walk around inside, but it doesn't have the animation.

    • Nevets

      I think this surpasses Marble by a substantial margin. Marble is 'just' a generated splat. Genie looks like a living world.

      • Stephen Bard

From the article, it sounds like Project Genie would be difficult to transfer into actual 3D/VR.

        • Christian Schildwaechter

The sole company I know of working on AI that can create virtual worlds directly from "conventional" 3D objects made of textured polygons is Meta, due to their metaverse focus. I assume that others are at least prototyping this too, as it should be way more efficient for local rendering than something like Project Genie first creating just video that then has to be somehow converted into separate 3D objects. We got image and video generation first, because it is easier to realize/train with a DNN/LLM and has a wider set of possible applications.

But I suspect this will turn out to be more of a dead end for creating virtual worlds one can roam around in, and that more specialized AI directly targeting the 3D render capabilities of GPUs, paired with separate AI-driven post-processing for upscaling, photorealism and extra frame generation, will ultimately dominate VR/AR world creation.

      • Octogod

        "Just"

        While one is more interactive, the other is a real product that you can use now in VR.

        I believe most people in this space are still more interested in a hypothetical future than a shippable reality that will occur this decade.

  • Rob McCormick

Google has a VERY VERY long list of abandoned products. The demise of their gaming platform is just the latest.
It's amazing that anyone would invest in a Google-branded platform. They just abandon you after they don't make enough money and move on, leaving you holding the bag.

    • Christian Schildwaechter

      Google only abandons products that don't make (enough) money, so as long as you support a profitable Google platform, you should be fine. The sole problem is that pretty much the only things that make them money are ads and some cloud services. So the search engine and Chrome and YouTube are pretty safe due to continued ad income. Even Android is, as the full version forces phone manufacturers to include Chrome (and other Google services), so Google doesn't have to pay billions for Firefox and Safari setting Google as their default search engine. For everything else, only time will tell.

But that's not really different from Meta, who also make more than 98% of their money from ads on Facebook and Instagram. And who also have cancelled multi-billion dollar projects like internet_org, which was supposed to bring free mobile data for basic internet services to developing countries, and was canned after these countries stopped Meta from defining "basic internet services" as mostly Wikipedia plus Meta services generating ad revenue. And who are now massively reducing their VR investments.

  • MadHenGSH

    pAIpe dream.

  • ai is cringe

  • Patrick Hogenboom

    I'm wondering if this probabilistic process can even generate a pair of matching stereo images

    • Christian Schildwaechter

      Given that AVP and soon GXR can already turn regular photos into stereo images with AI, and that 3D movies pretty much abandoned using dual cameras more than a decade ago and instead just add very realistic stereoscopy in post processing, that's not really going to be a problem.

      The probabilistic process should be able to do so by itself just with a prompt of "show the same with the view moved 65mm to the right", but it will be way more efficient to just create the second image with a separate AI trained just for creating stereoscopic images instead of whole worlds.

      • Patrick Hogenboom

As persistence is not the strong suit of gen-AI, I have doubts whether such a process could produce a stable stereoscopic video feed that won't nauseate the VR user.

Besides, I'm a stereoscopy aficionado, and a good stereoscopic image/video contains stereoscopic cues in every tiny detail. I have tried stereoscopic gen-AI and the results are always meh; it just can't produce the reality-level detail that photography does.

  • Christian Schildwaechter

    TL;DR: Looks nice, but might be a very inefficient approach for most use cases that are (way) easier to solve with existing tools.

I'm sure we'll get some very interesting tools for creating environments based on this. But the results remind me of the early days of esp. Google Cardboard (and pre-Cardboard Durovis Dive), where lots of people released an extreme form of asset flip. They took the demo scenes from Unity store low poly asset bundles, dropped in the default Cardboard character controller with stereoscopic rendering, 3DoF head tracking and locomotion via gamepad, and released it as a VR app. So you could walk and look around, and nothing else.

    That was at least initially interesting, for the first time being able to explore these often pretty scenes in VR, but after a short while/a few of these "experiences", it got rather boring, simply because there was nothing to do. It's like being dropped into an open world game without any quests or NPCs or things to collect or anything interactable, all you can do is roam around.

    That may be fine for many as long as it is Skyrim VR with all the mods, and I've spent a lot of time in AC Origins just strolling around. But Genie probably won't generate a beautiful, consistent, large scale world worth exploring, only a small part of it based on a few prompts. So if you walk for some time, a new scene has to be generated. And you may have to add new prompts instead of just exploring in awe, so it is more like a sandbox where you can create anything, but also have to at least guide the creation to get anything.

    We've had game development tools for creating complex worlds or buildings according to simple sets of rules plus some randomization for many years. These often include the option to also scatter things like collectable loot already tagged with certain properties like in many dungeon generators. And these create worlds from reusable objects that are way more render friendly esp. on mobile. It's not quite the same, and "AI does everything" will be more flexible, but there are most likely way better ways to create interesting worlds worth exploring than having an AI (or more accurately, machine learning deep networks without actual intelligence) create it all.

    There is a series of YouTube videos by SOUNDTRACK "reimagining" older video games with the Runway AI. youtube_com/playlist?list=PLHB5EAWcOdCny1vjj2PXRSP9M9lZCAfXK The images below are from GTA San Andreas, done in late 2024, so regarding the AI capabilities already very "outdated". youtu_be/FJ-uJdknNVc We may never see a GTA SA Quest port. But if AI video "beautify" filters improve enough to run in real time on the NPUs or DSPs in current or near future mobile SoCs (much, much easier to achieve than something like Google's Genie), you may then be able to play GTA San Andreas using Holydh's VR mod with photorealistic graphics on a Frame 1 or 2. Without needing any AI data centers or cloud streaming.

Or you could make your own instead with a live world creation tool, spitting out a render-friendly low-poly landscape or city, and then have the AI turn it into something looking like a PCVR game driven by a USD 2K+ GPU. AI will probably end up in these world creation tools too, so you could still add a sword object wherever you need it just by asking for it. And while you are there, possibly also a couple of things/tasks/quests to actually do in that world.

    https://uploads.disquscdn.com/images/6fcadc5e6e7df42df03c9ec2a657f8d096c763e8e81a8f083cbcc6ec995a09a0.jpg

  • Oxi

    Like a lot of AI stuff, this is contradictory. The point of "exploring" is to find something. In games it's to find things that people crafted. Imagine Breath of the Wild if it was just AI generated, it would be the ideal way to entirely waste hundreds of hours of your life on literally nothing at all.