As I’ve watched a friend play Skyrim over the last few days, I’ve been blown away by the immense world that Bethesda has created. Not only is it huge, but it’s rather beautiful as well. “Slap on an HMD,” I thought, “and this would be a wonderfully immersive virtual reality world.” But as I continued to watch my friend play, I noticed how the interactions between the player and the non-player characters seemed to lag years behind the graphics. They were stale and scripted, unlike the sandbox world that contained them.

Much of the dialogue in the game has a non-player character (NPC) talking to the player about the world. The player gets to interact by selecting from a list of canned responses. Sometimes, the questions you want to ask or the things you want to say just aren’t in that list, and this truly detracts from the immersion. You don’t feel like you are in control because you’re limited to just a few choices. Your character doesn’t even speak the lines aloud; you just pick an option and the NPC starts responding. It also makes the NPCs feel less real because they seem like robots reading a script (probably because that’s what they are).

A world like Skyrim would be far closer to immersive virtual reality if the NPCs were able to not only hear your voice (through a microphone) but understand it and respond appropriately. Not only would you be able to ask the questions you really wanted to, but you’d have to be more immersed in the lore of the game to even know what to ask.

Voice recognition difficulty aside, having an NPC respond naturally to nearly unpredictable input is definitely a huge challenge, but it’s certainly possible. Games will become far more real when we achieve a level of AI where this is possible, and we aren’t as far away as you might think.

If I say “virtual reality,” people often picture someone hooked up to a head-mounted display. Head tracking, lifelike 3D graphics, surround sound: all of this comes to mind when thinking about that image. Artificial intelligence is one piece of the puzzle that doesn’t immediately come to mind, but it is just as important to creating an immersive experience as all of those other components.

The use of the term “AI” in today’s world is quite misleading. I’d argue that there is still a lot of work to be done in achieving artificial intelligence, but we’re on our way.

In 1950, Alan Turing proposed what is now referred to as the ‘Turing Test’, which seeks to determine whether a computer’s intelligence can be considered true artificial intelligence. In the test, a human judge communicates with another human and a computer through typed messages. The judge is separated from both and doesn’t know which is which when a message is received. According to the Turing Test, the computer system can be considered artificially intelligent if the judge is unable to reliably determine which of the responses are from the computer and which are from the other human.

We simply aren’t there yet, though a lot of time and research is going into creating a true AI system.

Funded by the Defense Advanced Research Projects Agency (DARPA), the CALO project ran from 2003 to 2008. CALO (Cognitive Assistant that Learns and Organizes) sought to create a machine entity that would be useful to human users. A spin-off from the CALO project is the now-famous Siri.

Undoubtedly, Siri is the closest system to true AI to be made available to the mainstream. Siri is a sort of digital assistant that was first released with the iPhone 4S and can do simple tasks like sending text messages and emails, checking your calendar, and telling you the weather. While other devices have had voice-command features for years, Siri makes strides in understanding natural human speech in context.

While no one could mistake Siri’s output for a human’s because of the computerized nature of the voice, with a proper voice some might be fooled. Siri is a big stride toward a Turing-capable system thanks to its ability to accept a broad range of contextual and natural human input. As similar systems are developed and deployed, they will undoubtedly find their way into our games and fill that vital missing link of immersive virtual reality.

As of March 2011, Microsoft’s Kinect has sold 10 million units. Every Kinect has a microphone inside, and the Xbox 360 easily has the processing power to handle a system like Siri. Although less widespread, the PlayStation Eye, a camera which connects to the PS3, also has a microphone. Microsoft is bringing basic voice control to the Xbox 360 through the Kinect this holiday season.

If Microsoft and Sony know what they’re doing, the next generation of consoles will feature some sort of pseudo-AI, and this will open up the door for integration of that AI into games.


Ben is the world's most senior professional analyst solely dedicated to the XR industry, having founded Road to VR in 2011—a year before the Oculus Kickstarter sparked a resurgence that led to the modern XR landscape. He has authored more than 3,000 articles chronicling the evolution of the XR industry over more than a decade. With that unique perspective, Ben has been consistently recognized as one of the most influential voices in XR, giving keynotes and joining panel and podcast discussions at key industry events. He is a self-described "journalist and analyst, not evangelist."
  • Tim

    After watching/playing Skyrim for a few hours I noticed the lack of fluent dialogue as well. For a game that is so developed in graphics, gameplay dynamics, and overall playability, it seems ridiculous that the NPC dialogue would be so overly simplistic to the point of detracting from its overall feel. The dialogue stands against everything that TES has strived to be: immersive.

    Have you considered whether this was intentionally done? (This may be a bit off topic, sorry) Sadly, the trend in many forms of widespread consumerism is a watered-down product that can appeal to the largest possible portion of the population. A large share of the video game community (for lack of a better term) is composed of the subadult age class. It may be true that Bethesda had the opportunity to implement complex dialogue yet decided against it in fear of ostracizing a large portion of their fan base. In short, possible profits may have outweighed more realistic dialogue.

    I see this “appealing to the masses” as one of the largest obstacles for Virtual Reality.

    I also loved the Siri part of your post. Implementing something similar in an RPG or MMORPG would take the level of gameplay to new heights. The concept of taking in information from a player, understanding it, and then using it in dialogue/interaction with a separate player would add huge amounts of complexity and reality to an MMORPG. Virtual worlds that are dynamic and dependent on the participants would be possible.