With the hype surrounding Apple and Google's (admittedly very cool) AR tracking tech now in the hands of tens of millions of developers and users, you might be tempted to think that immersive augmented reality experiences—delivering the promise of the wild AR concept videos we've seen over the last decade—are just around the corner. While we are closer than ever, the truth is there are years of R&D and design work still standing between us and immersive AR for the mainstream. Here's an overview of some of the key challenges being worked on today.

Immersive Field of View

It's easy to watch cool ARKit videos and imagine that the full screen view you're seeing on your computer monitor would take up your entire natural field of view. The reality is that even today's best portable AR headset dev kits still have very limited fields of view (far from the field of view of today's VR headsets, which some still feel is lacking!). HoloLens—in many ways the best AR headset a developer can buy today—has a tiny ~34 degree diagonal field of view, far less even than Google Cardboard (at ~60 degrees diagonal).

A video from our pal Oliver Kreylos compares a full field of view to a ~34 degree field, and the result is that you're only seeing a sliver of the augmented reality world at any one moment:

https://www.youtube.com/watch?v=syFRdNs68s4

And that's important, because in order to achieve reasonable levels of immersion, the augmented world needs to seamlessly blend with the real world. Without being able to see the bulk of the augmented reality world at once, you'll find yourself 'scanning' unnaturally with your head—as if looking through a periscope—to find out where AR objects actually are around you, instead of allowing your brain's intuitive mapping sense to treat the AR world as part of the real world.

[caption id="attachment_53511" align="alignright" width="325"] Image courtesy Microsoft[/caption]

That's not to say that an AR headset with a 34 degree field of view can't be useful; it's just that it isn't immersive, and therefore doesn't deeply engage your natural perception, which means it isn't well suited for the sort of intuitive human-computer interaction that's ideal for consumers and entertainment purposes.

"But Ben," I hear some of you say, "what about the Meta 2 AR headset and its 90 degree field of view?" Good question. Yes, Meta 2 has the widest field of view of any AR headset we've seen yet—one that approaches that of today's VR headsets—but it's also bulky, with no obvious path to shrinking the optical system without sacrificing a significant portion of the field of view.

Meta 2's optics are actually quite simple. The big 'hat brim' portion of the headset contains a smartphone-like display that faces the ground. The large plastic visor is partially silvered on the inside and reflects what's on the display into the user's eyes. Shrinking the headset would mean shrinking both the display and the visor, which would naturally result in a reduced field of view.

Meta 2 might be great for developers—who are willing to put up with a bulky (and still tethered) headset for the sake of developing for future devices—but it's going to take a different optical approach to hit that field of view in a consumer form-factor. On that front, ODG is working with a similar but shrunken optical system, and landed at a 50 degree field of view on their top-of-the-line, $1,800 R-9 AR glasses—and even those are still only barely approaching a consumer-acceptable size.
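To get an intuitive feel for why the size of the display and visor matters so much, here's a rough back-of-the-envelope sketch (in Python) that treats the reflected display as a simple flat panel sitting at some effective distance from the eye. The numbers are made up for illustration and real combiner optics are far more involved, but the basic geometry shows why shrinking the panel shrinks the field of view along with it.

```python
import math

def diagonal_fov_degrees(panel_diagonal_m: float, effective_distance_m: float) -> float:
    """Angular size of a flat panel viewed from a given distance.

    This is the basic 2 * atan(size / (2 * distance)) relationship; it ignores
    lenses, curved combiners, and everything else a real optical designer
    has to worry about.
    """
    return math.degrees(2 * math.atan(panel_diagonal_m / (2 * effective_distance_m)))

# Illustrative (made-up) numbers: a ~15 cm diagonal panel reflected to an
# effective distance of ~7.5 cm from the eye covers a wide field of view...
print(round(diagonal_fov_degrees(0.15, 0.075)))   # ~90 degrees

# ...but halve the panel (and the visor that reflects it) without changing
# the distance, and the field of view collapses along with it.
print(round(diagonal_fov_degrees(0.075, 0.075)))  # ~53 degrees
```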
Taking a different optical approach (waveguide), Lumus has managed to squeeze a 55 degree field of view out of 2mm-thick optics. But these are the very limits of AR's field of view right now in form-factors that are close to properly portable.

[irp posts="74930" name="The Difference Between Smartglasses & AR Glasses, and Why Everyone is Confused"]

A ~50 degree field of view isn't half bad, but it's still a far cry from today's leading VR headsets, which offer around a 110 degree field of view—and even then consumers are still demanding more. It's hard to put a single figure on what makes for a truly immersive field of view; however, Oculus has in the past argued that you need at least 90 degrees to experience true Presence, and, at least anecdotally, the VR industry at large seems to concur.

Real-time Object Classification

[caption id="attachment_28425" align="aligncenter" width="640"] Image courtesy Google[/caption]

Apple's ARKit and Google's ARCore tech will let you do some pretty nifty and novel AR-like things on your smartphone, but for the most part these systems are limited to understanding flat surfaces like floors and walls. That's why 99% of the AR apps and demos out there right now for iOS take place on a floor or table.

Why floors and walls? Because they're easy to classify. The plane of one floor or wall is identical to the plane of another, and can be reliably assumed to continue as such in all directions until intersecting with another plane.

Note that I used the word 'understand' rather than 'sense' or 'detect'. That's because, while these systems may be able to 'see' the shapes of objects other than floors and walls, they cannot presently understand them.

Take a cup for instance. When you look at a cup, you see far more than just a shape. You already know a lot about the cup. How much? Let's review:

You know that a cup is an object that's distinct from the surface upon which it sits
You know that, without actually looking into the top of that cup, it contains an open volume which can be used to hold liquids and other objects
You know that the open volume inside the cup doesn't protrude beyond the surface upon which it sits
You know that people drink from cups
You know that a cup is lightweight and can be easily knocked over, resulting in a spill of whatever is inside

I could go on... The point is that a computer knows none of this. It just sees a shape, not a cup. Without sweeping its view across the cup's interior to map the shape in its entirety, the computer can't even assume that there's an open interior volume. Nor does it know that the cup is a separate object from the surface it sits upon. But you know all of that, because it's a cup.

It's a non-trivial problem to get computer vision to understand 'cup' rather than just seeing a shape. This is why for years and years we've seen AR demos where people attach fiducial markers to objects in order to facilitate more nuanced tracking and interactions.

Why is it so hard? The first challenge here is classification. Cups come in thousands of shapes, sizes, colors, and textures. Some cups have special properties and are made for special purposes (like beakers), which means they are used for entirely different things in very different places and contexts.

[irp]

Think about the challenge of writing an algorithm which could help a computer understand all of these concepts, just to be able to know a cup when it sees it. Think about the challenge of writing code to explain to the computer the difference between a cup and a bowl from sight alone. That's a massive problem to solve for just one simple object out of perhaps thousands or hundreds of thousands of fairly common objects.
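As it happens, today's deep-learning tools (more on those below) can already attach labels like 'cup' and 'bowl' to an image with reasonable reliability. Here's a rough illustrative sketch using an off-the-shelf detector pretrained on the COCO dataset via the torchvision library; to be clear, this isn't something ARKit or ARCore does today, and the image filename is hypothetical. The catch is in what it gives you: a label and a bounding box, not any of the understanding listed above.

```python
# pip install torch torchvision  (sketch assumes torchvision >= 0.13)
import torch
from torchvision.io import read_image
from torchvision.models.detection import (
    fasterrcnn_resnet50_fpn, FasterRCNN_ResNet50_FPN_Weights,
)

weights = FasterRCNN_ResNet50_FPN_Weights.DEFAULT    # trained on the COCO dataset
model = fasterrcnn_resnet50_fpn(weights=weights).eval()
categories = weights.meta["categories"]              # includes 'cup', 'bowl', 'person', 'car', ...

img = read_image("kitchen_table.jpg")                # hypothetical input image
batch = [weights.transforms()(img)]                  # convert/normalize as the model expects

with torch.no_grad():
    detections = model(batch)[0]                     # dict of 'boxes', 'labels', 'scores'

for label, score in zip(detections["labels"], detections["scores"]):
    if float(score) > 0.7:
        # Prints e.g. "cup: 0.92" -- a label attached to pixels, not an
        # understanding of what a cup is for or how it behaves.
        print(f"{categories[int(label)]}: {float(score):.2f}")
```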
Today's smartphone-based AR happens within your environment, but hardly interacts with it. That's why all the AR experiences you're seeing on smartphones today are relegated to floors and walls—it's impossible for these systems to convincingly interact with the world around us, because they see it, but they don't understand it. For the sci-fi AR that everyone imagines—where my AR glasses show me the temperature of the coffee in my mug and put a floating time-remaining clock above my microwave—we'll need systems that understand way more about the world around us.

So how do we get there? The answer seems like it must involve so-called 'deep learning'. Hand-writing classification algorithms for every object type, or even just the common ones, is a massively complex task. But we may be able to train neural networks—which automatically adjust their own behavior as they're trained—to reliably detect many common objects around us. Some of this work is already underway and appears quite promising. Consider this video, which shows a computer somewhat reliably detecting the difference between arbitrary people, umbrellas, traffic lights, and cars:

https://www.youtube.com/watch?v=_zZe27JYi8Y

The next step is to vastly expand the library of possible classifications, and then to fuse this image-based detection with real-time environmental mapping data gathered from AR tracking systems. Once we can get AR systems to begin understanding the world around us, we can begin to tackle the challenge of adaptive design for AR experiences, which just so happens to be our next topic.

Adaptive AR Design

[caption id="attachment_68913" align="aligncenter" width="640"] Image courtesy Microsoft[/caption]

To draw an analogy, it took web developers many years to develop reliable, practical design rules for getting a website to fit on screens of different shapes. And yet that seems like a simple task compared to adaptive AR design, which will need to work across a mind-boggling range of arbitrary environments spanning all three dimensions, rather than just a handful of common 2D screen sizes.

This is not a trivial issue. Even VR game design—which has years of practical development time as a head start—is struggling with a much more basic version of this problem: designing for varying playspace sizes. Generally VR playspaces are square or rectangular in shape, and have nothing in them except for the player; a walk in the park compared to the complications that come with AR—and yet still an ongoing challenge.

Consider: even for people living in identical apartment units, the arrangement of their furniture and the objects in their home is going to be completely unique. It is going to take many, many years for AR game design to evolve to the point of creating convincing entertainment experiences which can adapt to a seemingly infinite set of environmental variables—from floor plan to ceiling height to furniture arrangements and much more—across billions of different homes and buildings, not to mention wide-open outdoor spaces.
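To make that concrete, here's a toy sketch of the kind of adaptive logic this implies. Everything in it is hypothetical: the idea is simply that, instead of assuming a fixed play area, the experience asks the system how much clear floor it actually mapped, bends its content to fit, and falls back to something smaller when the room can't accommodate it.

```python
from dataclasses import dataclass

@dataclass
class MappedSpace:
    """Hypothetical output of an AR system's room scan: the largest clear
    floor rectangle it found, in meters."""
    clear_width_m: float
    clear_depth_m: float

def plan_layout(space: MappedSpace,
                object_footprint_m: float = 0.5,
                desired_rows: int = 10,
                desired_cols: int = 10) -> str:
    """Fit as much of the desired grid of AR objects as the mapped space allows.

    A toy illustration of 'adaptive' AR design: the content bends to the room
    rather than assuming a fixed play area. A real system would also have to
    route around furniture, respect ceiling height, and so on.
    """
    rows = min(desired_rows, int(space.clear_depth_m // object_footprint_m))
    cols = min(desired_cols, int(space.clear_width_m // object_footprint_m))
    if rows == 0 or cols == 0:
        return "fallback: shrink the experience to tabletop scale"
    return f"place a {rows} x {cols} grid on the floor"

# A spacious living room vs. a small flat: same app, very different layouts.
print(plan_layout(MappedSpace(6.0, 5.0)))   # place a 10 x 10 grid on the floor
print(plan_layout(MappedSpace(2.0, 1.5)))   # place a 3 x 4 grid on the floor
print(plan_layout(MappedSpace(0.4, 0.3)))   # fallback: shrink the experience to tabletop scale
```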
You might think it isn't difficult to make a simple AR shooter where enemies will come from the only other room in someone's one-bedroom apartment, but don't forget that, without pre-mapping the environment in the first place, the AR system wouldn't even know that there is another room.

[caption id="attachment_68914" align="aligncenter" width="640"] Image courtesy RocketJump[/caption]

Let's just assume that we've solved the object classification problem (discussed in the prior section), such that the system could understand the objects around you at a human level. How could a developer create a game that takes advantage of those objects? Let's consider a simple farming game where players plant and water augmented reality crops in their home using a real cup that pours AR water. Neat idea... but what if there's no cup around? Does the game become useless? No... developers are smart... as a backup, let's just let the player use a closed fist as a stand-in for the cup; when they tilt it over, water will pour right out. It's foolproof!

So now we move on to planting our crops. The American developer expects that everyone will have enough room to plant 10 rows of corn, but half a world away, half of Europe is cursing because their typically smaller living quarters won't fit 10 rows of corn, and there's no fourth bedroom to act as the seed shed anyway.

[irp]

I could go on, but I'll spare you. The takeaway is this: if we're to escape from experiencing immersive AR only on blank floors and walls, we'll need to design adaptive AR games and apps which utilize the actual space and objects around us and, through some very smart design, figure out how to manage the billions of variables that come with that. While this challenge is perhaps the furthest behind of the three identified here, it's one that can begin to be worked out today on paper, ahead of the future devices that will actually be able to deliver these experiences.

- - — - -

I've heard many people in the last year or so suggest that AR and VR are matched in terms of technological maturity, but the truth is that AR is several years behind where VR is today. AR is massively exciting, but from hardware, to sensing and perception, to design, there are still major hurdles to overcome before we can achieve anything close to the common AR concepts we've seen over the last decade. It's an exciting time for AR, the field is still wide open, and the opportunity is ripe to break into the space with something that could move the entire sector forward. Get to it!