Real-time Object Classification

Image courtesy Google

Apple’s ARKit and Google’s ARCore will let you do some pretty nifty and novel AR-like things on your smartphone, but for the most part these systems are limited to understanding flat surfaces like floors and walls, which is why 99% of the AR apps and demos out there right now for iOS take place on a floor or table.

Why floors and walls? Because they’re easy to classify. The plane of one floor or wall is identical to the plane of another, and can be reliably assumed to continue as such in all directions until intersecting with another plane.
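
To make the "easy to classify" point concrete, here is a minimal sketch of how a flat surface might be pulled out of a cloud of 3D points and labeled as floor-like or wall-like. It is an illustration under stated assumptions, not a description of how ARKit or ARCore work internally: the function names, the RANSAC-style random sampling, the 2 cm tolerance, and the synthetic point cloud (a flat grid with a few stray points standing in for a cup) are all hypothetical.

```python
import math
import random


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def plane_from_points(p, q, r):
    """Plane through three points as (unit_normal, offset), or None if they are collinear."""
    u = (q[0] - p[0], q[1] - p[1], q[2] - p[2])
    v = (r[0] - p[0], r[1] - p[1], r[2] - p[2])
    # The cross product of two in-plane vectors is perpendicular to the plane.
    n = (u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0])
    length = math.sqrt(dot(n, n))
    if length < 1e-9:
        return None
    n = (n[0] / length, n[1] / length, n[2] / length)
    return n, -dot(n, p)


def fit_plane(points, iterations=200, tolerance=0.02, seed=0):
    """RANSAC-style search: keep the candidate plane that the most points lie close to."""
    rng = random.Random(seed)
    best_plane, best_inliers = None, 0
    for _ in range(iterations):
        plane = plane_from_points(*rng.sample(points, 3))
        if plane is None:
            continue
        normal, offset = plane
        inliers = sum(1 for pt in points if abs(dot(normal, pt) + offset) <= tolerance)
        if inliers > best_inliers:
            best_plane, best_inliers = plane, inliers
    return best_plane, best_inliers


def classify_plane(normal, up=(0.0, 1.0, 0.0), angle_tolerance_deg=10.0):
    """Label a plane by comparing its normal with the (assumed known) up direction."""
    alignment = abs(dot(normal, up))
    tol = math.radians(angle_tolerance_deg)
    if alignment >= math.cos(tol):
        return "horizontal"  # floor- or table-like
    if alignment <= math.sin(tol):
        return "vertical"    # wall-like
    return "other"


# Synthetic scene (made up for illustration): a 1 m x 1 m patch of floor at
# y = 0, plus five stray points above its center standing in for a cup.
floor = [(x * 0.1, 0.0, z * 0.1) for x in range(10) for z in range(10)]
cup = [(0.45, 0.05 * k, 0.45) for k in range(1, 6)]
plane, inlier_count = fit_plane(floor + cup)
print(classify_plane(plane[0]), inlier_count)  # prints: horizontal 100
```

Notice what the sketch does with the cup: its points are simply discarded as outliers. The floor is found and labeled, while the object sitting on it remains an unexplained bump.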

Note that I used the word understand rather than sense or detect. That’s because, while these systems may be able to ‘see’ the shapes of objects other than floors and walls, they cannot presently understand them.

Take a cup for instance. When you look at a cup, you see far more than just a shape. You already know a lot about the cup. How much? Let’s review:

  • You know that a cup is an object that’s distinct from the surface upon which it sits
  • You know, without actually looking into the top of that cup, that it contains an open volume which can be used to hold liquids and other objects
  • You know that the open volume inside the cup doesn’t protrude beyond the surface upon which it sits
  • You know that people drink from cups
  • You know that a cup is lightweight and can be easily knocked over, resulting in a spill of whatever is inside

I could go on… The point is that a computer knows none of this. It just sees a shape, not a cup. Unless it gets a full sweeping view of the cup’s interior and builds a map of the entire shape, the computer can’t even assume that there’s an open interior volume. Nor does it know that the cup is a separate object from the surface it sits upon. But you know that, because it’s a cup.

But it’s a non-trivial problem to get computer-vision to understand ‘cup’ rather than just seeing a shape. This is why for years and years we’ve seen AR demos where people attach fiducial markers to objects in order to facilitate more nuanced tracking and interactions.

Why is it so hard? The first challenge here is classification. Cups come in thousands of shapes, sizes, colors, and textures. Some cups have special properties and are made for special purposes (like beakers), which means they are used for entirely different things in very different places and contexts.


Think about the challenge of writing an algorithm which could help a computer understand all of these concepts, just to be able to know a cup when it sees it. Think about the challenge of writing code to explain to the computer the difference between a cup and a bowl from sight alone.

That’s a massive problem to solve for just one simple object out of perhaps thousands or hundreds of thousands of fairly common objects.

Today’s smartphone-based AR happens within your environment, but hardly interacts with it. That’s why all the AR experiences you’re seeing on smartphones today are relegated to floors and walls—it’s impossible for these systems to convincingly interact with the world around us, because they see it, but they don’t understand it.

For the sci-fi AR that everyone imagines—where my AR glasses show me the temperature of the coffee in my mug and put a floating time-remaining clock above my microwave—we’ll need systems that understand way more about the world around us.

So how do we get there? The answer seems like it must involve so-called ‘deep learning’. Hand-writing classification algorithms for every object type, or even just the common ones, is a massively complex task. But we may be able to train artificial neural networks, which adjust their own internal parameters as they are shown more labeled examples, to reliably detect many common objects around us.
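
As a rough illustration of learning from examples instead of hand-writing rules, here is a toy sketch. To be clear about the assumptions: this is a single artificial neuron (logistic regression), far simpler than the deep networks discussed here, and the two features (height-to-width ratio and whether a handle is present) and every training example are synthetic values invented for the illustration, not real measurements.

```python
import math

# Synthetic, hand-made training examples (NOT real measurements).
# Features: (height / width ratio, has_handle); label: 1 = cup, 0 = bowl.
EXAMPLES = [
    ((1.2, 1.0), 1), ((1.0, 1.0), 1), ((1.5, 0.0), 1), ((0.9, 1.0), 1), ((1.3, 0.0), 1),
    ((0.4, 0.0), 0), ((0.5, 0.0), 0), ((0.3, 0.0), 0), ((0.6, 0.0), 0), ((0.45, 0.0), 0),
]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(examples, epochs=2000, learning_rate=0.5):
    """Fit weights by stochastic gradient descent; no cup-vs-bowl rule is written by hand."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            prediction = sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
            error = label - prediction
            # Nudge each parameter in the direction that reduces the error.
            weights = [w + learning_rate * error * x for w, x in zip(weights, features)]
            bias += learning_rate * error
    return weights, bias


def cup_probability(model, features):
    weights, bias = model
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


model = train(EXAMPLES)
tall_with_handle = cup_probability(model, (1.1, 1.0))
wide_and_shallow = cup_probability(model, (0.35, 0.0))
print(f"tall, handled object -> P(cup) = {tall_with_handle:.2f}")
print(f"wide, shallow object -> P(cup) = {wide_and_shallow:.2f}")
```

The appeal is that adding a new object type means collecting more labeled examples, not writing a new algorithm. The catch is that a real system has to work from raw pixels, where nobody hands it tidy features like "has a handle".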

Some of this work is already underway and appears quite promising. One video demonstration, for example, shows a computer somewhat reliably telling apart arbitrary people, umbrellas, traffic lights, and cars.

The next step is to vastly expand the library of possible classifications and then to fuse this image-based detection with real-time environmental mapping data gathered from AR tracking systems. Once we can get AR systems to begin understanding the world around us, we begin to tackle the challenge of adaptive design for AR experiences, which just so happens to be our next topic.
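
To give a feel for what that fusion step could involve, here is a minimal sketch that takes a 2D detection from an image classifier and pins it to a 3D position in the tracked environment. Everything named here is an assumption made for illustration: the `Detection` and `Intrinsics` types, the pinhole camera model, the intrinsic values, and the idea that a depth estimate for the detected pixel is available from the tracking system. None of it is taken from the ARKit or ARCore APIs.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical output of an image-based object detector."""
    label: str
    box: tuple  # (x_min, y_min, x_max, y_max) in pixels


@dataclass
class Intrinsics:
    """Pinhole camera parameters: focal lengths and principal point, in pixels."""
    fx: float
    fy: float
    cx: float
    cy: float


def pixel_to_camera(u, v, depth, k):
    """Back-project a pixel with known depth (meters) into camera coordinates."""
    return ((u - k.cx) * depth / k.fx, (v - k.cy) * depth / k.fy, depth)


def camera_to_world(point, rotation, translation):
    """Apply the camera pose (3x3 rotation, then translation) from the tracker."""
    return tuple(
        sum(rotation[row][col] * point[col] for col in range(3)) + translation[row]
        for row in range(3)
    )


def anchor_detection(detection, depth, intrinsics, rotation, translation):
    """Return (label, world_position) for the center of a detection's bounding box."""
    x_min, y_min, x_max, y_max = detection.box
    u, v = (x_min + x_max) / 2.0, (y_min + y_max) / 2.0
    camera_point = pixel_to_camera(u, v, depth, intrinsics)
    return detection.label, camera_to_world(camera_point, rotation, translation)


# Illustrative values only: a 640x480 camera, standing 1 m along the world
# x-axis with no rotation, seeing a "cup" dead center at 2 m distance.
camera = Intrinsics(fx=500.0, fy=500.0, cx=320.0, cy=240.0)
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
position = (1.0, 0.0, 0.0)
cup = Detection(label="cup", box=(300, 220, 340, 260))
print(anchor_detection(cup, 2.0, camera, identity, position))  # ('cup', (1.0, 0.0, 2.0))
```

Once a label is attached to a world position like this, an app could keep the label in place as the camera moves, which is the starting point for interactions that depend on what an object is and not just where its surface lies.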




Ben is the world's most senior professional analyst solely dedicated to the XR industry, having founded Road to VR in 2011—a year before the Oculus Kickstarter sparked a resurgence that led to the modern XR landscape. He has authored more than 3,000 articles chronicling the evolution of the XR industry over more than a decade. With that unique perspective, Ben has been consistently recognized as one of the most influential voices in XR, giving keynotes and joining panel and podcast discussions at key industry events. He is a self-described "journalist and analyst, not evangelist."
  • Claus Sølvsten

    will getting AR into hands of everyone not solve problem 2 and 3 with time

    • benz145

      For #3, perhaps. #3 doesn’t rely on the invention of new or unforeseen tech, it just requires a lot of thinking, experimenting, and hands-on testing.

      #2 on the other hand will require a lot of specific technical work.

  • dk

    №4 doing great quality vr
    btw the meta2 currently has terrible tracking

  • Facts

    Magic leap is 200 degree and just a little bit more bulky than a pair of sunglasses

    • dk

      390 degrees in case u look around :P xD

      • Lucidfeuer

        Was it not 475° rather?

        • dk

          or bajillion ….something like that

    • Blake Schreurs

      Sure, all we need to do is get them to actually release something instead of marketing vaporware like they’ve done for the last 3 years.

    • traschcanman

      Makes you smarter and better looking too .

    • NooYawker

      Magic leap is 1000 degrees and the thinnest and lightest thing ever created, lasts 1 year without charging and uses A.I.
      It also does your taxes.

      • Facts

        Human field of view is 220

        • Blinko23

          Party pooper.

        • RFC_VR

          Beyond 140 can reportedly increase tendency to simulator sickness even on 90hz low latency display with accurate tracking.

          Perhaps foveated hmd is near future solution to present more ‘natural’ display with focal sweetspot in wide FOV with soft peripheral. Convergence/accomodation conflict needs addressing.

          • dk

            u know what a light field display is right…..it’s imitating the way light from the world gets to your eyes ……there is no convergence/accommodation conflict……u can focus on a close up virtual object without any eye tracking and that is what their old video of the solar system is showing …..but eye tracking is useful for various reasons
            ….and there are opaque light field displays too…..like the old nvidia demo …….for that to work u need a really high resolution panel because the array of lenses reduces the perceived resolution and u need eye tracking so that the system doesn’t need to render everything….but with that u can have perfectly capable vr headset with sunglasses form factor

    • lovethetech

      fakeware by fakeleap.

    • benz145

      Can’t tell if joke or serious.

      • Facts

        Seriously, someone who got to try it told me, right now they are having problem with battery, that’s why it’s not out yet. The battery only last 4 hours of usage.

        • dk

          bullshit….a dude goes left and right telling his friends actual numbers about it and this hasn’t reached the ML reddit
          …..what is the profession of this friend…..and what exactly did he say

          • Facts

            He’s a very successful business man who back the product, he’s not allowed to post about it, if he do he will get sue due to a contract.

          • dk

            lol that sounds so legit
            no point in discussing it ….we’ll see something ….possibly a development kit in 4 to 6 months….from what they were saying about their roadmap

            and if u look at all their patents and the lens Rony was holding pretty long ago it’s a flat wafer and something like 200 is impossible with that …..it’s maximally 90-100 but I would bet on 60-70 because with a first gen device running on what is pretty much under powered phone hardware with high angular resolution with a big fov would really be hard to drive …..but we’ll see what’s the situation with the eye tracking

          • Facts

            We will see, but i heard that they are using a different break through, game changing technology approach to accomplish this.

  • Blake Schreurs

    Input! Input is a major problem with AR. Gaze-based techniques feel weird. Hand-recognition is imprecise and tires you out, voice recognition is imprecise AND can be hijacked by the person standing next to you (on purpose or by accident), using a game controller feels a little strange when using AR, and keyboards aren’t really an option. Ever tried to input a password in AR? It’s awful.

    • benz145

      Depending upon the use-case, I think hand recognition is the way to go (since you want AR to be mobile). For day-to-day interactions, I think Leap Motion’s latest work is highly promising. Serious gaming requires more precise and reliable input, so maybe controllers (tracked from cameras on the headset, a la Windows VR headsets) could be optionally added for serious gaming.

    • guest

      Yes, use a Bluetooth keyboard…

  • Al

    For FOV issue, The solution is likely going to be a pass-through camera on a full HMD.

    • dk

      yep and if it’s similar to the old nvidia prototype with an opaque light field near eye display …….the sunglasses form factor instead of a box on your face is absolutely possible

    • Lucidfeuer

      Duh. And yet none of the manufacturers seems to have gotten their hands on what was VR headset’s main shortcoming from the beginning.

    • benz145

      Certainly a possibility:

      https://www.roadtovr.com/zed-mini-turns-rift-vive-high-end-ar-dev-kit/

      But our eyes are pretty awesome at resolution, dynamic range, field of view, and ‘refresh rate’. Making tiny, low power cameras that can match all of that is going to be a challenge — not to mention that the internal display(s) will need to have output capabilities equal to the camera’s input capabilities, otherwise the display becomes the perception bottleneck.

      • Lucidfeuer

        Well I don’t know if the goal is to match eyes “specs” since as you mention this is a whole paradigm of challenges to be tackled.

        And beside the basic lack of a yet compulsory component of current VR headsets, pass-through AR is more of a challenge of matching per-user eye-tracking (for IPD and focus adjustment) with front-facing IR-tracking data in order to match vision in terms of perspective adjustments: when you get your hand close to your face, given that the camera will be closer to the hand than your actual eyes are, you have to adjust the image perspective of your hand. The same goes for every object/component of the scene being captured, which then has to match a user’s own perspective.

      • Al

        Short of wiring directly into the optic nerve, displays will always be a bottleneck.

  • JesperL

    I don’t believe AR has a future for gaming, and thus no big need for immersion.
    AR is going to shine in business, production, teaching, industry, work etc.
    VR is far superior in gaming, simply because of immersion and the fact that AR is limited to visuals that are integrated with the real room. Gaming just works better inside a virtual world, away from real life. – Yes sure, AR has potential for a lot of games still, on tables, and with monsters coming in waves out of the walls – but that is still limited compared to ALL games you can play on a PC today.

    • dk

      good hmds will do both…..either or is a false dichotomy

    • benz145

      I think AR has a future for a certain kind of gaming, but maybe a very different kind than VR. However I also agree with @dk that AR/VR devices of the future are likely to offer both modes interchangeably.

  • jedda

    Have you looked at the Immy? Has 60 degree fov in a very lightweight package. http://www.immyinc.com/

    • benz145

      Heard about it, but so far haven’t had an opportunity to try it.

      • jedda

        Got eyes on recently with prototype. Looks very promising. I would classify it sort of as a shrunken version of the meta 2, with similar effective FOV feel to it. I think they may have the optical approach that can actually lead to something wearable.

  • NooYawker

    The largest hurdle is not an AR problem but an all encompassing hurdle for wearable and mobile tech: the battery. The ultimate goal is to have AR glasses you wear everywhere you go. The battery is what will ultimately make this the future or hinder its use.

    • benz145

      IMO if AR is good enough, early adopters won’t mind wearing a pocketed battery for the time being, and the mainstream may find that acceptable too, depending upon the value these devices can deliver.

      • MosBen

        Indeed, and the important word in that original post is “ultimate”. In 100 years we’ll have AR glasses that we wear basically all the time and only have to charge every couple hours. But AR won’t need to hit that goal to become massively adopted. It just needs to shoot for impressive enough paired with good enough tech and an application that makes people’s lives better or is super fun.

        VR is the same thing. It’s not that current VR HMDs are the ultimate expression of the product category, but it’s good enough for consumer level early adopters, and the next generation will broaden the appeal with things like being cheaper, more comfortable, and having some tech upgrades.

        An AR setup that was reasonably portable, had a 90 degree FOV, and was less than $1,000 would sell a bunch of units. My parents wouldn’t buy one, but probably enough people to push the tech along to the next generation.

        • Muzufuzo

          “we wear basically all the time and only have to charge every couple hours”
          I don’t understand that. You mean we won’t have to charge them everyday? I hope that will be the case in 7 instead of 100 years.

          • MosBen

            My point is that the ultimate version of a technology, say, the very best buggy cart whip, is not a necessary condition for mass adoption, and even less so for enthusiast adoption. Right now AR tech has several limitations that are making it impractical even for the enthusiast, business, or education markets. In the next few years we’ll see improvements in FOV, ergonomics, cost, and battery life that will likely push it into the enthusiast market, where people are willing to spend a relatively large amount of money on products with pretty significant drawbacks. But the tech will improve a bit more and we’ll see adoption in the business, education, and consumer markets even though the tech will still not be anywhere near perfect. It’ll get good enough, which may mean that we don’t use AR glasses all day every day, but they are useful enough that most people will have them.

          • Muzufuzo

            In my opinion it will take 3 years for enthusiasts and workers to start using AR glasses in everyday life. 7 years for massive adoption, as by that time we should have the equivalent of iPhone 4S (2011) in this area. The biggest unknown is probably battery technology. As for computers, it’s quite logical to assume we will be using 3D chips with von Neumann architecture coupled with 3D neuromorphic chips, on new materials. By 2027 smartphones will become almost extinct.

  • impurekind

    AR is still a lot of bullshot at this point in time.

  • lovethetech

    Stop comparing Desktop based (tethered) gadgets to AR googles running on batteries with a less power-intensive CPU.
    Stop copying the 2 year old information’s based articles again.
    Stop copying.
    Stop writing.

    • benz145

      What?

  • Muzufuzo

    Well … everything is bad in AR at this moment. I’ve tried both the Glass and the Hololens. Glass is very underwhelming, Hololens is much better but it is clear, there is much to do before its prime time. Batteries, display holographic resolution, volume, weight, processing power, memory, FOV, gesture recognition, voice control, eye tracking, AI and price. Everything must get much much better before wide adoption of AR glasses.

    • benz145

      Glass is an HMD (head-mounted display), but not an AR headset because it doesn’t “augment” the world; it simply shows information to you in a convenient, hands-free way.

      Aspects of HoloLens are very impressive. The tracking in particular is phenomenal:

      https://www.roadtovr.com/microsoft-hololens-inside-out-tracking-augmented-reality-virtual-reality-ar-vr/

      But it’s easy to get distracted by the limited field of view, which does major damage to immersion.

      • Muzufuzo

        I would like FoV to be the first thing being improved upon.

  • Interesting article. IMHO you should have mentioned two things:
    1. We have some libraries for object recognition, like Vuforia or Wikitude. It’s very rough, but it can be exploited by game developers;
    2. There’s the lack of a common AR infrastructure. Some guys on Medium call it the ARCloud. Without the ARCloud, every AR game is confined to a single room, a single place. We need something to share AR all together. Read this, for instance https://medium.com/super-ventures-blog/why-is-arkit-almost-useless-without-the-arcloud-6ee1e7affc65 .

  • Hafsa Yaqoob

    Quite interesting article, but there is also a good side of Augmented Reality. It’s been a wonderful news that Cresset Technologies has launched Pakistan’s first ever Augmented Reality app in collaboration with Sapphire. Cresset AR fashion app is easy to use and gives you an idea of how the dresses would look on your body. With Sapphire closets, you can also check out the dresses, and pick one and try it. This is by far the best app for trying a dress before buying. http://www.prweb.com/releases/2018/02/prweb15246174.htm
    Android Users: https://play.google.com/store/apps/details?id=com.cressettech.sapphireARNew
    IPhone Users: https://itunes.apple.com/pk/app/sapphire-ar-sense/id1350251001?mt=8

  • Talkyang