Speaking at the IEDM 2021 conference late last year, Meta Reality Labs' Chief Scientist Michael Abrash laid out the company's analysis of how contemporary compute architectures will need to evolve to make possible the AR glasses of our sci-fi conceptualizations.

While there are some AR 'glasses' on the market today, none of them are truly the size of a normal pair of glasses (even a bulky pair). The best AR headsets available today—the likes of HoloLens 2 and Magic Leap 2—are still closer to goggles than glasses and are too heavy to be worn all day (not to mention the looks you'd get from the crowd).

If we're going to build AR glasses that are truly glasses-sized, with all-day battery life and the features needed for compelling AR experiences, it's going to require a "range of radical improvements—and in some cases paradigm shifts—in both hardware [...] and software," says Michael Abrash, Chief Scientist at Reality Labs, Meta's XR organization.

That is to say: Meta doesn't believe that its current technology—or anyone's, for that matter—is capable of delivering the sci-fi glasses that every AR concept video envisions. But the company thinks it knows where things need to head for that to happen. Abrash, speaking at IEDM 2021, laid out the case for a new compute architecture that could meet the needs of truly glasses-sized AR devices.

Follow the Power

The core reason to rethink how computing should be handled on these devices is the need to drastically reduce power consumption to meet battery life and heat requirements.

"How can we improve the power efficiency [of mobile computing devices] radically by a factor of 100 or even 1,000?" he asks. "That will require a deep system-level rethinking of the full stack, with end-to-end co-design of hardware and software. And the place to start that rethinking is by looking at where power is going today."

To that end, Abrash presented a graph comparing the energy cost of low-level computing operations.

[Image courtesy Meta]

As the chart highlights, the most energy-intensive operations are data transfers. That doesn't just mean wireless data transfer, but even moving data from one chip inside the device to another. What's more, the chart uses a logarithmic scale; according to it, transferring data to RAM uses 12,000 times the energy of the base unit (in this case, adding two numbers together). Bringing it all together, the circular graphs on the right show that techniques essential to AR—SLAM and hand-tracking—use most of their power simply moving data to and from RAM.

"Clearly, for low power applications [such as in lightweight AR glasses], it is critical to reduce the amount of data transfer as much as possible," says Abrash.

To make that happen, he says a new compute architecture will be required which—rather than shuffling large quantities of data between centralized computing hubs—distributes computing operations more broadly across the system in order to minimize wasteful data transfer.

Compute Where You Least Expect It

Such a distributed architecture, Abrash says, could begin with the many cameras that AR glasses need for sensing the world around the user. The idea is to do some preliminary computation on the camera sensor itself and send only the most vital data across power-hungry data transfer lanes.
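To see why this matters, consider a back-of-envelope sketch in the spirit of Abrash's chart. This is not Meta's model: the relative energy figures (an add as the base unit, a RAM transfer at roughly 12,000 times that), the frame size, the per-pixel workload, and the feature count are all illustrative assumptions.

```python
# Back-of-envelope energy model: "ship every pixel to RAM" vs. "reduce on-sensor".
# All figures are illustrative placeholders, loosely following the idea in
# Abrash's chart (an add as the base unit, a RAM transfer costing thousands
# of times more). Real numbers depend on process node, memory type, etc.

ENERGY_ADD = 1                 # base unit: one addition
ENERGY_RAM_TRANSFER = 12_000   # moving one value to/from RAM (illustrative)

def pipeline_energy(pixels_captured: int, values_transferred: int, ops_per_pixel: int) -> int:
    """Total relative energy: arithmetic on every captured pixel plus RAM traffic."""
    compute = pixels_captured * ops_per_pixel * ENERGY_ADD
    transfer = values_transferred * ENERGY_RAM_TRANSFER
    return compute + transfer

FRAME_PIXELS = 640 * 480   # hypothetical VGA tracking camera
FEATURES = 2_000           # hypothetical: only keypoints/features leave the sensor

# Centralized: every pixel is moved to RAM, then processed by a separate chip.
centralized = pipeline_energy(FRAME_PIXELS, FRAME_PIXELS, ops_per_pixel=50)

# Distributed: the same per-pixel work happens on-sensor; only features are moved.
distributed = pipeline_energy(FRAME_PIXELS, FEATURES, ops_per_pixel=50)

print(f"centralized: {centralized:,}  distributed: {distributed:,}")
print(f"relative savings: {1 - distributed / centralized:.1%}")
```

Under these toy numbers the arithmetic itself is nearly free; almost all of the energy goes into moving pixels to RAM, which is why shrinking the data before it leaves the sensor dominates the savings.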
[caption id="attachment_107620" align="aligncenter" width="640"] Image courtesy Meta[/caption] To make that possible Abrash says it'll take co-designed hardware and software, such that the hardware is designed with a specific algorithm in mind that is essentially hardwired into the camera sensor itself—allowing some operations to be taken care of before any data even leaves the sensor. [caption id="attachment_107619" align="aligncenter" width="640"] Image courtesy Meta[/caption] "The combination of requirements for lowest power, best requirements, and smallest possible form-factor, make XR sensors the new frontier in the image sensor industry,” Abrash says. Continue on Page 2: Domain Specific Sensors » Domain Specific Sensors He also revealed that Reality Labs has already begun work to this end, and has even created a prototype camera sensor that's specifically designed for the low power, high performance needs of AR glasses. The sensor uses an array of so-called digital pixel sensors which capture digital light values on every pixel at three different light levels simultaneously. Each pixel has its own memory to store the data, and can decide which of the three values to report (instead of sending all of the data to another chip to do that work). This doesn't just reduce power, Abrash says, but also drastically increases the sensor's dynamic range (its ability to capture dim and bright light levels in the same image). He shared a sample image captured with the company's prototype sensor compared to a typical sensor to demonstrate the wide dynamic range. [caption id="attachment_107621" align="aligncenter" width="640"] Image courtesy Meta[/caption] In the image on the left, the bright bulb washes out the image, causing the camera to not be able to capture much of the scene. The image on the right, on the other hand, can not only see the extreme brightness of the lightbulb's filament in detail, it can also see other parts of the scene. This wide dynamic range is essential to sensors for future AR glasses which will need to work just as well in low light indoor conditions as sunny days. Even with the HDR benefits of Meta's prototype sensor, Abrash says it's significantly more power efficient, using just 5mW at 30 FPS (just under 25% of what a typical sensor would draw, he claims). And it scales well too; though it would take more power, he says the sensor can capture up to 480 frames per second. [irp] But, Meta wants to go even further, with even more complex compute happening right on the sensor. "For example, a shallow portion of the deep neural networks—segmentation and classification for XR workloads such as eye-tracking and hand-tracking—can be implemented on-sensor." But that can't happen, Abrash says, before more hardware innovation, like the development of ultra dense, low power memory that would be necessary for "true on-sensor ML computing." While the company is experimenting with these technologies, Abrash notes that the industry at large is going to need to come together to make it happen at scale. Specifically he says "the development of MRAM technologies by [chip makers] is a critical element for developing AR glasses." "Combined together in an end-to-end system, our proposed distributed architecture, and the associated technology I've described, have potential for enormous improvements in power, area, and form-factor," Abrash sumises. "Improvements that are necessary to become comfortable and functional enough to be a part of daily life for a billion people."