At the company’s annual WWDC developer conference today, Apple revealed ARKit 3, its latest set of developer tools for creating AR applications on iOS. ARKit 3 now offers real-time body tracking of people in the scene as well as occlusion, allowing AR objects to be convincingly placed in front of and behind those people. Apple also introduced Reality Composer and RealityKit to make it easier for developers to build augmented reality apps.
Today during the opening keynote of WWDC in San Jose, Apple revealed ARKit 3. First introduced in 2017, ARKit is a suite of tools for building AR applications on iOS.
From the beginning, ARKit has offered computer vision tracking which allows modern iOS devices to track their location in space, as well as detect flat planes like the ground or a flat table which could be used to place virtual objects into the scene. With ARKit 3, the system now supports motion capture and occlusion of people.
Human Occlusion & Body Tracking
Using computer vision, ARKit 3 understands the position of people in the scene. Knowing where the person is allows the system to correctly composite virtual objects with regard to real people in the scene, rendering those objects in front of or behind the person depending upon which is closer to the camera. In prior versions of ARKit, virtual objects would always show ‘on top’ of anyone in the scene, no matter how close they were to the camera. This would break the illusion of augmented reality by showing conflicting depth cues.
Similar tech is used for real-time body tracking in ARKit 3. By knowing where people are in the scene and how their body is moving, ARKit 3 tracks a virtual version of that person’s body which can in turn be used as input for the AR app. Body tracking could be used to translate a person’s movements into the animation of an avatar, or for interacting with objects in the scene, etc.
From the footage Apple showed of their body tracking tech, it looks pretty coarse at this stage. Even with minor camera movement the avatar’s feet don’t remain particularly still while the rest of the body is moving, and small leg motions aren’t well tracked. When waving, the avatar can be seen to tip forward in response to the motion even though the user doesn’t. In the demo footage, the user keeps their arms completely out to the sides and never moves them across their torso (which would present a more challenging motion capture scenario).
For now this could surely be useful for something simple like an app which lets kids puppeteer characters and record a story with AR avatars. But hopefully we’ll see it improve over time and become more accurate to enable more uses. It’s likely that this was a simple ‘hello world’ sort of demo using raw tracking information; a more complex avatar rig could smartly incorporate both motion input and physics to create a more realistic, procedurally generated animation.
Both human occlusion and body tracking will be important for the future of AR, especially with head-worn devices which will be ‘always on’ and need to constantly deal with occlusions to remain immersive throughout the day. This is an active area of R&D for many companies, and Apple is very likely deploying these features now to continue honing them before the expected debut of their upcoming AR headset.
Apple didn’t go into detail but listed a handful of other improvements in ARKit 3:
- Simultaneous front and back camera
- Motion capture
- Faster reference image loading
- Auto-detect image size
- Visual coherence
- More robust 3D object detection
- People occlusion
- Video recording in AR Quick Look
- Apple Pay in AR Quick Look
- Multiple-face tracking
- Collaborative session
- Audio support in AR Quick Look
- Detect upt to 100 images
- HDR environment textures
- Multiple-model support in AR Quick Look
- AR Coaching UI
RealityKit
With ARKit 3, Apple also introduced RealityKit which is designed to make it easier for developers to build augmented reality apps on iOS.
Building AR apps requires a strong understanding of 3D app development, tools, and workflows—something that a big portion of iOS developers (who are usually building ‘flat’ apps) aren’t likely to have much experience with. This makes it less likely for developers to jump into something new like AR, and Apple is clearly trying to help smooth that transition.
From Apple’s description, RealityKit almost sounds like a miniature game engine, including “photo-realistic rendering, camera effects, animations, physics and more.” Rather than asking iOS developers to learn game engine tools like Unity or Unreal Engine, it seems that RealityKit will be an option that Apple hopes will be easier and more familiar to its developers.
With RealityKit, Apple is also promising top notch rendering. While we doubt it’ll qualify as “photo-realistic,” the company is tuning rendering the allow virtual objects to blend as convincingly as possible into the real world through the camera of an iOS device by layering effects onto virtual objects as if they were really captured through the camera.
“RealityKit seamlessly blends virtual content with the real world using realistic physically-based materials, environment reflections, grounding shadows, camera noise, motion blur, and more, making virtual content nearly indistinguishable from reality,” Apple writes.
RealityKit, which uses a Swift API, also supports the creation of shared AR experiences on iOS by offering a network solution out of the box.
Reality Composer
Just like RealityKit, Reality Composer aims to make things easier for developers not experienced with game engines, workflows, and assets. Apple says that Reality Composer offers a library of existing 3D models and animations with drag and drop ease, allowing the creation of simple AR experiences that can integrate into apps using Xcode or be exported to AR Quick Look (which allows built-in iOS apps like Safari, Messages, Mail, and more, to quickly visualize 3D objects at scale using augmented reality).
In addition to the built in object library, Reality Composer also allows importing 3D files in the USDZ format, and offers a spatial audio solution.