Hands-on: Apple Upgrades Personas for True Face-to-face Chats on Vision Pro

Apr 2, 2024

Apple today released ‘Spatial Personas’ in public beta on Vision Pro. The newly upgraded avatar system can now bring people right into your room. We got an early look.

Much has been said about Apple’s Persona avatar system for Vision Pro. Whether you find them uncanny or passable, one thing is certain: it’s the most photorealistic real-time avatar system built into any headset available today. And now Personas is getting upgraded with ‘Spatial Personas’.

But weren’t Personas already ‘spatial’? Let me explain.

Sorta Spatial

At launch, the Persona system allowed users to scan their faces into the headset to create a digital identity that looks and moves like the user, thanks to the bevy of sensors in Vision Pro. When doing a FaceTime call with another Vision Pro user (or users), the head, shoulders, and hands of their Persona(s) would be shown inside a floating box.

While this could feel like face-to-face talking at times, the fact that they were contained within a frame (which you can move or resize like any other window) made it feel like they weren’t actually standing right next to you. And that’s not just because of the frame, but also because you weren’t actually sharing the same space as them. It’s not like they could walk right up to you for a high-five, because they’d be stuck in the window on your screen.

Face-to-face

Now with Spatial Personas (released in beta today on the latest version of visionOS), each person’s avatar is rendered in a shared space without the frame. When I say ‘shared space’, I mean that if someone takes a step toward me in their room, I actually see them come one step closer to me.

Previously the frame made it feel sort of like you were doing a 3D video chat. Now with the shared space and no frame, it really feels like you’re standing right next to each other. It’s the ‘hang out on the same couch’ or ‘gather around the same table’ experience that wasn’t actually possible on Vision Pro at launch.

And it’s really quite compelling. I got a sneak peek at the new system in a Vision Pro FaceTime call with four people (though up to five are supported total), all using Spatial Personas. You’ll still only see their head, shoulders, and hands, but now it really feels like a huddle instead of a 3D video chat. It feels much more personal.

Spatial Personas Are Opt-in

To be clear, the ‘video chat’ version of Personas (with the frame) still exists. In fact, it’s the default way that avatars are shown when a FaceTime call is started. Switching to a Spatial Persona requires hitting a button on the FaceTime menu.

And while this might seem like a strange choice, I actually think there’s something to it.

On the one hand, the default ‘FaceTime in Vision Pro’ experience feels like a video chat. In everyday business we’re all pretty used to seeing someone else on the other side of a webcam by now. And even though this is more personal than an audio-only call, it’s still a step away from actually meeting with someone in person.

Spatial Personas is more like you’re actually meeting up in person, since you can actually feel the interpersonal space between you and the other people in this shared space. If they walk up and get a little too close, you’ll truly feel it in the same way you would if someone stood too close to you in real life.

So it’s nice to have both of these options. I can ‘video chat’ with someone with the regular mode, or I can essentially invite them into my space if the situation calls for a more personal meeting.

The Little Details

Apple also thought through some smaller details for Spatial Personas, perhaps the most interesting of which is ‘locomotion’.

Room-scale locomotion is essentially the default. If you want to move closer to a person or app… you just physically walk over to it. But what happens if it’s outside the bounds of your physical space? Well, instead of directly moving yourself virtually, you can actually move the whole shared space closer to or farther from you.

You can do this any time, in any app, and everyone else will see your new position reflected within their space, keeping everything synchronized.

Apple also made it so that when two Spatial Personas get too close together, they will temporarily revert to just looking like a floating contact photo. I think this is probably because they want to avoid possible harassment or trolling (i.e., you want to annoy someone so you phase your virtual hand right through their virtual face, which is uncomfortable both visually and from an interpersonal space standpoint).

The headset’s excellent spatial audio is of course included by default, so everyone sounds like they’re coming from wherever they’re standing in the room, and their voices actually sound like they’re in your room (based on the headset’s estimate of what the acoustics should sound like). And if you move to a fully immersive space like an ‘environment’, the spatial audio transitions to that new acoustic environment—so for instance you can hear people faintly echoing in the Joshua Tree environment because of all the rock surfaces nearby. Hearing the acoustics fade from being inside your own room to being ‘outside’ in an environment is a subtle bit of magic.

And last but not least, it’s possible to have a mixed group of FaceTime participants. For instance, you could have people using an iPhone, an Android tablet (yes, you can FaceTime with people on non-Apple devices), a normal Persona, and a Spatial Persona all at once. SharePlay in that case will also work between those formats (except non-Apple devices) as long as the app supports it. In cases with apps that are Vision Pro native, the iPhone user would get a notification that their device isn’t supported.

– – — – –

Spatial Personas is a big upgrade to Apple’s avatar system, but the company maintains the whole Persona system is still in ‘beta’. Presumably that means there are more improvements yet to come.