Prolific YouTuber Tom Scott recently visited the Mixed Reality Lab at the YouTube Space studio in New York to showcase mixed reality capture. The lab uses a green screen and a tracked camera to combine real-world footage with a virtual reality environment, and Scott's video helpfully lays out the basic concepts of the technique.
One of the persistent hurdles in explaining or promoting augmented and virtual reality technology is conveying the experience on a conventional display. One effective way to achieve this is mixed reality capture, a combination of techniques that merges footage of the real world with the virtual world.
Northway Games, developers of Fantastic Contraption, were among the first to embrace the technique, and their various guides are a good place to start for those already versed in digital capture and streaming. Valve’s introduction video to SteamVR at the launch of the HTC Vive last year remains a superb illustration of the impact of presenting VR in this way.
YouTube Space New York, one of several official learning and creative YouTube studios around the world, has a Mixed Reality Lab using an HTC Vive setup, which received mainstream exposure on the late-night talk show Conan in November. Recently, popular YouTuber Tom Scott visited the lab, providing a somewhat more sensible presentation of the equipment as part of his 'Amazing Places' series. The video can be seen heading this article.
Tom Small, manager of New Technology Programs at YouTube Spaces, and Travis Butler, manager of Technology at YouTube Spaces in the Americas, were on hand to explain some of the details. Small describes how a third Vive controller attached to the handheld camera filming the green screen allows the software to create a virtual camera that tracks in the same way; the real and virtual captures can then be combined to create the mixed reality footage. The upcoming Vive Tracker is ideal for this application, removing the need for a third controller.
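To make the virtual camera idea concrete, here is a minimal Python sketch of the core transform: the pose of the tracker (or spare controller) rigidly mounted on the physical camera is combined with a fixed tracker-to-lens offset measured during calibration. The function names, offset, and pose values are all illustrative assumptions; they don't come from the lab's actual setup or any specific SDK.

```python
# Minimal sketch: drive a virtual camera from a tracked device rigidly
# attached to the physical camera. All names and values are hypothetical.
import numpy as np

def pose_matrix(position, rotation):
    """Build a 4x4 rigid transform from a translation vector and 3x3 rotation."""
    m = np.eye(4)
    m[:3, :3] = rotation
    m[:3, 3] = position
    return m

# Pose of the tracker in the play space, as reported by the tracking system
# (placeholder values standing in for a live pose updated every frame).
tracker_pose = pose_matrix(np.array([0.2, 1.5, -1.0]), np.eye(3))

# Fixed offset from the tracker mount point to the camera lens, measured once
# during calibration (values here are purely illustrative).
tracker_to_lens = pose_matrix(np.array([0.0, -0.05, 0.08]), np.eye(3))

# The virtual camera simply follows the physical one: same pose, every frame.
virtual_camera_pose = tracker_pose @ tracker_to_lens
print(virtual_camera_pose)
```

Once the virtual camera shares the physical camera's pose (and field of view), the engine's render from that viewpoint lines up with the green screen footage, which is what makes the composite possible.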
Butler explains the need for a powerful PC to achieve the best results; in this case they're using a high-end Intel CPU, 32GB of RAM, and an NVIDIA GTX 1080 graphics card in order to render at 4K. The system is capable of generating the composite image in real time, and a higher-quality output can be created in post-processing. The high system requirements are also due to the additional views being rendered alongside the normal VR output: the virtual camera view, plus foreground and background layers used to provide a basic depth effect.
This 'video sandwich' method works by rendering separate 'foreground' and 'background' views, with the split determined by the position of the headset relative to the virtual camera: any geometry between the headset and the camera is considered foreground, and anything beyond the headset is background. The keyed green screen footage of the player is then layered between the two.
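As a rough sketch of that split, the following Python snippet sorts virtual objects into foreground and background by comparing their depth along the camera's view axis with the headset's depth. The function and variable names are illustrative, not taken from the software used in the lab.

```python
# Rough sketch of the 'video sandwich' layer split: the headset's depth along
# the virtual camera's view axis acts as the dividing plane.
import numpy as np

def split_layers(object_positions, camera_pos, camera_forward, headset_pos):
    """Return (foreground, background) index lists for the layer sandwich."""
    camera_forward = camera_forward / np.linalg.norm(camera_forward)
    # Depth of the headset along the camera's view axis defines the split.
    headset_depth = np.dot(headset_pos - camera_pos, camera_forward)
    foreground, background = [], []
    for i, pos in enumerate(object_positions):
        depth = np.dot(pos - camera_pos, camera_forward)
        (foreground if depth < headset_depth else background).append(i)
    return foreground, background

# Example: two virtual objects, one nearer the camera than the player.
objects = [np.array([0.0, 1.0, -0.5]), np.array([0.0, 1.0, -3.0])]
fg, bg = split_layers(objects,
                      camera_pos=np.array([0.0, 1.5, 1.0]),
                      camera_forward=np.array([0.0, 0.0, -1.0]),
                      headset_pos=np.array([0.0, 1.7, -1.5]))
print("foreground:", fg, "background:", bg)
```

The compositor then stacks background render, keyed camera footage, and foreground render, which gives a plausible but only approximate sense of depth.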
It's a demanding and imprecise solution, and one that Owlchemy Labs, developers of Job Simulator, have addressed with a more advanced approach. Their first blog entry explains the difference between the common mixed reality solutions of green screen overlay and the foreground-background sandwich method, and why their depth-sensing solution is so much better: a custom shader and plugin use depth information from a stereo camera to render the user in-engine in real time, meaning no external compositing is required and accurate depth is achieved. In addition, their second blog entry shows how this depth information allows the user to receive dynamic lighting from the game engine in real time, while also solving transparency problems and removing green screen boundaries.
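To illustrate the general idea of depth-based compositing (not Owlchemy's actual shader, which runs on the GPU inside the engine), here is a toy Python example that performs a per-pixel depth test between a stereo camera's depth map and the engine's depth buffer, keeping whichever surface is closer. The arrays and values are fabricated purely for the illustration.

```python
# Toy depth-based compositing: per pixel, keep whichever surface is nearer,
# the real footage of the user or the rendered virtual scene.
import numpy as np

h, w = 4, 4  # tiny "frames" for illustration

camera_color = np.full((h, w, 3), 200, dtype=np.uint8)  # footage of the user
camera_depth = np.full((h, w), 1.5)                     # metres, from stereo camera
game_color = np.zeros((h, w, 3), dtype=np.uint8)        # rendered VR scene
game_depth = np.full((h, w), 2.0)                       # scene depth buffer
game_depth[:, :2] = 1.0                                 # a virtual object nearer than the user

# Per-pixel depth test: the real footage wins where the user is closer.
user_in_front = camera_depth < game_depth
composite = np.where(user_in_front[..., None], camera_color, game_color)
print(composite[..., 0])  # left columns show the virtual object occluding the user
```

Because the occlusion is resolved per pixel, virtual objects can correctly pass in front of or behind the user, something the foreground-background sandwich can only approximate with its single split plane.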