Switzerland-based CREAL is developing a light-field display which it hopes to bring to VR headsets and eventually AR glasses. In November the company raised CHF 6.5 million (~$7.2 million) in a Series A+ investment round to bring on new hires and continue miniaturizing the company’s light-field tech.

Creal says it closed its Series A+ investment round in mid-November, raising CHF 6.5 million (~$7.2 million) led by Swisscom Ventures, with participation from existing investors Investiere, DAA Capital Partners, and Ariel Luedi. The new funding brings the company’s total raised to roughly $15.5 million.

Over the last few years we’ve seen Creal make progress in shrinking its novel light-field display with the hope of fitting it into AR glasses. Unlike the displays used in VR and AR headsets today, light-field displays generate an image that accurately represents how we see the real world. Specifically, light-field displays support both vergence and accommodation, the two focus mechanisms of the human visual system. Creal and others say the advantage of such displays is more realistic and more comfortable visuals for VR and AR headsets. For more on light-fields, see our explainer below.

Light-fields are significant to AR and VR because they’re a genuine representation of how light exists in the real world, and how we perceive it. Unfortunately they’re difficult to capture or generate, and arguably even harder to display.

Every AR and VR headset on the market today uses some tricks to try to make our eyes interpret what we’re seeing as if it’s actually there in front of us. Most headsets are using basic stereoscopy and that’s about it—the 3D effect gives a sense of depth to what’s otherwise a scene projected onto a flat plane at a fixed focal length.


Such headsets support vergence (the movement of both eyes to fuse two images into one image with depth), but not accommodation (the dynamic focus of each individual eye). That means that while your eyes are constantly changing their vergence, the accommodation is stuck in one place. Normally these two eye functions work unconsciously in sync, hence the so-called ‘vergence-accommodation conflict’ when they don’t.
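
To make the conflict concrete, here’s a minimal back-of-the-envelope sketch (illustrative numbers only, not tied to any particular headset): when the optics sit at a fixed focal distance, the vergence demand tracks the rendered object while the accommodation demand doesn’t.

```python
import math

# Illustrative numbers only: a headset with optics focused at a fixed 2 m,
# showing a virtual object rendered stereoscopically at 0.5 m.
IPD_M = 0.063           # typical interpupillary distance (~63 mm)
FIXED_FOCUS_M = 2.0     # fixed focal distance of the headset optics
OBJECT_M = 0.5          # stereoscopic (rendered) distance of the object

# Vergence: both eyes rotate to converge on the rendered distance.
vergence_deg = math.degrees(2 * math.atan((IPD_M / 2) / OBJECT_M))

# Accommodation is measured in diopters (1/meters).
demanded_d = 1 / OBJECT_M        # what the rendered scene asks for: 2.0 D
provided_d = 1 / FIXED_FOCUS_M   # what the fixed-focus optics deliver: 0.5 D

print(f"vergence angle: {vergence_deg:.1f} deg")                    # ~7.2 deg
print(f"accommodation mismatch: {demanded_d - provided_d:.1f} D")   # 1.5 D
```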

On more advanced headsets, ‘varifocal’ approaches dynamically shift the focal length based on where you’re looking (with eye-tracking). Magic Leap, for instance, supports two focal lengths and jumps between them as needed. Oculus’ Half Dome prototype does the same, though it seems to support a larger number of focal lengths. Even so, these varifocal approaches still have some inherent issues that arise because they aren’t actually displaying light-fields.
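
In its simplest discrete form, a varifocal system just snaps the display’s focal power to whichever supported plane is closest to the tracked gaze depth. A minimal sketch of that selection logic (the plane values below are hypothetical, not Magic Leap’s or Oculus’ actual figures):

```python
# Hypothetical discrete varifocal plane selection, assuming an eye tracker
# that reports the depth (in meters) the user is currently fixating.
FOCAL_PLANES_D = [0.5, 3.0]  # e.g. a "far" plane (~2 m) and a "near" plane (~0.33 m)

def select_focal_plane(gaze_depth_m: float) -> float:
    """Return the supported focal power (diopters) closest to the gazed depth."""
    gaze_d = 1.0 / max(gaze_depth_m, 0.1)   # clamp to avoid division by zero
    return min(FOCAL_PLANES_D, key=lambda plane: abs(plane - gaze_d))

print(select_focal_plane(1.5))   # gazing at ~1.5 m (0.67 D) -> far plane (0.5 D)
print(select_focal_plane(0.4))   # gazing at ~0.4 m (2.5 D)  -> near plane (3.0 D)
```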

Having demonstrated the fundamentals of its light-field tech, Creal’s biggest challenge is miniaturizing it to fit comfortably into AR glasses while maintaining a wide enough field of view to remain useful. We saw progress on that front early this year at CES 2020, the last major conference before the pandemic cancelled the remainder of the year’s events.

Through-the-lens: The accurate blur in the background is not generated, it is ‘real’, owing to the physics of light-fields. | Image courtesy CREAL

Creal co-founder Tomas Sluka tells Road to VR that this summer the company succeeded in bringing its prototype technology into a head-mounted form factor, creating preliminary AR and VR headset dev kits.

Beyond ongoing development of the technology, Sluka said a primary driver for the funding round was to pick up talent that entered the job market after Magic Leap’s precarious funding situation and the ousting of CEO Rony Abovitz earlier this year.


CREAL doesn’t expect to bring its own headset to market, but is instead positioning itself to work with partners and eventually license its technology for use in their headsets. The company aims to build a “complete technology package for the next-generation Augmented Reality (AR) glasses,” which will likely take the form of a reference design for commercialization.




  • Bob

    Lightfields are essentially the end-goal for all future VR/AR headsets, but for now Facebook’s varifocal lens approach will get there faster, “solving” the vergence-accommodation conflict long before lightfields become a viable technology in consumer headsets.

    And to add: it’s great to see that companies are attempting to solve the problem in two entirely different ways which is a win-win for the end-user :)


    • kontis

      The inventor of that cool Nvidia near-eye light field display, who was always hyping light-fields, no longer seemed so sure about them being the end-goal in his recent talks, and is also researching alternatives.

      There are trade-offs.

      • Bob

        Oh is that right? And what was the reason?

        • kontis

          His microlens-array method in particular suffers from a huge resolution reduction. But there are also other challenges.

      • Can our eyes be bypassed by BCI, and would our brains even recognise a visual signal?

        Considering the eyes contain six types of neuron, there is complex local processing within the eyeball; they are much more than organic cameras.

        Is it possible to bypass this wetware, given that the neurons have long axons wired directly into the brain (optic nerve, optic chiasm, optic tract)?

      • Bob

        Not the expert here, but as far as I’m concerned lightfields are the closest you can get to simulating all the focal depth variances inherent in real life, therefore getting the user’s eyes to react much more naturally, akin to how they would react to light in the real world.

        If, in a perfect world, all headsets properly incorporated this technology, then that’s the job done regardless of the computational requirements, so in essence it truly is the end-goal, or the holy grail so to speak. And this only applies to devices we wear over our heads/ears.

        Now bypassing the eyes and getting directly to the brain itself? Well that’s an entirely different story. Now we’re long past the need for headset devices and into Matrix-style “brain jacking” which, suffice to say, is far, far future stuff that most likely will not exist as a technology within the next five decades.

      • Tomas S

        Indeed, light-field is unfortunately perceived as something with a lot of data, computation, and bandwidth. But that is not generally valid. It is a surviving misconception coming from the first light-field solutions based on classical displays with lens arrays. Such light-field displays usually tried to satisfy a wide range of viewpoints, with each single virtual pixel represented by many real display pixels (easily 30-60). That’s incredibly inefficient.

        In reality, when light-field is calculated for and projected to an eye, it does not really need more of anything compared to a flat image. The bottleneck is the eye itself.

        Light-field needs to carry only the information of how virtual pixels look like in focus from the perspective of the eye pupil. It requires at most a few percent more image information compared to a regular all-in-focus flat image of the same scene (for the vast majority of 3D scenes). It needs also a depth info, but that’s an integral part of 3D content already.

        The few percent more carry the information about what an eye sees “behind” closer objects (the eye sees a little bit around them because of the non-zero pupil size). But that’s very little.

        The blur perceived from a light-field does not need more information. It is the same info, just optically smudged by the eye. One can imagine it as if the color information of each pixel were either assembled at one point or spread over a larger area. The number of sharp pixels which the eye can resolve in a given field of view will be the same for a light-field as for a flat image.

        Efficiency of image delivery to the eye is the true task for VR/AR, not power and more and more pixels.

        As you mentioned, the eye is too good an input not to use: a perfect, ready-made, natural, non-invasive input connector to our minds, and we know well what it understands – images. And that’s only one part of the logic. The second part is the fact that the eye is a surprisingly bad image sensor, much worse than it seems to us; the brain is the magic, super-efficient image processor. (The eye is good only in some 5-degree field of view – a coin-sized region half a meter from you – and only here does it need the lens, and therefore light-field or its perfect imitation.) Once we manage to generate and deliver the virtual imagery with maximum efficiency, today’s smartphone will be a more than powerful enough computer to render a hyper-realistic VR/AR experience. Tomas (from CREAL)
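
        As a rough illustration of the argument above (back-of-the-envelope numbers for illustration only, not CREAL figures), compare a classical lens-array light-field panel, where each perceived pixel is built from dozens of display pixels, with a near-eye projection that only adds per-pixel depth and a few percent of “look-around” color data:

        ```python
        # Back-of-the-envelope comparison of image data per eye, with made-up but
        # plausible numbers purely to illustrate the argument above (not CREAL data).
        BASE_PIXELS = 2_000 * 2_000     # a flat, all-in-focus image for one eye
        BYTES_PER_PIXEL = 3             # 24-bit color

        flat_image_bytes = BASE_PIXELS * BYTES_PER_PIXEL

        # Classical lens-array panel: each virtual pixel spread across ~30-60 real pixels.
        lens_array_bytes = flat_image_bytes * 45              # mid-range multiplier

        # Near-eye projection as described above: the flat image, plus a 16-bit depth
        # value per pixel, plus a few percent of "look-around" color for edges.
        near_eye_bytes = flat_image_bytes * 1.03 + BASE_PIXELS * 2

        for name, size in [("flat image", flat_image_bytes),
                           ("lens-array light-field", lens_array_bytes),
                           ("near-eye light-field", near_eye_bytes)]:
            print(f"{name:>24}: {size / 1e6:7.1f} MB")
        ```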

        • Hamido

          How long will it take? Any hope in a year or two? I tried VR before. While immersive, it strained my eyes/brain. I think it was the vergence-accommodation conflict. I am always looking for info on how far away these technologies (varifocal, light field, etc.) are, but not finding much.

          • Tomas S

            Great to see that the problem is known and a solution wanted. Two years is realistic for a high-end enterprise light-field. We already have the first dev kits. Consumer light-field is more like 4 years from now.

        • vejimiv738

          Sorry but it looks like you are actively hiding some drawbacks with this tech and maybe also not understanding how realtime 3d works beyond the surface level. This is not intended to offend but your comment begs a response:

          1) “light-field is unfortunately perceived as something with a lot of data, computation, and bandwidth”

          Lightfields by definition have more data. This is not perception, this is fact, mentioned many times even just this year by veterans in the industry. It’s not about using lens arrays and integral imaging. A flat 2D image does not have directional information for each pixel. If your display device does not either, then you are redefining what a lightfield is.
          With lightfields you either need more resolution or more frame/refresh rate. You claim it’s not the former, so it’s the latter. Your only option then is monochrome DLP frames, a lot of them = more information

          2) “Light-field needs to carry only the information of how virtual pixels look like in focus from the perspective of the eye pupil.”

          Without eye tracking you don’t know what the eye focus is at any given point in time, so you need to provide information for every possible one, or realistically for a bunch of focus distances so your eye will use the one closest to the expected one = more information.
          The perspective also depends on the eye rotation. The rotational pivot of the human eye is not the eye lens or pupil, it’s almost the center of the eyeball. As the eyeball rotates, the eye lens and pupil also shift, so correct information has to be provided for each of these perspectives unless there is eye tracking = more information.

          3) “It needs also a depth info, but that’s an integral part of 3D content already.”

          No it’s not, not all depth infromation is the same. Your typical realtime 3d content has a depth buffer. Depth buffer contains 2d depth information, meaning you can’t know if there’s something behind something else from the depth buffer. Occluded tris are culled, removed, you don’t have that data anymore down the rendering pipeline, otherwise we would need much better GPUs. As I explained even if the head position relative to the headset is fixed, for real lightfield you still need access to more depth data than what the depth buffer provides because you don’t know where your eye is looking at and the physical position of the eye pupil = more information

          4) ”Efficiency of image delivery to the eye is the true task for VR/AR, not power and more and more pixels.” – that’s a fancy way to refer to eye tracking. If we achieve good enough eye tracking, we don’t need lightfields at all; all we need is varifocals, adjustable-focus lenses, which Facebook has already demonstrated and a few firms already provide to OEMs. If you believe eye tracking is essential and coming, then your lightfield project is a dead end. If you believe lightfield is needed because perfect eye tracking is not achievable, then your claim that more processing and bandwidth is not necessary is false. You can’t have both.
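
          A minimal sketch of the depth-buffer behavior referred to in (3), using made-up fragments: only the nearest surface per pixel survives rasterization, so whatever sits behind it is simply gone by the end of the pass.

          ```python
          # Toy z-buffer: for each screen pixel, only the nearest fragment survives;
          # occluded fragments are discarded during rasterization and are not available
          # later in the pipeline without extra work (e.g. depth peeling or extra passes).
          def rasterize(fragments):
              """fragments: iterable of (x, y, depth, color). Returns {(x, y): (depth, color)}."""
              framebuffer = {}
              for x, y, depth, color in fragments:
                  if (x, y) not in framebuffer or depth < framebuffer[(x, y)][0]:
                      framebuffer[(x, y)] = (depth, color)   # nearer fragment wins
                  # else: the occluded fragment is dropped here and never stored
              return framebuffer

          frags = [
              (10, 10, 2.0, "red wall"),   # farther surface
              (10, 10, 0.8, "blue cup"),   # nearer surface covering the same pixel
          ]
          print(rasterize(frags))          # only the blue cup remains; the wall is gone
          ```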

          Also, are we going to ignore how monochrome DLP frames, even dithered, look? I’ve worked with similar displays in the 2000s and they sure can’t compete when it comes to image quality with any modern display.

          It’s fine if you don’t want to talk about the drawbacks of your tech. I’ve worked with startups, I know how frustrating it is working on risky projects without a clear future and with huge competition, but you shouldn’t be claiming it doesn’t have the drawbacks it has either; that’s just dishonest. Or you have your own definition of lightfield, which isn’t better.


          • Tomas S

            Hi vejimiv738, indeed it is good that you are giving me some guidance on what needs more explanation when it comes to light-field efficiency. Less thanks for the accusation of dishonesty; it is not easy to take lightly, but I understand that you see it from a standpoint where some claims are considered fundamentally impossible. I will try my best to clear it up. I see the core of the misunderstanding here: you write “a lot of them (frames) = more information”, whereas in the light-field case we are talking about, “a lot of frames ≠ more information”; you just expand the image information for free in order to deliver it to the eye differently. But let’s take it slowly.

            Of course, there are drawbacks of one approach vs. another, as always, but this time they lie elsewhere than where you see them here. The fundamentally achievable high efficiency is actually one of the most positive features of near-eye light-field projection.

            A few clarifications:
            (i) Light-field is typically associated with multiple times more, or even orders of magnitude more, rendering effort and image data. I don’t claim light-field does not need or carry more, but that it is in most cases very little more, such as a few percent more. That’s quite obviously not what you mean.

            (ii) Most of the talks and people speak about light-field in general terms. Light-field really cannot be comparably efficient in general (e.g. light-field display panels really do need orders of magnitude more of everything, or must be very bad). I am speaking about a light-field which is limited to an eye pupil and projected in a specific way.

            (iii) Other times the discussions are framed in the context of some specific HW, often assuming you have to use only existing processing tools. Today’s graphics pipelines, displays, and display drivers do not support near-eye light-field needs. A lot of work is needed to smartly abuse what is available (and still not reach full efficiency) or to design the whole pipeline on your own (to reach the maximum), including the image processing HW. Light-field needs fully custom treatment, but not considerably less efficient treatment.

            Let’s think of a simple example: two small points in space, one closer, one farther. Each of them has a certain color. An always-in-focus flat image of them on a flat screen would shine two pixels, e.g. 24-bit each, in all directions. Their 3D processing (let’s speak about the same application, such as AR/VR) would include their depth (unless one obscures the other, more below). To display the two points as a light-field, a light-field projector needs to distribute the color information of each point into a finite number of directions – rays – with a depth bias realized by multiple low-color-resolution pixels. These directions can be predetermined (fixed) by the HW and set to always enter the eye pupil. The key trick is that each of these “rays” needs to carry only a random fraction of the color information of the point, because what matters is how the point appears on the retina when all components are assembled in focus. When you focus at a different distance, the color components are split apart and create a blurred image of the point, but still using the same data. Your eye does the blurring job. For this simple example, you needed the two colors and the distance. The process of shining them through multiple fixed directions does not require more information, no matter how many directions your display can generate, as long as it is able to provide the right proportion of the total color information per “ray”. It still uses the same information, only expanded. So when you speak about a higher frame rate, then yes, you can use more frames, but you don’t need more image information; you just expand what you already have into more directions. This is why “a lot of frames ≠ more information”.

            It should already be a full answer, but let’s elaborate a bit more anyway: once you consider that the two points are treated as 3D data, you already have everything. Distributing the color into a fixed set of rays is simple trigonometry, adding nothing. Light-field projection of more complex scenes follows the same principle. Each virtual pixel that is visible from the pupil (i.e. not the fully obscured ones) is distributed into a fixed set of rays. The projection is the same for any scene and defines the light-field quality. Now, to understand the light-field rendering, you need to invert the process. The projection needs more rays per point, yes, but since all rays enter the eye simultaneously, it practically doesn’t matter how you distribute the color into them. Hence, you can choose. And since you can choose, and the ray directions are defined, you only need the color of the pixel and its distance. For a primitive sparse scene (with no obscuring objects) the light-field image information is practically equal to that of a flat image.
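
            A minimal sketch of that color-splitting idea (illustrative toy code only, not CREAL’s pipeline; the split_color_over_rays helper is invented for this example): a virtual point’s 24-bit color is divided across a fixed fan of rays aimed at the pupil, and the shares sum back to the original color, so no new image information had to be rendered.

            ```python
            import random

            # Toy illustration only (not CREAL's pipeline): split one virtual point's color
            # over N rays that all pass through the eye pupil. Each ray carries a rough,
            # low-precision share; in focus they land on the same retinal spot and sum back
            # to the original color, so no extra image information had to be rendered.
            def split_color_over_rays(color, n_rays, rng=random.Random(0)):
                """color: (r, g, b) ints 0-255. Returns n_rays partial colors summing to color."""
                shares, remaining = [], list(color)
                for i in range(n_rays):
                    if i == n_rays - 1:
                        shares.append(tuple(remaining))            # last ray takes the remainder
                    else:
                        share = tuple(rng.randint(0, c // (n_rays - i)) for c in remaining)
                        remaining = [c - s for c, s in zip(remaining, share)]
                        shares.append(share)
                return shares

            point_color = (200, 120, 40)                 # one virtual point, 24-bit color
            rays = split_color_over_rays(point_color, n_rays=8)

            recombined = tuple(sum(channel) for channel in zip(*rays))
            assert recombined == point_color             # same information, just "expanded"
            ```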

            A typical 3D scene, however, includes objects which obscure each other (even more problems come with transparent objects, smoke, etc.). The eye pupil can see slightly around closer objects, but in the vast majority of cases this is very little, the few percent mentioned above. Imagine the images seen from two extremities of your pupil (e.g. left and right); they are normally almost identical. There are multiple ways of solving it, but they all exploit the concept of the two points above. Just think in reverse order, from the projection side.

            Some a priori answers to anticipated questions.
            Yes, full efficiency requires that all useful light enters the eye. You either have to secure that dynamically (which may require eye-tracking, but not for the light-field effect itself, only for the efficiency, and it can tolerate a big error) or you have to have some redundancy (an exit pupil larger than the eye pupil), as you will then lose some “rays”. But there are even better solutions combining light-field with non-light-field :-).
            You may say that even slightly different viewpoints should receive different colors from the same point in space. Yes, in reality, but since the eye receives them all and mixes them, you lose this information, which means you did not need it from the start; you need only the combined color.
            We may discuss some definitions. What matters in practice is a light-field for a human eye.
            Light-field is a much more complete and robust solution, with perfectly correct monocular depth cues, compared to the varifocal approach (eye-tracking based or fast sweep); the drawback of light-field is that it needs a fully custom pipeline (rendering, formatting, HW processing and drivers, electro-optic elements, optics). Varifocal is a weekend project compared to it.

            I won’t go into more detail; there are a lot of tricks that can be done, different ways of looking at it, and also specific problems, but I hope the above gives you the fundamental logic to revise the dogma about light-field data.

          • Tomas S

            Hi, I already replied twice here and both comments are “detected as spam” in my Disqus account, waiting “to get this corrected” at my request. Strange. I hope you have received them at least by email.

            So as not to write something so long possibly for nothing again, briefly: I see the core of the misunderstanding in the assumption “a lot of them (frames) = more information”, which is indeed not true in the light-field case we are talking about. Rather, “a lot of frames ≠ more information”, meaning no new information which you would have to render. Since the color of each virtual pixel is distributed to multiple low-color-resolution rays “randomly” and by a fixed process, and because all (or most) rays enter your pupil and combine at the retina, you can expand the image information for free, even in the modulator backplane if you design one that way. For instance, you can double the number of frames/rays at no rendering cost. Ultimately only the pixel color and its coordinates are needed. This should be enough to re-think all the points you introduce. The partial obscuring of objects etc. is not trivial, but in this context it is a technicality.

          • vejimiv738

            Hi Tomas, I haven’t received anything. Write your text in Word, then copy paste to avoid losing it, or just write it on pastebin.com and share the link, there are options.

            Regarding this partial response:

            You seem to be considering the amount of “information” as merely the amount of streamed video data, which is absolutely not the case in my original response or objectively. Even if you render 24 1-bit frames instead of one 24-bit frame, it still requires more information, because each perspective in a 3d program has to be rendered separately. Yes, you can pack those 24 monochrome frames into a single 24-bit frame to save on data transfer like many volumetric displays do, but you are by no means saving on processing the same way. In fact, as soon as you do dithering you are assumed to be rendering 24 frames at 24-bit color depth to have the correct data to be able to do dithering. Finally, your 24 frames will not merely be rendering different color values; as soon as the perspective changes the GPU has to start over, and you may as well be rendering 24 new full-color frames. This was the original statement I was responding to:
            “Indeed, light-field is unfortunately perceived as something with a lot of data, computation, and bandwidth.”
            This is false, because at least the data and computation are much higher.
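
            To put rough numbers on the packing point above (illustrative values only): packing 24 monochrome sub-frames into one 24-bit frame changes the transport cost, not how many times the scene has to be drawn.

            ```python
            # Illustrative accounting only: transported bits vs. render passes when 24
            # monochrome sub-frames are packed into the bit planes of one 24-bit frame.
            WIDTH, HEIGHT = 2_000, 2_000
            SUBFRAMES = 24

            bits_packed   = WIDTH * HEIGHT * 24               # one 24-bit frame
            bits_unpacked = WIDTH * HEIGHT * 1 * SUBFRAMES    # 24 separate 1-bit frames
            print(bits_packed == bits_unpacked)               # True: transport is unchanged

            # On the argument above, each distinct perspective still implies its own pass
            # over the scene, however the result is packed for transport.
            render_passes = SUBFRAMES
            print(f"render passes: {render_passes}")
            ```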

            Also, you claim all or most of the low-color-depth rays are recombined at the retina. This is also not the case generally. For one, it’s a lightfield, there’s no guarantee; but secondly, unless your device is bolted to a table and not moving, and your eyes are constantly focusing on the same thing, you will have the different time-multiplexed rays landing on different photoreceptor cells of your retina. So the full-color pixels may be reconstructed only in limited cases in controlled environments.

            Maybe something else is involved here but from what you’ve shared I don’t share your enthusiasm.

            Thanks.

          • Tomas S

            Hi vejimiv738, thank you for refining the topic. I think we could get to the core of our misunderstanding this way. I hope I answered the projection part: physical light-field rays can be generated from simple data once we assume that each virtual pixel distributes its color evenly or randomly into a small cone of directions (the cone defined by the pixel and the eyebox).
            Rendering is, in a way, the inverse of that process. We do not have to calculate a new color for a virtual pixel from a new perspective (except for the few which were obscured by closer objects); we can assume the same color. We can also calculate only partial information from each perspective (imagine something like short raytracing bursts). There are actually quite a few options for how to do it. All of them, however, require entirely custom SW and HW treatment across the whole pipeline. You are certainly right that standard SW and HW tools cannot do this efficiently at all. Practically everything must be designed and made custom; that’s why it is a tough job.
            In your comment, this is summarized here: “because each perspective in a 3d program has to be rendered separately”. In this case, there is no need to render all the perspectives which you plan to project. But even if you do, you can avoid recalculating the information which you already have. Hence, no need to render “separately”, either. The parallax difference within the pupil (or a reasonably big eyebox) is so small that virtual pixels do not change color. Even if we consider that the color changes, many rays enter the eye simultaneously, which makes the difference negligible.
            The misunderstanding could therefore be in our definitions. A theoretical light-field should contain unique, more or less full color information in each ray of each pixel. That is fundamentally a lot. A practical, engineered near-eye light-field display does not need it, because an eye cannot see that.
            (Minor points: the distribution of image data into dither-like-looking components can be lossless; we care only about their sum. Yes, the rays of a de-focused virtual pixel hit the retina at different places, but close enough, and mixed enough with other de-focused pixels, that it appears as a very natural blur. This is not an assumption; it already works well, and in real time. The moving eye pupil can be satisfied by a redundant eye-box with redundant image data, or, better, by primitive pupil tracking – not for the depth cue itself, but for the efficiency.)

            PS: I still have my previous replies; the problem is that they twice disappeared from here without notification or any provided reason. The one you see is the first one which passed, and I couldn’t really have expected it. However, I think it makes more sense now to narrow down the problem. Anyway, here it is.

          • vejimiv738

            I don’t know if you are being purposefully vague or assuming too much in your message, because it’s pretty vague and doesn’t, by itself, design around the problems I’ve mentioned.

            You talk about the direction of the light cone of each pixel but not its angle, which you need in order to know its distance from the eye.

            You also claim different perspectives don’t need to be re-rendered and that colors of pixels can be shared between them, which also makes little sense. Feel free to teach that to the VR industry; we seem to be rendering two views for two eyes each frame for no good reason. Sure, the IPD is a few times larger than the eye box, but not so different when you make a broad claim regarding both near and far virtual objects in the 3D scene. Your claim about how much data may be reused seems too optimistic.

            And your claim regarding dithering not being noticeable also makes little sense, in that you talk about defocused pixels, while I don’t. I’m talking about the in-focus low-bitrate image. It’s too sharp to be corrected by another, less focused pixel. Maybe you can introduce some softness to the dithered image, but you can’t fix it based on the information that you have shared.

          • Tomas S

            Thank you for your feedback. Indeed, I have to be somewhat vague; I cannot disclose the recipe. I hope this is understandable. Sorry.

            Small clarification: by “direction of the light cone” I meant a cone of directions/rays, i.e. a cone between the virtual point and the pupil containing a certain number of rays. The distance of the virtual point from the eye is defined by the intersection of the virtual rays with a modulator image plane. The modulator image plane has a fixed position in space. The virtual rays pass through it, and that’s where you generate (reflect) the real light-field rays.

            About the re-rendering: displacements between viewpoints inside the pupil and between the eyes differ by one to two orders of magnitude. The objects you see with two eyes are substantially rotated, and whole objects can be obscured from one eye to the other; this change is entirely negligible even across the whole pupil, and almost unnoticeable between neighboring viewpoints (<1 mm apart). This is really not comparable. About the distribution of color information to different perspectives: even stereo displays used anaglyphs at some point. That’s very similar. It just doesn’t work as well as it does within a single pupil (too-different images and two different sensors).
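
            To put rough numbers on that scale difference (generic textbook baselines, not measured values from either side of this thread):

            ```python
            import math

            # Parallax angle subtended by a nearby object across three baselines
            # (generic illustrative numbers, not measurements).
            OBJECT_DISTANCE_M = 0.5   # a close virtual object

            baselines_m = {
                "between the two eyes (IPD ~63 mm)": 0.063,
                "across one pupil (~4 mm)":          0.004,
                "neighboring viewpoints (~1 mm)":    0.001,
            }

            for name, baseline in baselines_m.items():
                parallax_deg = math.degrees(math.atan(baseline / OBJECT_DISTANCE_M))
                print(f"{name:38s} -> {parallax_deg:5.2f} deg")
            ```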

            About the noticeability of the dither-like images: of course, if you assume a “low bit-rate image”, or better said ray-rate, then it may be very noticeable below some limit. This method needs a very high ray-rate, which, however, becomes possible once you are not limited by the amount of rendered data. You can theoretically generate as many rays as you want; in practice you are limited only by the optoelectronic HW. That’s not my fantasy; I know what that light-field looks like at different speeds.

            Yes, I do not deny that varifocal is a fast engineering solution to the accommodation cues and that the necessity to remake the whole pipeline is a big disadvantage of the light-field (but it is like not having a railroad: once it is built, the advantages pay it back). Light-field cannot compete in the short run. There is substantially more work to do on it. (BTW, as I already mentioned, there exists more than one type of fast modulator technology, no need to stick to the one you describe. But, anyway, efficient light-field needs this custom, too. The price is, however, dictated much more by volumes than by the tech.) Nevertheless, no one is eating anyone’s lunch yet.
            The main difference is in the level of dependence on eye-tracking and its reliability. Varifocals need to generate the blur and even the ocular parallax very quickly and precisely, while even occasional errors may be very disturbing. I haven’t tested that. The optics are also problematic; the image is not optically flat. Light-field does not need this robustness; indeed it does not need eye-tracking at all, it is just a bonus. Light-field can also correct aberrations.
            A fast varifocal sweep may work better and without eye-tracking, but that needs a fast, bright display and specific rendering too, while it can technically support only a limited number of focal planes. I saw some, and it was pretty impressive for specific content. The objects were, however, splitting when crossing two or more planes.

            I hope with you, that you are wrong :-). Time will show.

          • vejimiv738

            Quick comment before I respond,

            You mentioned this twice, and sweeping varifocals is just one way to do varifocals, and probably not a good one. What I had in mind was what Facebook has demonstrated: an eye-tracked varifocal lens using birefringent plates combined with liquid crystal polarization rotators, a stack of them. Or a single-layer liquid crystal lens. If you have eye tracking, that makes more sense than sweeping the focus per frame, losing photons, and having a limited number of focal planes.

            Building a railroad may not be a good analogy for your lightfields, because you took one successful project from the past and ignored all the thousands of failed ones. Which is the case with your project? We don’t know whether to use railroads as the analogy, or flying cars instead. All I know is that if you aren’t supporting OpenGL, DirectX, or Vulkan rendering and PBR, then you already significantly limit your use case, and you need to offer everyone good enough advantages to justify that limitation. But more on that and the rest later…

          • Tomas S

            Thank you. I understood that you meant FB’s active lens stack with eye-tracking. I just consider the eye-tracking a big problem in it. I think Marty Banks even came to the conclusion that eye-tracking cannot work perfectly in general and for everyone.

            Sure, there are many risks and uncertainties. If you wait for certainty, you will certainly be late.

          • vejimiv738

            Well, no technology really works perfectly and for all, if we are going to make that claim.

            Something that may also be helpful to your firm in the future: what Facebook wants from eye tracking and what, for example, Varjo wants from it are different. Facebook wants eye tracking so precise that you can have a foveated region of only 5 degrees and a steeper resolution falloff after that. The resolution Facebook is aiming for in the periphery with foveated rendering is so low that if the eye tracking failed for a moment or lagged, you would be looking at a blurry mess. Varjo, on the other hand, right now has a 27-degree foveated region. So if your own 5-10 degree fovea and the eye-tracked foveated region do not perfectly match, it is still perfectly fine. On top of that, if your eye tracking fails for a moment with Varjo, you still have at least 1/3 of the resolution of the human eye. Yes, you notice the resolution drop, but it’s not a blurry mess you are seeing, and it is by no means destroying the immersion or whatever you were doing with your headset; it’s just an unpleasant artifact. For this kind of eye tracking, the OmniVision OV6211 that Varjo probably uses is as “perfect” as you need. There’s an article on Varjo’s eye tracking somewhere on this site.

            With varifocals it’s also not clear that we need the near-perfect version of eye tracking Facebook wants versus what Varjo has achieved today. At least one thing is known: Facebook has had test subjects for its eye-tracked varifocals, with mostly positive and promising reports, and we shouldn’t forget that the human eye’s depth of focus isn’t perfect at any distance either, so some mismatch between what the eye expects and what is provided just won’t be noticeable.
            But this is a topic of its own. I should address your main points instead.

          • Tomas S

            Thank you for some additional insights! I tried Varjo, it was good. I didn’t try any of the half-domes. I am curious about the eye-tracking reliability and where the acceptable tolerance of errors is.

          • vejimiv738

            Check out Pupil Labs; they use the same sensor and are open-source software, paid hardware.

          • Tomas S

            Thanks, indeed, we use Pupil Labs (among others).

          • vejimiv738

            I think the main “suspicious” part in your explanation is the part where you apparently claim that dithered frames which have different perspective and focus somehow have their pixels “sum up” with those from other such dithered frames to produce non-dithered full color imagery. Correct me if I misunderstood your claim, but if that is the case then something doesn’t add up since even just the focus of such overlapping pixels would be different.

          • Tomas S

            I see. The individual sub-frames have different perspectives, but not different focus (ideally). They are all practically “always in focus”, i.e. they have a very large, ideally infinite, depth of field. You can imagine that each of the sub-frames is projected sharp on your retina regardless of your eye’s focus, because they virtually pass through a small aperture; in reality there is no aperture at your pupil, it is the light source that is small. A change of the eye’s focus just moves the positions of the images on your retina. (Yes, in reality each sub-frame has a finite, though very large, depth of field and a defined focal plane, because of the non-zero aperture and diffraction, but it nevertheless works very well. The results prove it.)

      • vejimiv738

        If I’m not mistaken, it was probably the speech where it was stated that with good eye tracking you can just use varifocal optics?

  • oomph

    Finally some amazing tech
    in a glasses sized format

    • vejimiv738

      The glasses are just a plastic mockup, not functional.

  • This is a very interesting startup