Software:
For the software side I decided to go with C/C++, since having the lowest possible latency is really important. I also decided to use OpenCV for the image analysis and OpenGL to display some of the results. I had never used OpenCV before, and I found it very good in general, apart from some outdated documentation in places.
Brief Program Explanation:
Initialization:
- Create / store 3D model of markers
- Apply Gaussian blur to the image to prevent false blobs from being detected
Core functionality:
- Blob analysis: detect ellipses and obtain their center coordinates, color, and type
- Point detection: Match results of blob analysis with 3D model, numbering each marker
- Pose estimation: use the matched 3D model points and 2D image points to estimate the pose of the object
Show results:
- Apply Kalman filter
- Paint results of blob analysis and point detection with OpenCV
- Paint simple in-game map with OpenGL (see the frame-loop sketch below)
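To make the outline concrete, here is a rough sketch of how the per-frame loop fits together. The helper functions and types here are hypothetical placeholders (stubs), not excerpts from my actual code:

    #include <opencv2/opencv.hpp>
    #include <vector>

    // Minimal placeholder types; the real program uses richer versions.
    struct Blob { cv::Point2f center; int color = 0; };
    struct Pose { cv::Vec3d position; cv::Vec3d rotation; };

    // Stubs standing in for the stages described above (hypothetical names).
    std::vector<Blob> detectBlobs(const cv::Mat&) { return {}; }
    std::vector<int> matchMarkers(const std::vector<Blob>&) { return {}; }
    Pose estimatePose(const std::vector<Blob>&, const std::vector<int>&) { return {}; }

    int main()
    {
        cv::VideoCapture cap(0);   // the PlayStation Eye shows up as a normal capture device
        cv::Mat frame, smoothed;
        while (cap.read(frame)) {
            // Box-filter blur to suppress noise and false blobs
            cv::blur(frame, smoothed, cv::Size(5, 5));

            std::vector<Blob> blobs = detectBlobs(smoothed);   // blob analysis
            std::vector<int> ids = matchMarkers(blobs);        // point detection
            Pose pose = estimatePose(blobs, ids);              // pose estimation
            (void)pose;  // in the real program: Kalman-filtered, then drawn

            cv::imshow("debug", smoothed);
            if (cv::waitKey(1) == 27) break;  // Esc quits
        }
    }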
To start off, I decided to create my own blob detection algorithm in order to have greater control over the conditions and results. I was pleasantly surprised that my custom scanning patterns proved very reliable, detecting up to 5 different colors (red, green, blue, white, yellow) under very different light conditions (complete darkness, artificial light, and direct sunlight) at distances up to approximately 1.5 m. That is not as much range as I would like, but I'm quite confident it could be improved to 2.5–3 m without too much difficulty. Still, for now it was good enough to move the main focus to other parts of the problem.
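My scanning-pattern code is too involved to show here, but just to illustrate the kind of result this stage produces, here is a minimal sketch of a generic color-blob pass using stock OpenCV calls (the HSV thresholds are made-up example values, and my actual detector works differently):

    #include <opencv2/opencv.hpp>
    #include <vector>

    // Illustrative only: find bright red blobs via HSV thresholding and
    // fit ellipses to them. My real detector uses custom scanning patterns.
    std::vector<cv::RotatedRect> findRedBlobs(const cv::Mat& bgr)
    {
        cv::Mat hsv, mask1, mask2, mask;
        cv::cvtColor(bgr, hsv, cv::COLOR_BGR2HSV);

        // Red wraps around the hue axis, so threshold two ranges and combine.
        // These ranges are made-up example values, not tuned thresholds.
        cv::inRange(hsv, cv::Scalar(0, 120, 120), cv::Scalar(10, 255, 255), mask1);
        cv::inRange(hsv, cv::Scalar(170, 120, 120), cv::Scalar(180, 255, 255), mask2);
        mask = mask1 | mask2;

        std::vector<std::vector<cv::Point>> contours;
        cv::findContours(mask, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);

        std::vector<cv::RotatedRect> ellipses;
        for (const auto& c : contours)
            if (c.size() >= 5)  // fitEllipse needs at least 5 points
                ellipses.push_back(cv::fitEllipse(c));  // .center gives the blob center
        return ellipses;
    }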
For the point detection I faced the problem of having to identify 22 different markers with only 5 different colors. To do that, I created quite a large number of fairly complex rules for each marker, taking into account not only the color of the current marker but also the relative positions and angles of the adjacent ones from all possible views. This part can still be improved, as the detection sometimes fails, but at this point I realized it would be better to rethink the distribution of the LEDs on the hardware a bit before trying to improve the software side.
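The real rule set is long and tied to my specific LED layout, but an individual rule conceptually looks something like this (the marker number, colors, and angle ranges below are completely made-up, just to show the shape of a rule):

    #include <opencv2/opencv.hpp>
    #include <cmath>

    enum Color { RED, GREEN, BLUE, WHITE, YELLOW };
    struct Blob { cv::Point2f center; Color color; };

    // Angle (degrees) of the vector from blob a to blob b, in image
    // coordinates (note the y axis points down).
    static float angleDeg(cv::Point2f a, cv::Point2f b)
    {
        return std::atan2(b.y - a.y, b.x - a.x) * 180.0f / (float)CV_PI;
    }

    // Hypothetical rule: a candidate is marker #7 if it is green, has a red
    // neighbor at roughly 20-70 degrees and a white neighbor at roughly
    // 80-100 degrees. The real rules are similar in spirit but cover all views.
    bool looksLikeMarker7(const Blob& self, const Blob& n1, const Blob& n2)
    {
        if (self.color != GREEN || n1.color != RED || n2.color != WHITE)
            return false;
        float a1 = angleDeg(self.center, n1.center);
        float a2 = angleDeg(self.center, n2.center);
        return (a1 > 20.0f && a1 < 70.0f) && (a2 > 80.0f && a2 < 100.0f);
    }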
For the pose estimation part I decided to use the OpenCV implementation of the POSIT algorithm. Unfortunately, it is not as simple as calling a function, since the input expects the whole list of points of the object you want to detect, always starting with the same origin point, while for each frame we only have a small subset of these (the points visible to the camera). I managed to get around this by using a few tricks, which I won't go into detail about for now.
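For reference, a basic POSIT call with the OpenCV 2.x-era C API (it was removed in later versions) looks roughly like this; note that the full model goes in up front, with the first point at the origin, which is exactly where the subset problem comes from. The model points and focal length below are made-up example values:

    #include <opencv/cv.h>  // legacy C API where cvPOSIT lives (OpenCV 2.x)
    #include <vector>

    int main()
    {
        // Made-up non-coplanar 4-point model; the first point must be the origin.
        std::vector<CvPoint3D32f> model = {
            {0, 0, 0}, {60, 0, 0}, {0, 60, 0}, {0, 0, 60}
        };
        CvPOSITObject* posit = cvCreatePOSITObject(model.data(), (int)model.size());

        // 2D projections of the SAME points, in the SAME order, relative to the
        // image center (made-up values). In practice only a subset is visible,
        // which is the problem described above.
        std::vector<CvPoint2D32f> image = {
            {0, 0}, {80, -5}, {-3, 78}, {2, 4}
        };

        float rotation[9];     // 3x3 row-major rotation matrix (output)
        float translation[3];  // translation vector (output)
        CvTermCriteria crit = cvTermCriteria(CV_TERMCRIT_EPS | CV_TERMCRIT_ITER,
                                             100, 1e-5);
        double focalLength = 760.0;  // made-up, in pixels

        cvPOSIT(posit, image.data(), focalLength, crit, rotation, translation);
        cvReleasePOSITObject(&posit);
    }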
Once we have the position and orientation values, I thought it would be cool to show a simple real-time representation of a room, so it's easier to evaluate the quality of the output. To do this I created my own OpenGL “engine”, so I could have total control over how the combination of inputs affects the in-game movement. The map I used is a hand-modified version of one from NeHe's OpenGL tutorials: the result is a funky-colored temple with some kind of 'tombstones' inside. Head motion detection is exaggerated on purpose for demonstration, and should be toned down when using it with a real HMD.
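Feeding the pose into the OpenGL view is conceptually simple; with the fixed-function pipeline of the NeHe era, the per-frame camera transform looks roughly like this (a sketch with hypothetical variable names, not the actual engine code):

    #include <GL/gl.h>

    // Apply the tracked head pose to the OpenGL camera (fixed-function GL).
    // yaw/pitch/roll in degrees and x/y/z in world units come from pose estimation.
    void applyHeadPose(float yaw, float pitch, float roll,
                       float x, float y, float z)
    {
        glMatrixMode(GL_MODELVIEW);
        glLoadIdentity();
        // Inverse of the head transform: rotate first, then translate,
        // so the world moves opposite to the head.
        glRotatef(-roll,  0.0f, 0.0f, 1.0f);
        glRotatef(-pitch, 1.0f, 0.0f, 0.0f);
        glRotatef(-yaw,   0.0f, 1.0f, 0.0f);
        glTranslatef(-x, -y, -z);
    }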
Results:
I think the first results, although not perfect, are quite promising: I managed to get fairly reliable tracking with good response and good precision, which feels very fluid even at 30 FPS. I have tested 60 FPS successfully, but the blob analysis results are not as good as at 30 FPS because the intensity of the LEDs is not properly adjusted. This should be fixed with a fairly simple hardware modification.
The little 'wings' that you can see on the top and bottom of the prototype in the video were added to simulate the top of the head and the neck blocking some of the LEDs. You can still see some small glitches from time to time, caused by the LED detection failing. As I mentioned, this could probably be corrected with more complex rules, but I have already thought of some modifications to the hardware and marker distribution to prevent it.
Also, most of the jitter you can see (especially with the Kalman filter off) is related to orientation, but the main purpose of this system is positional tracking: combined with an IMU providing the orientation values, and with the optical tracking supplying only the positional information, the final result should be much better. Even though finding orientation was not the main goal of this project, those values are still very usable, as you can see in the video.
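For the curious, the smoothing is a standard linear Kalman filter. A minimal OpenCV version for a 3D position with a constant-velocity model could look like the following sketch (the noise covariances are made-up tuning values, not the ones I actually use):

    #include <opencv2/opencv.hpp>

    int main()
    {
        // State: [x, y, z, vx, vy, vz]; measurement: [x, y, z].
        cv::KalmanFilter kf(6, 3, 0, CV_32F);
        float dt = 1.0f / 30.0f;  // frame period at 30 FPS

        // Constant-velocity transition model.
        kf.transitionMatrix = (cv::Mat_<float>(6, 6) <<
            1, 0, 0, dt, 0,  0,
            0, 1, 0, 0,  dt, 0,
            0, 0, 1, 0,  0,  dt,
            0, 0, 0, 1,  0,  0,
            0, 0, 0, 0,  1,  0,
            0, 0, 0, 0,  0,  1);
        cv::setIdentity(kf.measurementMatrix);  // we observe x, y, z directly
        cv::setIdentity(kf.processNoiseCov, cv::Scalar::all(1e-4));      // made-up tuning
        cv::setIdentity(kf.measurementNoiseCov, cv::Scalar::all(1e-2));  // made-up tuning
        cv::setIdentity(kf.errorCovPost, cv::Scalar::all(1));

        // Per frame: predict, then correct with the measured marker position.
        cv::Mat measurement(3, 1, CV_32F);
        kf.predict();
        measurement.at<float>(0) = 0.10f;  // example measured x (m)
        measurement.at<float>(1) = 0.05f;  // example measured y
        measurement.at<float>(2) = 1.20f;  // example measured z
        cv::Mat smoothed = kf.correct(measurement);  // filtered [x y z vx vy vz]
    }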
I don't really know the latency introduced by the PlayStation Eye camera itself, but I decided to take some measurements to see the latency introduced by my software:
- normalized box filter blur = 3.38 ms
- blob analysis = 4.20 ms
- point detection + pose estimation = 0.06 ms
- OpenCV paint results = 0.60 ms
- OpenGL paint results = 0.06 ms
- Total: 8.30 ms approx.
At first I used OpenCV's GaussianBlur, but that was taking about 5.20 ms. Later I realized I could get away with replacing it with a standard blur (normalized box filter). Blob detection still works great, and the latency was reduced by almost 2 ms (from 5.20 to 3.38 ms). I believe there's still room for improvement in this area, though.
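To get these numbers I simply timed each stage with OpenCV's tick counter; a comparison of the two blur calls looks like this (the kernel size is an example value, not necessarily the one I used):

    #include <opencv2/opencv.hpp>
    #include <cstdio>

    int main()
    {
        cv::Mat frame(480, 640, CV_8UC3);  // PlayStation Eye resolution at 30/60 FPS
        cv::randu(frame, cv::Scalar::all(0), cv::Scalar::all(255));  // dummy data
        cv::Mat out;

        double t0 = (double)cv::getTickCount();
        cv::GaussianBlur(frame, out, cv::Size(5, 5), 0);  // original choice
        double t1 = (double)cv::getTickCount();
        cv::blur(frame, out, cv::Size(5, 5));             // normalized box filter
        double t2 = (double)cv::getTickCount();

        double freq = cv::getTickFrequency();
        std::printf("GaussianBlur: %.2f ms\n", (t1 - t0) / freq * 1000.0);
        std::printf("box blur:     %.2f ms\n", (t2 - t1) / freq * 1000.0);
    }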
Recap of Current Prototype Limitations:
- Blob analysis works up to 1.5 m; with hardware and software modifications this could be improved to 2.5–3 m
- Point detection sometimes fails; this could be fixed with a different LED distribution
- Currently 30 FPS. 60 FPS should be easy to obtain (with adjustments to the LED intensity), and 120 FPS is possible (at the expense of detection distance, due to the reduced resolution)