Prediction in Image Space using Kalman Filter

Ramesh Raskar


In an interactive head-tracked display system, temporally adjacent frames are very similar when the viewpoint used to render them changes gradually. This frame-to-frame coherence can be exploited to avoid conventional rendering in most frames [Azuma95][Costella93].

More recently, graphics programmers have realized that the majority of frames can be generated by performing an image warp to interpolate nearby rendered frames [McMillan95][Mark96]. Image warping and interpolation itself has been widely studied [Wolberg].

There has also been work in prediction. Using predicted values of various parameters, the graphics system computes the picture for the next frame with reduced latency.

The Kalman filter has been used here at UNC-CS to predict the motion of a user wearing an HMD [Welch96]. Similarly, many signal-processing and filtering ideas have been used in predicting user motion.

However, except [Costella93] to some extent, there has not been a great focus on predicting at the pixel level. If we want the graphics system to show a very detailed, say ray-traced, image every frame, prediction of user motion alone is not enough: the latency of the virtual environment turns out to be much smaller than the time it would take to draw such a detailed image.

My idea is to predict in image space. The motion of each pixel is predicted using the history of its motion and information about the user's head motion obtained from the head tracker. Assuming the correspondence problem is solved for successive frames and optical flow can be computed, the technique attempts to improve the apparent frame rate.
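As a minimal illustration of the core idea (not the actual implementation), a pixel's next screen position can be extrapolated from its last two observed positions under a constant-velocity assumption; a Kalman filter refines this by also modeling measurement noise:

```python
# Toy version of per-pixel prediction: extrapolate a pixel's next
# screen position from its recent positions alone. The function name
# and data layout are illustrative, not from the original system.

def predict_next(history):
    """history: list of (x, y) screen positions in past frames."""
    (x0, y0), (x1, y1) = history[-2], history[-1]
    # Constant-velocity step: repeat the last observed displacement.
    return (2 * x1 - x0, 2 * y1 - y0)

print(predict_next([(10, 5), (12, 6), (14, 7)]))  # -> (16, 8)
```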

Comparison with Other Systems

It is worthwhile to compare this technique with other current methods. The first set of methods reduces latency by image warping. In their immersive system, [McMillan95] precompute correspondences manually and then allow the user 6 DOF by first reading the user's view and then image warping to achieve a higher frame rate. [Mark96] avoid frame-by-frame correspondence by maintaining Z-buffer values. Regan and Pose [Regan94] and Talisman [Torborg96] both attempt to render parts of the frame separately. Although all these systems reduce the delay between user motion and frame update, all assume that rendering time is a much smaller part of the total latency (where latency is the sum of the delay in finding the user's location and the time required to present the image to the user). The question is: can we compute the image for the next instant even before we find out where the user is going to be?

In the second set of methods, the latency of the tracking system is reduced by predicting the user's motion [Welch96][Azuma95].

The two, image warping and prediction, have not been used simultaneously. Part of the reason is that prediction is usually done in user space while image warping is obviously done in image space. Combining the two, prediction in image space and warping around the predicted points, is exactly the goal of this project.


The intended system will work in the following way.
Input to the system: frames of pretty images, head-tracker data up to time (i-1)
Output: the i'th frame

My Implementation

All three parts of the system appear to be computationally intensive. Correspondence and optical-flow computation is known to be a non-trivial problem. Using hundreds of Kalman filters to predict the screen position of each feature in the next frame is also likely to slow down the operation. Image warping, being a per-pixel operation, is very expensive.

However, we can use various ideas to break these parts down into more manageable subparts. I used OpenGL on SGI to implement all three parts.

Optical Flow
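As a minimal sketch of one way to compute the needed frame-to-frame correspondences, the following assumes simple exhaustive block matching (a toy stand-in, not necessarily the method used in the actual implementation):

```python
import numpy as np

def block_matching_flow(prev, curr, block=4, search=2):
    """Estimate per-block optical flow by exhaustive block matching.

    For each block in the previous frame, find the displacement
    (within +/- search pixels) minimizing the sum of absolute
    differences against the current frame.
    """
    h, w = prev.shape
    flow = np.zeros((h // block, w // block, 2), dtype=int)
    for by in range(h // block):
        for bx in range(w // block):
            y, x = by * block, bx * block
            ref = prev[y:y + block, x:x + block]
            best, best_dxy = np.inf, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    yy, xx = y + dy, x + dx
                    if yy < 0 or xx < 0 or yy + block > h or xx + block > w:
                        continue
                    cand = curr[yy:yy + block, xx:xx + block]
                    sad = np.abs(ref.astype(int) - cand.astype(int)).sum()
                    if sad < best:
                        best, best_dxy = sad, (dy, dx)
            flow[by, bx] = best_dxy
    return flow

# A bright square shifted down 1 and right 2 should yield that flow.
prev = np.zeros((8, 8)); prev[1:5, 1:5] = 255
curr = np.zeros((8, 8)); curr[2:6, 3:7] = 255
print(block_matching_flow(prev, curr)[0, 0])  # -> [1 2]
```

Exhaustive search is O(blocks x search^2) and scales poorly, which is consistent with the cost concerns noted above; hierarchical or gradient-based flow methods reduce this.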

Image warping
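As a minimal sketch of the warping step, the following assumes a simple forward (splatting) warp by an integer flow field; the actual implementation used OpenGL rather than per-pixel loops:

```python
import numpy as np

def forward_warp(img, flow_y, flow_x):
    """Forward-warp an image by a per-pixel integer flow field.

    Each source pixel is splatted to its predicted destination.
    Holes (pixels no source maps to) stay zero; a real system would
    fill them, e.g. from an older reference frame.
    """
    h, w = img.shape
    out = np.zeros_like(img)
    for y in range(h):
        for x in range(w):
            ny, nx = y + flow_y[y, x], x + flow_x[y, x]
            if 0 <= ny < h and 0 <= nx < w:
                out[ny, nx] = img[y, x]
    return out

img = np.zeros((6, 6)); img[1, 1] = 9
fy = np.full((6, 6), 2, dtype=int)   # predicted motion: down 2
fx = np.full((6, 6), 1, dtype=int)   # and right 1
print(forward_warp(img, fy, fx)[3, 2])  # -> 9.0
```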

Kalman Filter
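As a minimal sketch of the prediction step, the following runs a constant-velocity Kalman filter on one feature's screen coordinate (the other coordinate gets an identical, independent filter); the noise magnitudes here are illustrative assumptions, not tuned values from the actual system:

```python
import numpy as np

# State is [position, velocity]; the measurement is the screen
# position reported by optical flow each frame.
dt = 1.0                                   # one frame interval
F = np.array([[1.0, dt], [0.0, 1.0]])      # state transition
H = np.array([[1.0, 0.0]])                 # we observe position only
Q = 0.01 * np.eye(2)                       # process noise (assumed)
R = np.array([[1.0]])                      # measurement noise (assumed)

x = np.array([[0.0], [0.0]])               # initial state
P = np.eye(2)                              # initial covariance

for z in [1.0, 2.0, 3.0, 4.0]:             # measured positions per frame
    # Predict: this step yields the feature's screen position for
    # the next frame before the tracker reports it.
    x = F @ x
    P = F @ P @ F.T + Q
    # Update with the measurement from optical flow.
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
    x = x + K @ (z - H @ x)
    P = (np.eye(2) - K @ H) @ P

pred = (F @ x)[0, 0]                       # predicted next-frame position
print(round(pred, 2))                      # approaches 5 as filter converges
```

One such filter per tracked feature gives the predicted screen positions around which the warp is performed.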

Results and images