
Modeling Camera Motion

A Project Proposal for MAS864
John Watlington

When integrating the output of one or more moving imaging sensors to build a three-dimensional model of the scene being imaged, accurate knowledge of the position, orientation, and optical characteristics of the cameras is required -- or at least highly desirable! This project will attempt to solve for the camera translation and rotation over time, given video input of scenes that contain a small number of man-made (or planar) surfaces.

Why This?

I believe that structured video has unlimited possibilities. Unfortunately, these are hard to explore without video analysis systems that allow simple creation of scene and actor models. As we attempt to move beyond "blue-screening" and the laborious hand creation of set databases, we need accurate, automatic tracking of camera motion. Even when fixed camera positions are intended, the cameras inevitably move over time (due to jostling, adjustment of controls, and movement of the tower or tripod), requiring that camera calibration be performed at the beginning of each sequence.

Previous Work

There are two common approaches to determining the camera position from an image sequence. One approach is generally based on mapping the "optical flow" of the image sequence to a model of camera motion [Horn88, Adiv85]. This approach faces several difficulties: the "aperture problem", which arises in local areas of the image lacking high-frequency detail in a particular dimension, and the problems of working with a 2D projection of a complex 3D world containing transparent and reflective surfaces [Barron94].
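For concreteness, here is a minimal local least-squares flow estimate (in the spirit of Lucas and Kanade, not the direct methods of [Horn88]); the aperture problem appears as a near-singular 2x2 gradient matrix whenever the patch lacks detail in one direction. The patch derivatives and the threshold below are placeholders.

    import numpy as np

    def local_flow(Ix, Iy, It, threshold=1e-3):
        """Estimate a single 2D flow vector (u, v) for one image patch by
        least squares on the brightness-constancy constraint Ix*u + Iy*v + It = 0.
        Ix, Iy, It are the spatial and temporal derivatives over the patch."""
        A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)   # N x 2 gradient matrix
        b = -It.ravel()                                  # N temporal terms
        AtA = A.T @ A                                    # 2 x 2 structure tensor
        # Aperture problem: if the patch has gradient energy in only one
        # direction, AtA is (nearly) singular and the flow is not recoverable.
        if np.linalg.eigvalsh(AtA).min() < threshold:
            return None
        return np.linalg.solve(AtA, A.T @ b)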

The other approach is to establish correspondences between a number of points in two adjacent images, and then use their image-plane positions to solve for the camera motion. Unfortunately, the image-plane position is related to the world coordinates in a nonlinear manner. Several numerical techniques have been proposed for solving directly for the camera position [Horn90] from the locations of N or more points. Using additional points (overdetermining the solution) is recommended to minimize the effects of noise on the resulting motion vectors. A good illustration of camera position and parameter determination using a feature tracking technique is given by Shawn Becker's doctoral work.
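As an illustration only (this is the classical linear "eight-point" formulation, not Horn's relative-orientation method), the sketch below estimates the essential matrix relating two views from N >= 8 corresponding points; the relative rotation and translation (up to scale) can then be extracted from it. The inputs are assumed to be N x 2 arrays of points in normalized camera coordinates.

    import numpy as np

    def essential_from_correspondences(x1, x2):
        """Linear (eight-point) estimate of the essential matrix E relating two
        views, from N >= 8 corresponding points in normalized camera coordinates.
        Each correspondence contributes one row of the constraint x2^T E x1 = 0;
        extra points overdetermine the system and average out noise."""
        x1h = np.column_stack([x1, np.ones(len(x1))])   # N x 3 homogeneous points
        x2h = np.column_stack([x2, np.ones(len(x2))])
        # Build the N x 9 design matrix: each row is kron(x2_i, x1_i).
        A = np.array([np.kron(p2, p1) for p1, p2 in zip(x1h, x2h)])
        # The least-squares solution is the right singular vector associated
        # with the smallest singular value.
        _, _, Vt = np.linalg.svd(A)
        E = Vt[-1].reshape(3, 3)
        # Enforce the essential-matrix constraint (two equal singular values,
        # third zero) by projection.
        U, S, Vt = np.linalg.svd(E)
        s = (S[0] + S[1]) / 2.0
        return U @ np.diag([s, s, 0.0]) @ Vt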

The approach I currently plan to explore is the use of Extended Kalman Filtering to track the corresponding features over time [Broida90, Azarbayejani94]. In particular, the approach taken by Azarbayejani and Pentland also allows estimation of the focal length, another camera parameter that typically changes over time.
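The state parameterizations are given in the papers; as a reminder of the overall structure, here is a generic EKF predict/update cycle. This is only a sketch: the dynamics function f, the measurement (projection) function h, and their Jacobians are placeholders, and the state is assumed to hold the camera translation, rotation, focal length, and per-feature depths -- this is not the specific formulation of Azarbayejani and Pentland.

    import numpy as np

    def ekf_step(x, P, z, f, F_jac, h, H_jac, Q, R):
        """One generic Extended Kalman Filter cycle.
        x, P : prior state estimate and covariance
        z    : measured image-plane positions of the tracked features this frame
        f, h : (placeholder) dynamics and measurement/projection functions
        F_jac, H_jac : their Jacobians, evaluated at the current estimate
        Q, R : process and measurement noise covariances"""
        # Predict: propagate the state through the motion model.
        x_pred = f(x)
        F = F_jac(x)
        P_pred = F @ P @ F.T + Q
        # Update: correct with the difference between the measured and
        # predicted feature positions (the innovation).
        H = H_jac(x_pred)
        S = H @ P_pred @ H.T + R
        K = P_pred @ H.T @ np.linalg.inv(S)
        x_new = x_pred + K @ (z - h(x_pred))
        P_new = (np.eye(len(x)) - K @ H) @ P_pred
        return x_new, P_new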

Approach

Unlike the implementation by Azarbayejani, I hope to have automatic detection and selection of the feature points used for motion tracking. Automatic rejection of feature point candidates that lie on objects moving relative to the global (camera-induced) motion will also have to be provided. In addition, the literature does not discuss the need to add and remove feature points from the tracked set as they become occluded or move out of the field of view of one of the cameras. This must be addressed.
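A rough sketch of the bookkeeping I have in mind follows (illustrative only; the residual threshold, target track count, and data layout are placeholders): tracks that disappear, leave the frame, or disagree with the globally predicted motion are dropped, and new candidates are added to keep the set full.

    def maintain_tracks(tracks, measurements, predictions, candidates,
                        frame_size, max_residual=3.0, target_count=20):
        """Prune and replenish the set of tracked feature points.
        tracks       : dict mapping track id -> current image position (x, y)
        measurements : dict mapping track id -> measured position this frame,
                       missing if the feature was not found (e.g. occluded)
        predictions  : dict mapping track id -> position predicted by the
                       current camera-motion estimate
        candidates   : list of new candidate positions, best first
        Returns the updated track dictionary."""
        width, height = frame_size
        kept = {}
        for tid in tracks:
            meas = measurements.get(tid)
            if meas is None:
                continue              # lost or occluded: drop the track
            x, y = meas
            if not (0 <= x < width and 0 <= y < height):
                continue              # left the field of view
            px, py = predictions[tid]
            residual = ((x - px) ** 2 + (y - py) ** 2) ** 0.5
            if residual > max_residual:
                continue              # inconsistent with the global motion:
                                      # probably an independently moving object
            kept[tid] = meas
        # Replenish with new candidates until the target count is reached.
        next_id = max(tracks, default=-1) + 1
        for pos in candidates:
            if len(kept) >= target_count:
                break
            kept[next_id] = pos
            next_id += 1
        return kept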

My current plan is to pre-process the image data by performing edge detection. Candidate points for correspondence/tracking will be selected for the following (a rough sketch of the corner-selection step appears below):

  • high-frequency content in both image dimensions (i.e., intersections of orthogonal lines)
  • location within the frame (fair sampling of the entire image plane, especially the boundaries)
  • location in the scene (objects closer to the camera provide more accurate information than distant objects.)
It is unclear to me at this point what the effect of lens distortion will be on the motion estimate (a constant error bias?). Pre-correcting for such distortion is possible if camera- and optics-specific correction parameters are calculated (this only needs to be done when the camera, lens, or focal length is changed).
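Returning to the first selection criterion above, a Harris-style corner response rewards points with gradient energy in both image dimensions. This is my illustration, not part of the planned implementation; the smoothing scale and the constant k are conventional placeholder values. Candidates could then be taken as local maxima of this response, sampled across the frame to satisfy the second criterion.

    import numpy as np
    from scipy.ndimage import gaussian_filter, sobel

    def corner_response(image, sigma=1.5, k=0.04):
        """Harris-style corner response: large where the local gradient
        covariance has two large eigenvalues, i.e. high-frequency content
        in both image dimensions (corners, line intersections)."""
        Ix = sobel(image.astype(float), axis=1)
        Iy = sobel(image.astype(float), axis=0)
        # Smoothed products of gradients (the local structure tensor).
        Ixx = gaussian_filter(Ix * Ix, sigma)
        Iyy = gaussian_filter(Iy * Iy, sigma)
        Ixy = gaussian_filter(Ix * Iy, sigma)
        det = Ixx * Iyy - Ixy * Ixy
        trace = Ixx + Iyy
        return det - k * trace ** 2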

One of the ultimate goals is to combine the motion estimate produced by this method with one provided by actually instrumenting the camera to sense motion (well, acceleration at least...). The sensor information could also be used in the aforementioned rejection of non-stationary feature points.
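As a first-order illustration of how the two estimates might be combined (assuming, for the sketch, that each comes with an error variance), inverse-variance weighting of the vision-based and inertial estimates would look like the following. A real implementation would more likely fold the inertial measurements directly into the Kalman filter state.

    def fuse_estimates(x_vision, var_vision, x_inertial, var_inertial):
        """Combine two independent estimates of the same motion parameter
        (e.g. camera translation along one axis) by inverse-variance
        weighting; the less certain estimate contributes less."""
        w_v = 1.0 / var_vision
        w_i = 1.0 / var_inertial
        fused = (w_v * x_vision + w_i * x_inertial) / (w_v + w_i)
        fused_var = 1.0 / (w_v + w_i)
        return fused, fused_var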

References

Gilad Adiv, "Determining Three-Dimensional Motion and Structure from Optical Flow Generated by Several Moving Objects", IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. PAMI-7, No. 4, July 1985.

A. Azarbayejani and A. Pentland, "Recursive Estimation of Motion, Structure, and Focal Length", IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 17, No. 6, June 1995. Also MIT Media Lab Vision & Modeling Group TR#243.

J.L. Barron, D.J. Fleet, and S.S. Beauchemin, "Performance of Optical Flow Techniques", International Journal of Computer Vision, Vol. 12, No. 1, pp. 43-77, 1994.

T.J. Broida, S. Chandrashekhar, and R. Chellappa, "Recursive 3-D Motion Estimation from a Monocular Image Sequence", IEEE Transactions on Aerospace and Electronic Systems, Vol. 26, No. 4, pp. 639-656, July 1990.

Berthold K.P. Horn and E.J. Weldon, Jr., "Direct Methods for Recovering Motion", International Journal of Computer Vision, Vol. 2, No. 1, pp. 51-76, June 1988.

Berthold K.P. Horn, "Relative Orientation", International Journal of Computer Vision, Vol. 4, No. 1, pp. 59-78, 1990.


Questions? Comments? Send me mail:
wad@media.mit.edu
John Watlington, MIT Rm E15-351, 20 Ames St, Cambridge, MA 02139, 617-253-5097.