As computer vision begins to address the visual interpretation of action, applications such as surveillance and monitoring are becoming more relevant. Similarly, recent work in intelligent environments and perceptual user interfaces [2,4] involves vision systems that interpret the pose or gesture of users in a known, indoor environment. In all of these situations, the first fundamental problem encountered is extracting the image region corresponding to the person or persons in the room.
Previous attempts at segmenting people from a known background have taken one of three approaches. Most common is some form of background subtraction. For example,  uses statistical texture properties of the background observed over an extended period of time to construct a model of the background, and uses this model to decide which pixels in an input image do not fall into the background class. The fundamental assumption of the algorithm is that the background is static in all respects: geometry, reflectance, and illumination.
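A minimal sketch of this style of background subtraction, simplified to a per-pixel mean/deviation model rather than the cited texture statistics (function names, the deviation threshold `k`, and the per-pixel Gaussian assumption are ours, not the cited work's):

```python
import numpy as np

def build_background_model(frames):
    """Estimate a per-pixel mean and standard deviation from background
    frames observed over an extended period (a simplified statistical
    background model)."""
    stack = np.stack([f.astype(float) for f in frames])
    # Small epsilon keeps the threshold well-defined for constant pixels.
    return stack.mean(axis=0), stack.std(axis=0) + 1e-6

def subtract_background(img, mean, std, k=3.0):
    """Label as foreground any pixel deviating from the background model
    by more than k standard deviations in any color channel."""
    return (np.abs(img.astype(float) - mean) > k * std).any(axis=-1)
```

Note the fundamental assumption surfacing in the code: the model is built once, so any change in illumination or reflectance shifts the per-pixel statistics and produces false foreground.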
The second class of approach is based upon image motion only, presuming that the background is stationary or at most slowly varying, but that the person is moving.
In these methods no detailed model of the background is required. Of course, such methods are only appropriate for the direct interpretation of motion; if the person stops moving, no signal remains to be processed. They also require constant or slowly varying geometry, reflectance, and illumination.
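The simplest instance of a motion-only method is frame differencing, sketched below (the threshold value is an arbitrary assumption); it also makes the stated failure mode concrete, since a stationary person yields a zero difference image:

```python
import numpy as np

def motion_mask(prev_frame, cur_frame, threshold=25):
    """Frame differencing: label as moving any pixel whose intensity
    changes by more than `threshold` between consecutive frames.
    No background model is needed, but a motionless person vanishes."""
    diff = np.abs(cur_frame.astype(int) - prev_frame.astype(int))
    return diff > threshold
```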
The final approach, and the one most closely related to the technique presented in this paper, is based upon geometry. Kanade, et al.  employ special-purpose multi-baseline stereo hardware to compute dense depth maps in real time. Provided with a background disparity value, the algorithm can perform real-time depth segmentation, or ``z-keying''. The only assumption of the algorithm is that the geometry of the background does not vary. However, computing dense, robust depth maps at frame rate demands great computational power.
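Once a dense disparity map is in hand, the z-keying step itself is trivial, as the sketch below shows (the margin parameter is an assumption; the expensive part is producing `disparity_map`, which the cited work does in special-purpose hardware):

```python
import numpy as np

def z_key(disparity_map, bg_disparity, margin=1.0):
    """Depth keying: pixels whose measured disparity exceeds the known
    background disparity by more than `margin` lie in front of the
    background and are labeled foreground."""
    return disparity_map > bg_disparity + margin
```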
In this paper we present a fast, simple method for segmenting people from a geometrically static background. Using two or more cameras and background disparity maps created off-line, segmentation is performed by checking color intensity values at corresponding pixels: if the values match, the background disparity is validated and the pixel in the key image is assumed to belong to the background; otherwise, it is labeled as object. Because the basis of comparison is the background-disparity warp between two images taken at the same time, illumination and reflectance can vary without significantly affecting the results.
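The core test can be sketched as follows. This is an illustrative simplification, not the paper's implementation: it assumes rectified cameras with a horizontal baseline, a precomputed background disparity map for the key image, and an arbitrary color-difference threshold.

```python
import numpy as np

def segment_foreground(key_img, aux_img, bg_disparity, threshold=30.0):
    """For each key-image pixel, warp to the auxiliary image via the
    off-line background disparity and compare colors: a match validates
    the background disparity (background); a mismatch labels the pixel
    as object.  Sketch only; threshold and rectification are assumptions."""
    h, w = key_img.shape[:2]
    mask = np.ones((h, w), dtype=bool)  # True = object (foreground)
    for y in range(h):
        for x in range(w):
            # Corresponding pixel under the background disparity
            # (rectified pair, horizontal baseline assumed).
            x2 = x - int(round(bg_disparity[y, x]))
            if 0 <= x2 < w:
                diff = np.abs(key_img[y, x].astype(float)
                              - aux_img[y, x2].astype(float)).sum()
                if diff < threshold:
                    mask[y, x] = False  # colors match -> background
    return mask
```

Because both images are captured at the same instant, a global illumination change alters both views consistently and the per-pixel comparison still succeeds, which is the lighting-independence property claimed above.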
In the remainder of this paper we describe the algorithm in detail, present some experimental results, and discuss implementation details critical to the performance of the technique.