In the Patrol scenario, tasks include aiming, shooting, and reloading. Other user actions, such as standing, walking, running, and scanning the environment, can be treated as tasks that may be performed at the same time as these. In this section we describe a computer vision system for recognizing such user tasks. The system is based on a generic object recognition system recently proposed by Schiele and Crowley [13]. A major result of their work is that a statistical representation based on local object descriptors provides a reliable means of representing and recognizing object appearances.
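To make the idea of a statistical representation built from local descriptors concrete, the following is a minimal sketch, not the implementation of [13]. It assumes that each image-patch class is modeled by a normalized multidimensional histogram of local descriptor vectors, and that the measurements from a query patch are treated as independent so that Bayes' rule yields a posterior probability per class. The function names, the bin count, and the value range are hypothetical choices made for illustration.

```python
import numpy as np

def build_histogram(descriptors, bins, value_range):
    """Normalized multidimensional histogram approximating p(m | class)."""
    hist, _ = np.histogramdd(descriptors, bins=bins, range=value_range)
    hist += 1e-6  # assumed floor so empty cells do not zero out the product
    return hist / hist.sum()

def cell_indices(descriptors, bins, value_range):
    """Map each descriptor vector to the index of its histogram cell."""
    lo = np.array([r[0] for r in value_range], dtype=float)
    hi = np.array([r[1] for r in value_range], dtype=float)
    scaled = (np.asarray(descriptors, dtype=float) - lo) / (hi - lo) * bins
    return np.clip(scaled.astype(int), 0, bins - 1)

def class_posteriors(descriptors, histograms, priors, bins, value_range):
    """Posterior p(class | all descriptors), assuming independent measurements."""
    idx = cell_indices(descriptors, bins, value_range)
    log_post = np.log(np.asarray(priors, dtype=float))
    for c, hist in enumerate(histograms):
        # Sum of log-likelihoods of the observed cells under class c.
        log_post[c] += np.log(hist[tuple(idx.T)]).sum()
    log_post -= log_post.max()  # subtract the maximum for numerical stability
    post = np.exp(log_post)
    return post / post.sum()
```

The resulting vector holds one probability per patch class and sums to one, which makes it usable as a fixed-length feature vector for each frame.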
In the context of the Patrol data, this system can be used to recognize image patches that correspond to particular motions of a hand, the gun, a portion of an arm, or any part of the background. By feeding the calculated probabilities as feature vectors to a set of hidden Markov models (HMMs), it is possible to recognize different user tasks such as aiming and reloading. Preliminary results are described in the next section.
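The classification step described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the system evaluated in the next section: it assumes one HMM per task with diagonal Gaussian emissions over the per-frame probability vectors, scores a sequence with the forward algorithm in log space, and picks the task whose model gives the highest likelihood. The class `GaussianHMM`, the function `classify`, and all parameter values are hypothetical, and training of the models is omitted.

```python
import numpy as np

def _logsumexp(a, axis):
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

class GaussianHMM:
    """HMM with diagonal Gaussian emissions (parameters assumed given)."""

    def __init__(self, start, trans, means, variances):
        with np.errstate(divide="ignore"):  # allow zero probabilities
            self.log_start = np.log(np.asarray(start, dtype=float))
            self.log_trans = np.log(np.asarray(trans, dtype=float))
        self.means = np.asarray(means, dtype=float)          # (states, dims)
        self.variances = np.asarray(variances, dtype=float)  # (states, dims)

    def _log_emission(self, x):
        # Log density of every frame under every state: shape (frames, states).
        diff = x[:, None, :] - self.means[None, :, :]
        return -0.5 * np.sum(
            diff ** 2 / self.variances + np.log(2 * np.pi * self.variances),
            axis=2,
        )

    def log_likelihood(self, x):
        """Forward algorithm in log space: log p(sequence | model)."""
        log_b = self._log_emission(np.asarray(x, dtype=float))
        alpha = self.log_start + log_b[0]
        for t in range(1, len(log_b)):
            # Sum over previous states i for each current state j.
            alpha = _logsumexp(alpha[:, None] + self.log_trans, axis=0) + log_b[t]
        return _logsumexp(alpha, axis=0)

def classify(sequence, models):
    """Return the task whose HMM assigns the sequence the highest likelihood."""
    scores = {task: m.log_likelihood(sequence) for task, m in models.items()}
    return max(scores, key=scores.get), scores

# Hypothetical two-state models over two patch-class probabilities.
start = [0.6, 0.4]
trans = [[0.7, 0.3], [0.2, 0.8]]
var = [[0.01, 0.01], [0.01, 0.01]]
models = {
    "aiming": GaussianHMM(start, trans, [[0.9, 0.1], [0.8, 0.2]], var),
    "reloading": GaussianHMM(start, trans, [[0.1, 0.9], [0.2, 0.8]], var),
}
sequence = [[0.85, 0.15], [0.9, 0.1], [0.8, 0.2]]
label, scores = classify(sequence, models)
```

Working in log space avoids numerical underflow on long frame sequences. In practice the model parameters would be estimated from labeled task sequences rather than set by hand as they are here.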