The architecture of the monitoring system described in this paper is shown in figure 1. The system consists of three components: a tracker, an event generator, and a parser. The tracker processes the data from a camera and identifies moving objects in the camera view. It tracks these objects and collects the data about their movement into partial tracks.
The partial tracks are then passed to the event generator, which generates discrete events for the beginning and the end of each track according to a simple environment model. This model, the map, encodes knowledge about the environment and can be learned. The map helps the generator distinguish tracks of objects that are entering or leaving the scene from tracks of objects that the tracker has lost due to occlusion.
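To make the role of the map concrete, the following is a minimal sketch of an event generator, not the implementation used in the system. It assumes that the map can be approximated by a list of rectangular boundary zones and that a partial track is summarized by its first and last positions. The names `PartialTrack`, `Zone`, `generate_events`, and the event labels `ENTER`, `EXIT`, `FOUND`, and `LOST` are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class PartialTrack:
    """Summary of one partial track (hypothetical structure)."""
    track_id: int
    start: Point  # first observed position
    end: Point    # last observed position


@dataclass
class Zone:
    """Axis-aligned rectangle where objects may enter or leave the scene.

    Assumption: the environment map is approximated by a list of such zones.
    """
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, p: Point) -> bool:
        x, y = p
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


def generate_events(track: PartialTrack, zones: List[Zone]) -> List[Tuple[str, int]]:
    """Emit one event for the beginning and one for the end of a track.

    A track that begins (ends) inside a boundary zone is taken to be an
    object entering (leaving) the scene. Otherwise the beginning (end) is
    attributed to the tracker finding (losing) the object, e.g. because
    of occlusion.
    """
    starts_at_boundary = any(z.contains(track.start) for z in zones)
    ends_at_boundary = any(z.contains(track.end) for z in zones)
    begin_label = "ENTER" if starts_at_boundary else "FOUND"
    end_label = "EXIT" if ends_at_boundary else "LOST"
    return [(begin_label, track.track_id), (end_label, track.track_id)]


# Usage: a single boundary zone along the left edge of the view.
zones = [Zone(0.0, 0.0, 10.0, 100.0)]
first = PartialTrack(track_id=1, start=(5.0, 50.0), end=(60.0, 50.0))
second = PartialTrack(track_id=2, start=(62.0, 50.0), end=(3.0, 40.0))
events = generate_events(first, zones) + generate_events(second, zones)
```

In this usage example the first track begins in the boundary zone and ends in the middle of the view, while the second begins in the middle of the view and ends in the boundary zone. Under the stated assumptions, this is the pattern one would expect when an object enters, is briefly occluded, and is then picked up again as a new partial track before leaving.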
The parser analyzes the events according to a grammar that structurally describes possible activities. The grammar represents knowledge about the structure of possible interactions, which makes it possible to enforce structural and contextual constraints. We now describe each component in detail.