Next: Event Generator Up: Monitoring System Previous: Monitoring System


Tracker

The tracker used in our system is based on an adaptive mixture-of-Gaussians technique described in detail in [16]. In this approach, each pixel is modeled by a separate mixture of K Gaussians as follows:


\begin{displaymath}
P(X_t)=\sum^K_{i=1}\omega_{i,t}*\eta(X_t, \mu_{i,t}, \Sigma_{i,t})
\end{displaymath} (1)

where $\omega_{i,t}$ is an estimate of the ith mixture coefficient at time t, $X_t$ is the current pixel value, and $\mu_{i,t}$ and $\Sigma_{i,t}$ are the mean and covariance of the corresponding Gaussian component $\eta$.
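
As an illustration of Eq. (1), the following minimal sketch evaluates the mixture density for a single grayscale pixel. It assumes scalar pixel values, so each covariance reduces to a variance; the function names and the example parameters are hypothetical and are not taken from [16].

```python
import math

def gaussian_pdf(x, mean, var):
    """Univariate normal density eta(x; mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def mixture_probability(x, weights, means, variances):
    """P(x) = sum_i w_i * eta(x; mu_i, var_i), the scalar form of Eq. (1)."""
    return sum(w * gaussian_pdf(x, m, v)
               for w, m, v in zip(weights, means, variances))

# Hypothetical three-component model (K = 3) for one grayscale pixel.
weights = [0.6, 0.3, 0.1]
means = [100.0, 150.0, 30.0]
variances = [16.0, 25.0, 400.0]

p = mixture_probability(104.0, weights, means, variances)
```

For color pixels the same sum applies with a multivariate density in place of `gaussian_pdf`.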

If the current pixel value, $X_t$, is well modeled by one of the mixture components (that is, $X_t$ lies within 2.5 standard deviations of that component's mean), the weights $\omega_{i,t}$ and the parameters of the matching component are re-estimated. Otherwise, the least likely component of the mixture is replaced by a new one, with its mean $\mu_{i,t}$ set to $X_t$ and a high initial variance $\Sigma_{i,t}$.
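
The match-or-replace step can be sketched as follows for a scalar pixel. This section does not give the re-estimation formulas (they are in [16]), so the sketch assumes a simple exponential running average with a learning rate `ALPHA`, takes "least likely" to mean lowest weight, and uses the first matching component; `ALPHA`, `INIT_VARIANCE`, and `INIT_WEIGHT` are hypothetical values chosen only for illustration.

```python
import math

MATCH_SIGMAS = 2.5     # match threshold stated in the text
ALPHA = 0.05           # assumed learning rate (not given in this section)
INIT_VARIANCE = 900.0  # assumed "high" initial variance
INIT_WEIGHT = 0.05     # assumed low weight for a newly created component

def update_pixel_model(x, weights, means, variances):
    """Update one pixel's mixture in place; return the matched index or None."""
    matched = None
    for i, (m, v) in enumerate(zip(means, variances)):
        if abs(x - m) <= MATCH_SIGMAS * math.sqrt(v):
            matched = i  # first component within 2.5 standard deviations
            break

    if matched is None:
        # No component explains x: replace the lowest-weight component.
        worst = min(range(len(weights)), key=lambda i: weights[i])
        means[worst] = x
        variances[worst] = INIT_VARIANCE
        weights[worst] = INIT_WEIGHT
    else:
        # Assumed running-average re-estimation of weights, mean, variance.
        for i in range(len(weights)):
            hit = 1.0 if i == matched else 0.0
            weights[i] = (1.0 - ALPHA) * weights[i] + ALPHA * hit
        means[matched] += ALPHA * (x - means[matched])
        residual = x - means[matched]
        variances[matched] = ((1.0 - ALPHA) * variances[matched]
                              + ALPHA * residual * residual)

    # Keep the mixture coefficients summing to one.
    total = sum(weights)
    for i in range(len(weights)):
        weights[i] /= total
    return matched
```

Calling `update_pixel_model` once per frame for each pixel keeps the model adaptive; a smaller `ALPHA` makes the background model change more slowly.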

The next step is to determine whether the pixel $X_t$ belongs to the background. To do so, we sort all the components of the mixture in order of decreasing ratio $\omega / \left\vert \Sigma \right\vert^2$. This ratio effectively assigns higher importance to the mixture components that have received the most evidence and have the lowest variance. Intuitively, the components that correspond to the background typically have more observations attributed to them, and those observations vary little.
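
The ordering can be sketched as below. The sketch assumes scalar pixels, so that $\left\vert \Sigma \right\vert$ is simply the variance; `rank_components` is a hypothetical helper name.

```python
def rank_components(weights, variances):
    """Return component indices sorted by decreasing w / |Sigma|^2.

    For scalar pixels |Sigma| is the variance, so the key is w / var**2.
    """
    return sorted(range(len(weights)),
                  key=lambda i: weights[i] / variances[i] ** 2,
                  reverse=True)

# Hypothetical mixture: component 1 has high weight and low variance,
# so it is ranked first; the broad, weak component 0 is ranked last.
order = rank_components([0.1, 0.6, 0.3], [400.0, 16.0, 25.0])
```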

Then, after the components are sorted, we can set a threshold, T, which will separate components responsible for background pixels from the ones modeling foreground as follows:


\begin{displaymath}
B=\mathop{\mathrm{argmin}}_b\left(\frac{\sum_{k=1}^{b}\omega_{k}}{\sum_{k=1}^{K}\omega_{k}}>T\right)
\end{displaymath} (2)

where B is the smallest number of leading components of the sorted mixture whose combined weight exceeds the fraction T of the total weight; these first B components are deemed ``responsible'' for the background. Now, if the pixel $X_t$ is best modeled by one of the ``background'' components, it is marked as belonging to the background.
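
A minimal sketch of Eq. (2) and the resulting background test follows. It assumes the weights are already listed in sorted order and that the pixel's best-modeling component is identified by its 0-based rank in that order; both function names are hypothetical.

```python
def background_count(sorted_weights, threshold):
    """Smallest b whose cumulative weight fraction exceeds threshold (Eq. 2)."""
    total = sum(sorted_weights)
    cumulative = 0.0
    for b, w in enumerate(sorted_weights, start=1):
        cumulative += w
        if cumulative / total > threshold:
            return b
    # Threshold never exceeded: treat every component as background.
    return len(sorted_weights)

def is_background(matched_rank, sorted_weights, threshold):
    """True if the matched component is among the first B sorted components.

    matched_rank is the 0-based rank of the component that models the
    pixel, or None if no component matched (the pixel is then foreground).
    """
    if matched_rank is None:
        return False
    return matched_rank < background_count(sorted_weights, threshold)
```

With sorted weights `[0.6, 0.3, 0.1]` and T = 0.7, the first two components are treated as background, so a pixel matched to the third component is labeled foreground.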

Finally, foreground pixels are segmented into regions by a two-pass connected-components algorithm.
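
For illustration, a generic two-pass connected-components labeling can be sketched as follows. The text does not specify the connectivity or the data layout, so this sketch assumes 4-connectivity and a binary mask given as a list of rows; it is not necessarily the variant used in the system.

```python
def label_regions(mask):
    """Two-pass connected-components labeling (4-connectivity assumed).

    mask is a list of rows of 0/1 values; the result has the same shape,
    with 0 for background and 1, 2, ... for foreground regions.
    """
    rows = len(mask)
    cols = len(mask[0]) if mask else 0
    labels = [[0] * cols for _ in range(rows)]
    parent = [0]  # parent[label] for union-find; index 0 is background

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    # First pass: assign provisional labels and record equivalences.
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            up = labels[r - 1][c] if r > 0 else 0
            left = labels[r][c - 1] if c > 0 else 0
            if up == 0 and left == 0:
                parent.append(len(parent))  # new label is its own root
                labels[r][c] = len(parent) - 1
            elif up and left:
                ra, rb = find(up), find(left)
                lo, hi = min(ra, rb), max(ra, rb)
                parent[hi] = lo  # merge the two equivalence classes
                labels[r][c] = lo
            else:
                labels[r][c] = up or left

    # Second pass: replace each label by a compact id for its root.
    remap = {}
    for r in range(rows):
        for c in range(cols):
            if labels[r][c]:
                root = find(labels[r][c])
                labels[r][c] = remap.setdefault(root, len(remap) + 1)
    return labels
```

Each resulting label identifies one foreground region, from which a position and size can be measured.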

Correspondence of foreground regions between frames is established using a linearly predictive multiple-hypothesis tracking algorithm that incorporates both region position and size. We have implemented an on-line method for seeding and maintaining sets of Kalman filters that model the dynamics of foreground regions. Details of this process can be found in [16]. Essentially, for each frame, the parameters of the existing dynamical models are estimated; those models are used to explain the observed foreground regions; and, finally, new models are hypothesized for foreground regions that were not explained by any existing model.
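
The per-frame loop described above can be sketched in a deliberately simplified form. Note the substitutions: this sketch uses a constant-velocity linear predictor and greedy nearest-neighbor matching in place of Kalman filters and multiple hypotheses, so it only illustrates the predict, explain, and hypothesize structure. The `Track` class, the `GATE` distance, and the `(x, y, size)` region format are all hypothetical.

```python
import math

GATE = 30.0  # assumed maximum distance for a model to explain a region

class Track:
    """Hypothetical dynamical model: state (x, y, size) plus a velocity."""

    def __init__(self, region):
        self.state = list(region)
        self.velocity = [0.0, 0.0, 0.0]

    def predict(self):
        # Linear prediction: last state plus last per-frame change.
        return [s + v for s, v in zip(self.state, self.velocity)]

    def update(self, region):
        self.velocity = [r - s for r, s in zip(region, self.state)]
        self.state = list(region)

def step(tracks, regions):
    """Process one frame; regions is a list of (x, y, size) tuples."""
    unexplained = list(regions)
    for track in tracks:
        if not unexplained:
            break
        predicted = track.predict()
        best = min(unexplained, key=lambda r: math.dist(r, predicted))
        if math.dist(best, predicted) <= GATE:
            track.update(best)        # the model explains this region
            unexplained.remove(best)
    # Hypothesize a new model for each region no existing model explained.
    tracks.extend(Track(r) for r in unexplained)
    return tracks
```

A Kalman filter would additionally maintain an uncertainty estimate for each state, which allows the gating distance to adapt per track instead of being a fixed constant.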

Our system adapts robustly to lighting changes, repetitive motions of scene elements, tracking through cluttered regions, slow-moving objects, and the introduction or removal of objects from the scene. Slowly moving objects take longer to be incorporated into the background because their color has a larger variance than the background. Repetitive variations are also learned, and a model for the background distribution is generally maintained even if it is temporarily replaced by another distribution, which leads to faster recovery when objects are removed.


yuri ivanov
1999-02-05