
Recognition and disambiguation

As an example of simple gesture recognition and semantic disambiguation, we show the recognition of a short hand gesture that can take several possible forms. We define a ``SQUARE'' gesture as either a left-handed (counterclockwise) or a right-handed (clockwise) gesture consisting of four parts: ``TOP'', ``BOTTOM'', ``LEFT SIDE'' and ``RIGHT SIDE''. In this formulation ``TOP'' and ``BOTTOM'', for example, are ambiguous because both can be formed by the same movement. We note, however, that this never happens within the same context: in a right-handed square, ``TOP'' is a left-to-right movement and ``BOTTOM'' is a right-to-left one, while in a left-handed square the definitions are reversed. We attempt to disambiguate these definitions semantically and to recognize a ``SQUARE'' regardless of whether it is right-handed or left-handed.

To describe this structure we use a simple grammar Gsquare, which reflects the ambiguity of the terminal meaning (``skip'' rules are omitted for simplicity):

Gsquare:        
SQUARE $\rightarrow$   RH [0.5]
    | LH [0.5]
RH $\rightarrow$   TOP UD BOT DU [1.0]
LH $\rightarrow$   BOT DU TOP UD [1.0]
TOP $\rightarrow$   LR [0.5]
    | RL [0.5]
BOT $\rightarrow$   RL [0.5]
    | LR [0.5]
LR $\rightarrow$   left-right [1.0]
UD $\rightarrow$   up-down [1.0]
RL $\rightarrow$   right-left [1.0]
DU $\rightarrow$   down-up [1.0]
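
The way the global context resolves the ambiguity can be illustrated with a minimal Python sketch. It encodes the rules of Gsquare exactly as listed above and enumerates every derivation by brute force, which is feasible only because this grammar generates a finite set of strings. It is not the parser used in this work, and the names `GSQUARE`, `expand` and `interpret` are hypothetical, introduced only for the illustration.

```python
from itertools import product

# Gsquare as listed above: nonterminal -> [(right-hand side, rule probability)].
GSQUARE = {
    "SQUARE": [(["RH"], 0.5), (["LH"], 0.5)],
    "RH": [(["TOP", "UD", "BOT", "DU"], 1.0)],
    "LH": [(["BOT", "DU", "TOP", "UD"], 1.0)],
    "TOP": [(["LR"], 0.5), (["RL"], 0.5)],
    "BOT": [(["RL"], 0.5), (["LR"], 0.5)],
    "LR": [(["left-right"], 1.0)],
    "UD": [(["up-down"], 1.0)],
    "RL": [(["right-left"], 1.0)],
    "DU": [(["down-up"], 1.0)],
}


def expand(symbol, path=()):
    """Yield (leaves, probability) for every derivation of `symbol`.

    Each leaf is (terminal, path), where path lists the nonterminals
    above that terminal, starting from the root.
    """
    if symbol not in GSQUARE:  # terminal symbol
        yield [(symbol, path)], 1.0
        return
    for rhs, rule_prob in GSQUARE[symbol]:
        # Expand every right-hand-side symbol, then combine the alternatives.
        parts = [list(expand(s, path + (symbol,))) for s in rhs]
        for combo in product(*parts):
            leaves = [leaf for sub_leaves, _ in combo for leaf in sub_leaves]
            prob = rule_prob
            for _, sub_prob in combo:
                prob *= sub_prob
            yield leaves, prob


def interpret(terminals):
    """Return the derivations of SQUARE whose terminal string equals `terminals`."""
    return [(leaves, prob) for leaves, prob in expand("SQUARE")
            if [t for t, _ in leaves] == list(terminals)]


# Both sequences start with the same left-right movement.
rh = interpret(["left-right", "up-down", "right-left", "down-up"])
lh = interpret(["left-right", "down-up", "right-left", "up-down"])
for leaves, prob in rh + lh:
    # path[1] is RH or LH; path[2] is the part (TOP, BOT, UD or DU).
    print(prob, [(t, path[1], path[2]) for t, path in leaves])
```

Each of the two terminal strings matches exactly one derivation, so the shared left-right terminal is labeled TOP under RH and BOT under LH. The probabilities printed here are rule probabilities only; they do not include the terminal likelihoods that enter the Viterbi probability reported in figure 4.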

Figure 4: ``Square'' sequence segmentation.
a) right-handed square, b) left-handed square.
\begin{figure}
\small\begin{verbatim}a) Segmentation <rsquare.dat>:
Label Segmen...
...593422e-01
Viterbi probability = 0.01651770\end{verbatim}\normalsize\end{figure}

We receive input data from the ``Stive'' vision system [], shown in figure 5. The system uses stereo algorithms to determine the x-y-z positions of a person's hands and head. At a frame rate of approximately 20 frames per second, the ``Square'' data set consists of 150-200 samples per experiment, i.e. roughly 7.5 to 10 seconds of motion.

For terminal recognition we trained four three-state HMMs, one for each primitive hand movement, on the x and y velocities of 20 examples of that movement. After achieving a reasonable recognition rate, we performed several ``SQUARE'' gestures, determining candidate events as described above. The candidate events were passed to the parser, yielding the results presented in figure 4. Figure 4a shows that the semantic structure recovered was that of a right-handed square and that the whole sequence was labeled as a ``SQUARE''. Recognition results for a left-handed square sequence are shown in figure 4b. Note that the left-right gesture was interpreted as ``TOP'' in the global context of the first case and as ``BOTTOM'' in the second. The figures also show how timing constraints propagated through the parse and formed a continuous coverage of the input signal.
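
The terminal recognition step can be sketched as scoring a velocity sequence under four three-state HMMs and picking the most likely one. The Python sketch below is only an illustration under stated assumptions: the models are hand-set rather than trained, and the left-to-right topology, the Gaussian emissions, the per-state speeds, the direction convention (y grows upward) and every name in the code are hypothetical choices that the text does not specify.

```python
import math

FPS = 20.0  # approximate frame rate quoted in the text

# Assumed unit directions (vx, vy) of the four primitives, with y growing upward.
DIRECTIONS = {
    "left-right": (1.0, 0.0),
    "right-left": (-1.0, 0.0),
    "up-down": (0.0, -1.0),
    "down-up": (0.0, 1.0),
}
SPEEDS = (0.5, 1.0, 0.5)  # assumed per-state speeds: accelerate, cruise, decelerate
SIGMA = 0.3               # assumed emission standard deviation
# Assumed left-to-right transition matrix; every model starts in state 0.
TRANS = ((0.8, 0.2, 0.0), (0.0, 0.8, 0.2), (0.0, 0.0, 1.0))


def velocities(positions, fps=FPS):
    """Finite-difference (vx, vy) from a list of (x, y) positions."""
    return [((x1 - x0) * fps, (y1 - y0) * fps)
            for (x0, y0), (x1, y1) in zip(positions, positions[1:])]


def log_gauss(value, mean, sigma=SIGMA):
    """Log density of a one-dimensional Gaussian."""
    return (-0.5 * math.log(2 * math.pi * sigma ** 2)
            - (value - mean) ** 2 / (2 * sigma ** 2))


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    top = max(values)
    if top == -math.inf:
        return -math.inf
    return top + math.log(sum(math.exp(v - top) for v in values))


def log_likelihood(obs, direction):
    """Forward algorithm in the log domain for one three-state model."""
    def emit(state, v):
        return (log_gauss(v[0], direction[0] * SPEEDS[state])
                + log_gauss(v[1], direction[1] * SPEEDS[state]))

    alpha = [emit(0, obs[0]), -math.inf, -math.inf]
    for v in obs[1:]:
        alpha = [
            log_sum_exp([alpha[i] + math.log(TRANS[i][j])
                         for i in range(3) if TRANS[i][j] > 0.0]) + emit(j, v)
            for j in range(3)
        ]
    return log_sum_exp(alpha)


def classify(obs):
    """Name of the primitive whose model gives the highest log-likelihood."""
    return max(DIRECTIONS, key=lambda name: log_likelihood(obs, DIRECTIONS[name]))


# Synthetic sweep: x advances with a slow-fast-slow profile, y stays fixed.
speeds = [0.5] * 5 + [1.0] * 10 + [0.5] * 5
xs = [0.0]
for s in speeds:
    xs.append(xs[-1] + s / FPS)
sweep = [(x, 0.0) for x in xs]
print(classify(velocities(sweep)))        # sweep traversed left to right
print(classify(velocities(sweep[::-1])))  # same path traversed backwards
```

In this sketch the four models differ only in their assumed direction of motion, so the classification reduces to finding the direction that best explains the observed velocities. Trained models would instead learn their transition and emission parameters from the 20 examples per movement.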


yuri ivanov
1999-02-06