4.889 Intelligent Interface Software Design Workshop

Assignment 3: Gesture interfaces

Objective

This assignment is designed to give you experience with a simple gesture recognizer. We'll ask you to use this in a project that incorporates gesture recognition into the interface.

Gesture recognition

Elaborate gesture recognition interfaces use visual or 3D location inputs to interpret movements of arms or hands or even entire bodies. For this assignment we'll be considerably more modest--the gestures will be simple paths sketched with the mouse. The recognizer we will use is taken from a program developed by Dean Rubine at CMU. The attached papers by Rubine describe how the program works, together with some extensions he has proposed and some potential applications of his system. You may wish to implement and investigate some of these in your project.

Task 1: Play with the classifier

We've extended Mondrian to incorporate Rubine's gesture recognizer. To load this new version, look in the Mondrian folder for a folder named Gesture, and load the file Load-Gesture that you will find there. If Mondrian doesn't start up automatically after loading this file, start it by hand by evaluating the expression (a 'mondrian-application).

Once Mondrian starts up, you will see two new command dominoes labeled Teach Gesture and Recognize Gesture. Select Recognize Gesture and then draw a rough square with the mouse: the gesture starts when you press the mouse button and continues for as long as you hold the button down. This demonstration system has been trained to recognize three types of gestures: circles, squares, and lines. When the system recognizes one of these gestures, it responds by drawing a cleaned-up version of the gesture, then erasing it after a brief wait. If the system does not recognize the gesture, it prints "gesture unrecognized". Play around with a few gestures to get a feel for the limitations of the system's recognition capabilities. You should realize that the limitations arise not only from the basic recognition algorithm, but also from the way in which the system was trained. (It was trained by Hal in a rather ad hoc manner.)

You can use Teach Gesture to train the system to recognize a new class of gestures. Click on the Teach Gesture domino and draw the first example of the gesture. The system will put up a menu asking you for the name of the gesture class. Keep giving the system examples until you think you have enough. Then use Recognize Gesture to see how well the system distinguishes this new class from squares, circles, and lines. When the system recognizes your new gesture, it will print the name of the gesture. In this simple interface, there is no convenient way for you to specify an action that the system should take in response to recognition, so you'll have to be content with it just printing the name. You can use Teach Gesture to add more examples of your new gesture, and also more examples of squares, circles, and lines, if you like. Be careful, though, because the interface provides no way to retract a bad example--and if examples are poorly chosen, the recognizer will not do a good job.

You can define as many gesture classes as you like, but the more classes you have, and the more similar they are, the more likely the classifier is to become confused and produce ambiguous or incorrect responses.

Task 2: Add a new response

Once you've trained the system to recognize a new gesture class, add some behavior that the system should perform when it recognizes the class. For example, you might train it to recognize an "L-shape" as a new class, and then have it draw a cleaned-up version of the L in response to seeing an L-shape.

You can see how this is done by examining the procedures respond-to-square, respond-to-circle, and respond-to-line in the file named Test. These procedures are called with a collection of attributes--geometric information that is computed by the gesture recognizer and used in the classification algorithm. For example, the attributes include the coordinates of the first point and the last point in the gesture. The respond-to-line procedure creates a "cleaned up" gesture by simply joining the first and last points with a straight line. Then it shows the line, waits a second, and erases the line. Respond-to-square is similar, using the maximum and minimum x and y (which are contained among the gesture attributes) to compute an ideal square that matches the gesture's size and position.
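To make the shape of such a response procedure concrete, here is a minimal sketch for the L-shape example above. The attribute accessors (gest-attributes-start-x and so on) and the drawing helpers (show-line, erase-line) are placeholder names invented for this sketch--substitute the actual slot accessors of the gest-attributes structure and the drawing calls used in Test.

    ;;; A sketch only: accessor and drawing-helper names below are
    ;;; placeholders, not the real names from the Gesture code.
    (defun respond-to-l-shape (attributes)
      "Draw a cleaned-up L through the gesture's first and last points,
    then erase it after a brief pause."
      (let ((x0 (gest-attributes-start-x attributes))   ; first point
            (y0 (gest-attributes-start-y attributes))
            (x1 (gest-attributes-end-x attributes))     ; last point
            (y1 (gest-attributes-end-y attributes)))
        ;; Vertical stroke down from the first point, then a
        ;; horizontal stroke across to the last point.
        (show-line x0 y0 x0 y1)
        (show-line x0 y1 x1 y1)
        (sleep 1)
        (erase-line x0 y0 x0 y1)
        (erase-line x0 y1 x1 y1)))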

The complete list of attributes that are computed for each gesture can be found in the definition of the gest-attributes structure, which is defined at the beginning of the file features.

Once you have defined your respond-to-... procedure, you associate it with the name of the class by changing the list named *class-responses*. You can setq this list by hand, or edit the procedure install-gesture-commands (found in the file named Gesture-command).
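A minimal sketch of doing this by hand, assuming *class-responses* is an association list mapping class names to response procedures (check install-gesture-commands in Gesture-command for the actual representation, and for whether class names are strings or symbols):

    ;; Assumes *class-responses* is an alist of (class-name . procedure)
    ;; pairs; the class name must match what you typed into the
    ;; Teach Gesture menu.
    (push (cons "L-shape" #'respond-to-l-shape) *class-responses*)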

A note on the code

Even though Rubine's system has been interfaced to Mondrian, the files are structured in such a way that you can use the basic recognizer in your own applications, independent of Mondrian. The file Gesture-command contains most of the places where Mondrian interfaces to the underlying recognizer. You can examine this to see how the recognition code is called.

The basic data structure is something called a classifier. This holds information about the gesture classes, and is the thing that "does" the learning and the recognition. Our system uses a single classifier called *gesture-classifier*. You might consider designing a system that has multiple classifiers for different purposes.

To train the classifier, you call the procedure gest-add-example, with the (vector of) points that form the gesture, the name of the class for which this is an example, and the actual classifier. Look at the procedures in Test that accomplish these calls. To recognize a gesture, you can use the procedure gest-classify, or, for a somewhat higher-level interface, the procedure recognize-class. Gest-classify returns (via a multiple-value return) four pieces of information: the name of the class (or nil if the gesture was not recognized), the attributes computed for the gesture, the probability that the gesture was unambiguously recognized, and the distance (in feature space) of the given gesture from the average gesture in the class. These last two correspond to parameters that can be set (as at the beginning of Test) to control the tolerance with which the system is willing to recognize a gesture.
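Putting those calls together, a rough sketch of training and recognizing outside of Mondrian might look like the following. The argument order matches the description above, but the way points are represented (here, a vector of (x . y) pairs) and the exact argument conventions of gest-classify are assumptions--see the calls in Test for the authoritative usage.

    ;; Add one more training example of the "line" class.
    ;; Point representation is assumed here; see Test for the real one.
    (gest-add-example (vector (cons 10 10) (cons 12 40) (cons 11 80))
                      "line"
                      *gesture-classifier*)

    (defun try-to-recognize (points)
      "Classify POINTS (a vector of gesture points) and report the result.
    The argument conventions of gest-classify are assumed; see Test."
      (multiple-value-bind (class attributes probability distance)
          (gest-classify points *gesture-classifier*)
        (declare (ignore attributes))
        (if class
            (format t "Recognized ~a (probability ~a, distance ~a)~%"
                    class probability distance)
            (format t "gesture unrecognized~%"))))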

The actual points to be passed to the classifier and recognizer are collected by a new kind of mouse tracker called a suit-gesture-tracker. You can see how this is implemented in the file Gesture-Tracker. The tracker basically just collects the gesture points so that they can be passed to the classifier.

Task 3: Design a gesture application

Design an application that uses gestures. For example, change the interface to Mondrian to use gestures, either in whole or in part. Or design a simple draw program that uses gestures. Or think about gestures in some other application. Rubine's papers also discuss some interesting ideas, such as using "eager" recognition to combine gestures with direct manipulation. Another possibility is to play with the low-level recognizer itself and change the features that it uses for recognition. You can find other suggestions in the reading list for the course. In particular, take a look at Buxton's book The Pragmatics of Haptic Input.

In class, you should give a five-minute description of your proposed application.

As a project for the course, you should implement your proposal and demonstrate it in class.


lieber@media.mit.edu