Using Virtual Game Context
Suppose a foreign student of English observed many natural interactions between a waitress and a customer in a restaurant. How much would (s)he be able to learn about English, starting from scratch? If the student could watch individual scenes over and over again and search them for particular features, (s)he would probably learn a tremendous amount: the meanings of the nouns that refer to objects in the restaurant, verbs that refer to typical restaurant activities, personal pronouns, ways of formulating requests, and so on. Could a computer do this too?
In my current project, I am trying to model aspects of how people might use non-linguistic context to learn a language. My primary source of data is The Restaurant Game, developed by Jeff Orkin at the MIT Media Lab. This is a two-player game in which people are invited to play the roles of a customer and a waitress in a virtual restaurant. As they play, we record everything they say (typed text, as in instant messaging) and everything they do (moving their character around and interacting with objects through a simple point-and-click interface). About 10,000 games have been collected this way, forming a unique corpus of many similar but not identical interactions that an AI system can learn from.
Even without analyzing the internal structure of utterances, it is already possible to get a rough idea of when to say what just by looking at the context. A preliminary implementation of a planner that lets AI characters play the game follows roughly this strategy. (Check out its performance on Jeff's blog.)
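To make the "when to say what" idea concrete, here is a minimal sketch, not the planner's actual method. It assumes a hypothetical log format in which each game is a list of (actor, kind, content) events, with kind either "act" or "say"; the function names and the "actor:ACTION object" context strings are likewise illustrative. It simply counts which utterances follow each most recent action and, given a context, returns the one seen most often.

```python
from collections import Counter, defaultdict
from typing import Optional

# Hypothetical event format (an assumption, not the real log schema):
# (actor, kind, content) where kind is "act" or "say".
Event = tuple[str, str, str]

def build_context_model(games: list[list[Event]]) -> dict[str, Counter]:
    """Map each most recent action to counts of the utterances that follow it."""
    model: dict[str, Counter] = defaultdict(Counter)
    for game in games:
        context: Optional[str] = None  # reset at the start of every game
        for actor, kind, content in game:
            if kind == "act":
                # The latest action becomes the context for later utterances.
                context = f"{actor}:{content}"
            elif kind == "say" and context is not None:
                # Record who said what (lower-cased) in this context.
                model[context][f"{actor}: {content.lower()}"] += 1
    return model

def most_likely_utterance(model: dict[str, Counter], context: str) -> Optional[str]:
    """Return the utterance most often observed after this context, or None if unseen."""
    counts = model.get(context)  # .get does not create an empty entry
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```

For instance, if customers sitting down is usually followed by the waitress greeting them, most_likely_utterance(model, "customer:SIT chair") returns that greeting. A real system would need richer context than a single action, but even this crude lookup captures part of the pattern.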
There is a collection of objects that play an important role in the game and are used frequently, such as the different types of food and drinks, the bill and the menu, and to a lesser extent decorative items such as the vases of flowers on the tables. Knowing which words people use to talk about these items is a great starting point for analyzing the utterances in more detail and learning the meanings of the words and phrases they contain. The technique we use for this rests on the assumption that people are more likely to use a word such as 'wine' when they are doing something with wine, and less likely to use it when they are doing something unrelated. Using co-occurrence statistics of this kind, we can build a vocabulary of words and phrases such as 'beer', 'bill', 'white wine', 'spaghetti marinara' and 'soup du jour', each paired with the object category it refers to as its meaning.
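One way such co-occurrence statistics could be computed is sketched below. This is an illustration under stated assumptions, not the project's actual technique: it assumes each utterance has already been paired with the set of object categories involved in actions around it (a hypothetical preprocessing step), and it uses pointwise mutual information (PMI) as the association measure. The category labels, the helper names and the min_count and min_pmi thresholds are all hypothetical.

```python
import math
from collections import Counter

def phrases(utterance: str, max_n: int = 2) -> set[str]:
    """All word n-grams (up to max_n words) in a lower-cased utterance."""
    words = [w.strip(".,!?\"") for w in utterance.lower().split()]
    words = [w for w in words if w]
    return {" ".join(words[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(words) - n + 1)}

def learn_vocabulary(data: list[tuple[str, set[str]]],
                     min_count: int = 2, min_pmi: float = 1.0) -> dict[str, str]:
    """Map phrases to the object category they co-occur with most strongly.

    data: (utterance, object categories used around it) pairs; this pairing
    is an assumed input format. PMI(p, o) = log(P(p, o) / (P(p) * P(o))).
    """
    n = len(data)
    phrase_counts: Counter = Counter()
    object_counts: Counter = Counter()
    joint_counts: Counter = Counter()
    for utterance, objects in data:
        ps = phrases(utterance)
        phrase_counts.update(ps)
        object_counts.update(objects)
        joint_counts.update((p, o) for p in ps for o in objects)

    vocabulary: dict[str, str] = {}
    for phrase, c_p in phrase_counts.items():
        if c_p < min_count:  # too rare to trust
            continue
        best: tuple[str, float] | None = None
        for obj, c_o in object_counts.items():
            c_po = joint_counts[(phrase, obj)]
            if c_po == 0:
                continue
            # Counts-based form of PMI: log(c_po * n / (c_p * c_o)).
            pmi = math.log(c_po * n / (c_p * c_o))
            if pmi >= min_pmi and (best is None or pmi > best[1]):
                best = (obj, pmi)
        if best is not None:
            vocabulary[phrase] = best[0]
    return vocabulary
```

The PMI threshold is what filters out words like 'the', which occur near many different objects and therefore never co-occur with any one of them much more often than chance would predict.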
With this vocabulary, the system knows which object people are talking about when they use one of these words or phrases. This gives it an anchor for figuring out what the rest of the utterance means, for example by looking at what is being done with the object and who is doing it. Even where it is not possible to work out all the details, it will still be very helpful to recognize constructional patterns with a particular pragmatic function, such as 'May I have a/the X?', where X can be any object name.
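As a rough illustration of how a learned vocabulary can anchor a construction such as 'May I have a/the X?', the sketch below fills the X slot with a regular expression built from the known object names. It assumes a vocabulary shaped as a plain dict from phrase to object category; the function names and category labels are hypothetical, and a hand-written regex is only a stand-in for learning such constructions from data.

```python
import re
from typing import Optional

def build_request_pattern(vocabulary: dict[str, str]) -> re.Pattern:
    """Compile 'May I have a/the X?' where X is any known object phrase."""
    if not vocabulary:
        raise ValueError("vocabulary must contain at least one phrase")
    # Longest names first, so overlapping names such as 'soup' and
    # 'soup du jour' resolve to the longer one.
    names = sorted(vocabulary, key=len, reverse=True)
    alternatives = "|".join(re.escape(name) for name in names)
    return re.compile(rf"^may i have (?:a|the) ({alternatives})\s*\??$", re.IGNORECASE)

def interpret_request(utterance: str, pattern: re.Pattern,
                      vocabulary: dict[str, str]) -> Optional[str]:
    """Return the requested object category, or None if the pattern does not fit."""
    normalized = " ".join(utterance.split())  # collapse runs of whitespace
    match = pattern.match(normalized)
    if match is None:
        return None
    return vocabulary[match.group(1).lower()]
```

An utterance like "May I have the bill?" would then be recognized as a request for the object category the vocabulary associates with 'bill', while utterances that do not fit the construction return None.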
The meanings of words and phrases learned from The Restaurant Game will in many cases not be fully representative of their general meanings, because they have been learned in only one type of scenario. Fortunately, two more games are now collecting data in new scenarios. In Mars Escape (by the Personal Robots group), an astronaut and a robot have to complete a mission before oxygen supplies run out, and in Improviso (a collaboration with GAMBIT), players are asked to act out an improvised story involving an alien who has landed on Earth.
The project is funded by a Rubicon grant from the Netherlands Organisation for Scientific Research (NWO), project nr. 446-09-011.