Date: Thu, 26 Oct 2000 13:29:19 -0400 (EDT)
From: Edison Thomaz
To: lieber@media.mit.edu
Subject: MAS965 - Project Paper Preview

Hi Henry,

As requested, here's my project paper preview.

Thanks,
Thomaz

-----------------------------------------------------------

MAS965 - Fall 2000
Edison Thomaz Jr.

Tempo: A Context-Sensitive Search Engine
Publication Preview 1

1. Abstract

I describe the design, implementation and analysis of Tempo, a search engine that relies primarily on user context and user behaviour to classify and categorize web pages. I propose using context variables such as (1) time and (2) user system state information to select the web pages that are most likely to contain the content that users are looking for.

2. Introduction

One of the biggest problems of the Internet today is that, because of the exponential growth of published information on the web in the past few years, it is extremely difficult to find information on the net. Not only are current search mechanisms hard to use for novice and experienced users alike, but they are also ineffective and frustrating most of the time.
Several kinds of search engines have been developed recently to address this problem and, with very few exceptions, most of them fall short of their goals. I would like to present a novel approach to searching for information on the Internet that relies on user context.

3. Method

Instead of a search engine that analyzes the lexical and syntactic structure of pages and their contents, I propose a search system that recommends web pages to users according to a temporal and contextual analysis of how previous users reacted to the results of similar search queries. In other words, the search mechanism relies primarily on how much time users spent looking for information in web pages in the past to make link recommendations for future users. In addition, other contextual cues, such as how the user navigated from one search result page to another and the state of the user's computer during the search cycle, can help determine which pages the user found interesting and useful. The fundamental idea behind this search mechanism is to use time and user behaviour as the context for web page classification and recommendation.

4. Implementation

The entire search engine consists of 4 pieces: a stand-alone application, an application server, a database and a conventional search engine, such as Google, that makes recommendations according to the associativity of web pages or keywords.

4.1 Stand-Alone Application

The stand-alone application is the visual cortex of the system. It is completely independent of web browsers and contains the edit field where users type their search queries. The app has to be completely browser independent because it needs to track the amount of time that users spend going from one web page to another. The search results might be displayed in the web browser or in a list box within the stand-alone app itself.
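
The time tracking described above can be sketched roughly as follows. This is only an illustration in Python, not the planned implementation: the DwellTracker class, the rank_pages function, the page names and the timestamps are all hypothetical, and ranking by total time spent is just one simple scoring rule assumed for the example.

```python
from collections import defaultdict


class DwellTracker:
    """Hypothetical helper: records how long a user stays on each result."""

    def __init__(self):
        self._current = None             # (url, time the page was opened)
        self.dwell = defaultdict(float)  # url -> total seconds spent

    def open_page(self, url, now):
        """The user clicked a result at time `now` (seconds since the query)."""
        self.close_page(now)             # leaving the previous page, if any
        self._current = (url, now)

    def close_page(self, now):
        """The user left the current page, or ended the search, at `now`."""
        if self._current is not None:
            url, opened_at = self._current
            self.dwell[url] += now - opened_at
            self._current = None


def rank_pages(dwell):
    """Assumed scoring rule: pages with the longest total time come first."""
    return sorted(dwell, key=lambda url: dwell[url], reverse=True)


# Illustrative session: three results opened one after another.
tracker = DwellTracker()
tracker.open_page("page-a", now=10)
tracker.open_page("page-b", now=20)
tracker.open_page("page-c", now=35)
tracker.close_page(now=335)

ranking = rank_pages(tracker.dwell)
print(ranking)  # ['page-c', 'page-b', 'page-a']
```

Timestamps are passed in explicitly rather than read from a clock so that the sketch is easy to test; a real application would take them from the system timer.
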
I plan to write this app as a native Macintosh application.

4.2 Application Server

The application server communicates with the stand-alone app by means of an XML-derived language, possibly SOAP, and is basically the middleman between the database and the app. I am investigating the possibility of using an open-source application server called Enhydra as this component of the system.

4.3 Database

The database is the key component of the system because it will contain the associations between the keywords typed by previous users of the system and the web pages where those users presumably found what they were looking for. There is an open-source RDBMS called InstantDB that looks very promising for such a task. I have used it successfully in the past.

4.4 Conventional Search Engine

The conventional search engine is important because, throughout the lifetime of the system, users will always make new queries that are not associated with any previous query. In those cases, the database will not be able to return any contextual information about where previous users found the information that they were looking for. With a conventional search engine, the system can return conventional search results to the users in these scenarios, make new context-related inferences about the search, and gather the necessary information for future inquiries. I would like to use Google as this component of the system.

5. Discussion

Suppose a user is looking for some piece of information X and we associate a timeline with the entire search process. At time t=0s, the user finishes typing his or her query and presses the Search button. On the very first run, the system database will be empty, so the system returns results provided by the conventional search engine, in this example scenario Google. At time t=10s, let's say, the user clicks on the first result returned by Google.
At this time, the stand-alone application starts counting how many seconds the user spends on the first result returned by Google and stores that in memory. After 10 seconds, at t=20s, the user selects the second link returned by Google and spends 15 seconds on the web page associated with this link. The stand-alone app, still tracking the behaviour of the user, now knows that the user spent 10 seconds on the first web page returned by Google and 15 seconds on the second. If the user selects the third web page returned by Google at t=35s and spends 5 minutes there, then the stand-alone app, realizing that the third page held the user's attention for a much longer period of time, infers that the third web page comes much closer to providing the information that the user is looking for than the first two visited web pages. The stand-alone app then writes that inference into the database by storing the amounts of time that the user spent visiting each page and by giving the third page much higher priority than the first two pages.

It is important to mention that we cannot guarantee that the third page presented by Google is the one that has exactly what the user was originally looking for, but it is clear that it held the user's attention long enough to be considered relevant in one way or another.

The next time a user searches for X, the system will already have some reference pages associated with that keyword and will be able to return a much more precise set of links to the user, as opposed to returning the results of Google. This set of links will be derived by a calculation, determined empirically, based on the amount of time that previous users spent on each of the associated pages in the database. In this example, the third page, the one that held the attention of the first user for the longest period of time, would most likely be shown to the second user at the top of the list.

6. Conclusions and Future Work

I strongly believe that we can greatly improve the efficiency and effectiveness of Internet searches if we take into consideration the overall context of users at the exact moment that they look for information.

As far as improvements to the search engine are concerned, I believe that its accuracy can be significantly enhanced by taking into account more contextual degrees of freedom. By doing that, it can learn more about the user and his or her search needs implicitly, without any external intervention. For instance, it can develop a better understanding of what the user is trying to accomplish by studying how mouse movements relate to search keywords and navigation patterns.

The search system can also study the current state of the user's computer and check which processes are often running when certain searches are performed. An example of this would be returning many multimedia-related links as search results if the user is running an audio or video player in the background. In this case, the fact that the user is watching videos or listening to audio might be evidence that the user is looking for a particular type of content, perhaps multimedia files. If the user is running a program like Photoshop in the background, it is more likely that the user is looking for clip art or stock photography when performing a search. By applying statistical algorithms to the data collected by the stand-alone application, the search engine might be able to make several useful inferences about the user and his or her goals.