
MontyLingua V.2.1 (Python and Java)
A Free, Commonsense-Enriched Natural Language Understander for English

Recent bugfixes

Version 2.1 (6 Aug 2004)
- includes the new MontyNLGenerator component, which generates sentences and summaries

Version 2.0.1
- fixes an API bug in version 2.0 that prevented the Java API from being callable

What is MontyLingua?

MontyLingua is a free*, commonsense-enriched, end-to-end natural language understander for English. Feed raw English text into MontyLingua, and the output will be a semantic interpretation of that text. It is well suited to information retrieval and extraction, request processing, and question answering. From English sentences, it extracts subject/verb/object tuples; adjectives, noun phrases and verb phrases; and people's names, places, events, dates and times, and other semantic information. MontyLingua makes traditionally difficult language processing tasks trivial!

Version 2.0 is substantially FASTER, MORE ACCURATE, and MORE RELIABLE than version 1.3.1. It has now been tested on Windows, many flavors of UNIX, Mac OS X, and several flavors of Java, and is in use by several university research projects and in several commercial settings.

MontyLingua differs from other natural language processing tools because:

  • it is complete end-to-end: raw text goes in, a semantic interpretation comes out
  • it is not a patchwork of dated tools and implementations sewn together; it is one well-integrated implementation
  • it does not require "training" or other fiddling, and works right out of the box
  • it is enriched with "common sense" knowledge about the everyday world, allowing it to avoid many stupid interpretive mistakes, e.g.:
    • "(NX the/DT mosquito/NN bit/NN NX) (NX the/DT boy/NN NX)" ==corrected==>
    • "(NX the/DT mosquito/NN NX) (VX bit/VBD VX) (NX the/DT boy/NN NX)"
  • it is lightweight and portable across platforms: written in portable Python and also available as a compiled Java library
  • it is easy to customize through a user lexicon
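The tagged, chunked notation in the mosquito example above is easy to work with programmatically. The following is a minimal sketch, not part of MontyLingua: `parse_chunks` is a hypothetical helper, and it assumes the output always has the flat form "(LABEL word/TAG ... LABEL)" shown in the example.

```python
import re

# Matches one chunk such as "(NX the/DT mosquito/NN NX)".
# Assumption: chunks are flat (not nested) and open and close with the same label.
CHUNK_RE = re.compile(r"\((\w+) (.*?) \1\)")


def parse_chunks(chunked):
    """Return a list of (chunk_label, [(word, tag), ...]) pairs."""
    result = []
    for label, body in CHUNK_RE.findall(chunked):
        # Split each "word/TAG" token on its last slash.
        pairs = [tuple(token.rsplit("/", 1)) for token in body.split()]
        result.append((label, pairs))
    return result


before = "(NX the/DT mosquito/NN bit/NN NX) (NX the/DT boy/NN NX)"
after = "(NX the/DT mosquito/NN NX) (VX bit/VBD VX) (NX the/DT boy/NN NX)"

# The uncorrected reading has no verb chunk at all; the corrected one does.
print([label for label, _ in parse_chunks(before)])  # ['NX', 'NX']
print([label for label, _ in parse_chunks(after)])   # ['NX', 'VX', 'NX']
```

Comparing the two label sequences makes the correction visible: in the first reading "bit" is swallowed into the noun chunk, in the second it becomes the verb chunk.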

MontyLingua performs the following tasks over text:

  1. MontyTokenizer - Tokenizes raw English text (sensitive to abbreviations), and resolves contractions, e.g. "you're" ==> "you are"
  2. MontyTagger - Part-of-speech tagging based on Brill94, enriched with common sense.
  3. MontyChunker - Lightning fast regular expression chunker
  4. MontyExtractor - Extracts phrases and subject/verb/object triplets from sentences
  5. MontyLemmatiser - Strips inflectional morphology, i.e. changes verbs to infinitive form and nouns to singular form
  6. MontyNLGenerator - Uses MontyLingua's concise predicate-arg representation to generate naturalistic English sentences and text summaries
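To give a feel for what a regular expression chunker does (step 3 above), here is a toy sketch. It is not MontyChunker's actual implementation: the tag classes, the two patterns, and the function names are all assumptions made for illustration. The idea is to encode each part-of-speech tag as one character, so that a regular expression over the encoded string identifies chunks by token position.

```python
import re


def tag_class(tag):
    """Map a Penn Treebank-style tag to a one-character class (toy mapping)."""
    if tag == "DT":
        return "D"
    if tag.startswith("JJ"):
        return "J"
    if tag.startswith("NN"):
        return "N"
    if tag.startswith("VB"):
        return "V"
    return "O"


# Hypothetical patterns: a noun chunk is optional determiner, adjectives, nouns;
# a verb chunk is one or more verbs.
CHUNK_PATTERN = re.compile(r"(?P<NX>D?J*N+)|(?P<VX>V+)")


def chunk(tagged):
    """Chunk a list of (word, tag) pairs into '(NX ... NX)' / '(VX ... VX)' notation."""
    # One character per token, so string offsets equal token indices.
    codes = "".join(tag_class(tag) for _, tag in tagged)
    pieces = []
    pos = 0
    for m in CHUNK_PATTERN.finditer(codes):
        # Tokens that fall outside any chunk are emitted unchanged.
        pieces.extend("%s/%s" % t for t in tagged[pos:m.start()])
        label = m.lastgroup
        inner = " ".join("%s/%s" % t for t in tagged[m.start():m.end()])
        pieces.append("(%s %s %s)" % (label, inner, label))
        pos = m.end()
    pieces.extend("%s/%s" % t for t in tagged[pos:])
    return " ".join(pieces)


tagged = [("the", "DT"), ("mosquito", "NN"), ("bit", "VBD"),
          ("the", "DT"), ("boy", "NN")]
print(chunk(tagged))
# (NX the/DT mosquito/NN NX) (VX bit/VBD VX) (NX the/DT boy/NN NX)
```

Because the chunker only sees tags, its output is only as good as the tagging that precedes it: if "bit" is tagged NN, the same patterns place it inside the first noun chunk.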

* Free for non-commercial use. Please see the MontyLingua Version 2.0 License.


Terms of Use

Author: Hugo Liu <hugo@media.mit.edu>
Project Page: <http://web.media.mit.edu/~hugo/montylingua/>

Copyright (c) 2002-2004 by Hugo Liu, MIT Media Lab
All rights reserved.

Non-commercial use is free, as provided in the MontyLingua version 2.0 License. By downloading and using MontyLingua, you agree to abide by the additional copyright and licensing information in "license.txt", included in this distribution.

If you use this software in your research, please acknowledge MontyLingua and its author, and link back to the project page http://web.media.mit.edu/~hugo/montylingua.

Please cite MontyLingua in academic publications as:

Liu, Hugo (2004). MontyLingua: An end-to-end natural
language processor with common sense. Available
at: web.media.mit.edu/~hugo/montylingua.

Documentation

Documentation and License
Python documentation and API [.html]
Java documentation and API [.html]

MontyLingua license [.txt]
By downloading and using MontyLingua, you agree to these terms.

New in version 2.0 (29 Jul 2004)

  • 2.5X speed enhancement for whole system, 2X speed enhancement for tagger component
  • rule-based chunker replaced with much faster and more accurate regular expression chunker
  • common sense added to MontyTagger component improves word-level tagger accuracy to 97%
  • updated and expanded lexicon for English
  • added a user-customizable lexicon CUSTOMLEXICON.MDF
  • improvements to MontyLemmatiser incorporating exception cases
  • html documentation added
  • speed optimizations to all code
  • improvements made to semantic extraction
  • expanded Java API

Download MontyLingua

Please read the following information to proceed to the download of Version 2.1 for Java and Python.

If you have read and agree to the terms of use, click below to continue to the download
(your IP address will also be recorded):

(Download is a 12 MB zip file)

READ THIS if you are running MontyLingua on Mac OS X or Unix

  • The distribution ZIP includes datafiles built for Windows. If you are running MontyLingua on Unix or Mac OS X and the phrase "I love you" is tagged incorrectly, the datafiles need to be rebuilt. This is simple:
    1. Delete all files of the form FASTLEXICON_n.MDF, where n is a number.
    2. Re-run the MontyLingua program, from either Python or Java, and the correct datafiles will be rebuilt. If you are running Java and run out of memory during the rebuild, use the -MX or -Xmx option in Java to increase the memory size. You only need to rebuild these datafiles once.
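Step 1 above can be scripted. The following is a small sketch rather than a tool shipped with MontyLingua: `remove_fastlexicon_files` is a hypothetical helper, and it assumes the datafiles sit directly in one directory and use exactly the upper-case naming FASTLEXICON_n.MDF described above.

```python
import os
import re

# Matches FASTLEXICON_1.MDF, FASTLEXICON_12.MDF, and so on.
# Assumption: upper-case names, as written in the instructions above.
FASTLEXICON_RE = re.compile(r"^FASTLEXICON_\d+\.MDF$")


def remove_fastlexicon_files(data_dir):
    """Delete every FASTLEXICON_n.MDF in data_dir and return the names removed."""
    removed = []
    for name in sorted(os.listdir(data_dir)):
        if FASTLEXICON_RE.match(name):
            os.remove(os.path.join(data_dir, name))
            removed.append(name)
    return removed
```

Call it with the directory that holds the MontyLingua datafiles, for example `remove_fastlexicon_files(".")` when run from that directory, and then re-run MontyLingua as in step 2. Other .MDF files, such as CUSTOMLEXICON.MDF, do not match the pattern and are left alone.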

Research and Industry Applications which use MontyLingua
These are some of the research and industry projects which use MontyLingua and MontyTagger. To submit your project, email a URL and a short description to the author.

William W. Cohen (2004) Minorthird: Methods for Identifying Names and Ontological Relations in Text using Heuristics for Inducing Regularities from Data, http://minorthird.sourceforge.net (website)

Jacob Eisenstein and Randall Davis. Visual and Linguistic Information in Gesture Classification. Accepted to International Conference on Multimodal Interfaces (ICMI'04) (paper)

L. Xie, L. Kennedy, S.-F. Chang, A. Divakaran, H. Sun, C.-Y. Lin (2004). "Discovering Meaningful Multimedia Patterns with Audio-visual Concepts and Associated Text." IEEE International Conference on Image Processing (ICIP 2004), Singapore, October 2004. (paper)

Ashwani Kumar, Sharad C. Sundararajan, Henry Lieberman (2004). Common Sense Investing: Bridging the Gap Between Expert and Novice. Conference on Human Factors in Computing Systems (CHI 04), Vienna, Austria. (paper) (website)

Hugo Liu and Push Singh (2004) ConceptNet: A Practical Commonsense Reasoning Toolkit. BT Technology Journal, upcoming. Kluwer Academic Publishers. (website)

Google for MontyLingua and MontyTagger to see who else has been using this software.



 

                                                                           
