MAS.S60: Practical Natural Language Processing

Catherine Havasi
Tuesdays and Thursdays from 1:00 to 2:30 PM in E14-525

Course description

Language is central to the way people interact with one another, and people often wish they could interact with computers in the same way. As computers increasingly pervade human lives and decision-making processes, understanding language written by humans becomes more critical, even in applications that would not typically rely on other forms of AI. Additionally, public triumphs of NLP such as Siri and Watson are beginning to change the public's perception of the interaction between computers and language.

This class will teach you the techniques that are the building blocks of NLP, and how to assemble them to accomplish goals such as improving human-computer interaction, augmenting human creativity, and enabling interactive storytelling.

This is a lab-based course involving a final project and paper. Bring a laptop to class.

Co-taught with Rob Speer.

Textbook

The textbook for this course is Natural Language Processing with Python, by Steven Bird, Ewan Klein, and Edward Loper, 2009. A version of the book is available online.

Syllabus

| # | Date | Topics | In-class lab | Assignment |
|---|------|--------|--------------|------------|
| 1 | 2/7/2012 | Intro to NLP | Install NLTK | - |
| 2 | 2/9/2012 | Corpus linguistics, N-grams, frequency distributions, Markov chains | Build language models and generate text from them | Build a corpus, calculate information, generate text |
| 3 | 2/14/2012 | Morphology, Automata, and Regular Expressions | y-i spelling changes | - |
| 4 | 2/16/2012 | Tokenization, lemmatization, and text wrangling (Unicode in Python Demystified slides) | Work with Twitter data | Find a trending topic in a language you don't know |
| | 2/21/2012 | No Class - It's a Monday | - | - |
| 5 | 2/23/2012 | Naïve Bayes classification | Write a naïve Bayes function (starter code, finished code, ipython log) | - |
| 6 | 2/28/2012 | Tagging / backoff strategies | Tagging bake-off with NLTK | - |
| 7 | 3/1/2012 | Chunking (NLTK slides) | Chunking bake-off with NLTK | Named entity recognition in Dutch |
| 8 | 3/6/2012 | Chart parsing / PCFGs (NLTK slides) | | - |
| 9 | 3/8/2012 | HMMs and machine translation (guest lecture) / More about PCFGs | - | - |
| 10 | 3/13/2012 | Lexical Resources: WordNet | Find topic similarities in WordNet | - |
| 11 | 3/15/2012 | LDA (guest lecture) | - | - |
| 12 | 3/20/2012 | Dependency parsing / ConceptNet | Find topic similarities in ConceptNet | - |
| 13 | 3/22/2012 | Word sense disambiguation | In-class lab on WSD | - |
| | 3/27/2012 | No Class - Spring Break | - | - |
| | 3/29/2012 | No Class - Spring Break | - | - |
| 14 | 4/3/2012 | Data mining and Wikipedia | Acquire and work with Wikimedia data | Knowledge Extraction from Wikipedia |
| 15 | 4/5/2012 | NumPy and Recommender Systems | Simple MovieLens-based recommender | - |
| 16 | 4/10/2012 | Classification: Decision trees / information gain | Building a decision tree lemmatizer | - |
| 17 | 4/12/2012 | Unification and grammatical features | Build a feature grammar | - |
| | 4/17/2012 | No Class - Holiday | - | - |
| 18 | 4/19/2012 | Sentiment analysis / classification with MaxEnt | Revisit movie review classification | - |
| | 4/24/2012 | Guest Lecture or No Class - Sponsor Week | - | - |
| | 4/26/2012 | Guest Lecture or No Class - Sponsor Week | - | - |
| 19 | 5/1/2012 | Semantics I (additional slides) | | - |
| 20 | 5/3/2012 | Project previews / more semantics, briefly | - | - |
| 21 | 5/8/2012 | Semantics II | | - |
| 22 | 5/10/2012 | Time and Space | - | - |
| 23 | 5/15/2012 | ConceptNet 5; some project presentations | - | - |
| 24 | 5/17/2012 | Project presentations | - | - |
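
To give a flavor of the lab work, here is a minimal sketch of what the 2/9/2012 lab (building a language model and generating text from it) might look like as a bigram Markov chain. This is not the course's actual lab code: the function names `build_bigram_model` and `generate`, the toy corpus, and the choice to use only the Python standard library rather than NLTK are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def build_bigram_model(tokens):
    """Map each token to the list of tokens observed right after it.

    Repeated followers are kept, so sampling uniformly from the list
    reproduces the observed bigram frequencies.
    """
    model = defaultdict(list)
    for current, following in zip(tokens, tokens[1:]):
        model[current].append(following)
    return model


def generate(model, start, length, seed=None):
    """Random walk through the model, starting at `start`.

    Stops early if it reaches a token that was never followed by anything.
    """
    rng = random.Random(seed)
    word = start
    output = [word]
    for _ in range(length - 1):
        followers = model.get(word)
        if not followers:  # dead end, e.g. the last token of the corpus
            break
        word = rng.choice(followers)
        output.append(word)
    return output


# Hypothetical toy corpus; a real lab would use a much larger text.
toy_corpus = "the cat sat on the mat and the dog sat on the log".split()
model = build_bigram_model(toy_corpus)
print(" ".join(generate(model, "the", 8, seed=0)))
```

Storing every observed follower, duplicates included, keeps the sketch short; a larger corpus would more likely call for explicit counts, for example with a frequency distribution as covered in the same lecture.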