Machine Learning

Deb K. Roy

Assistant Professor of Media Arts and Sciences

MIT, Media Lab

*Description*

Machine learning is the ability of a machine to improve its performance based on previous results. Established algorithms and techniques (e.g., perceptrons, boosting, Kalman filtering, support vector machines, hidden Markov models, and Bayesian networks) lead to applications such as pattern recognition, information retrieval, classification, and behavior modeling. How and why do these methods work? Which algorithm should be used for a given problem, and, for example, when are statistical inference techniques appropriate? How should the data that the algorithm learns from be structured?
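As a minimal illustration of "improving performance based on previous results," the perceptron named above can be sketched in a few lines: it adjusts its weights only when it makes a mistake, so each pass over the data reuses the errors of the previous pass. The training data (the logical AND function), learning rate, and epoch count below are illustrative choices, not drawn from the reading list.

```python
# Minimal perceptron sketch: learn the logical AND function online.
# Update rule: w <- w + lr * (target - prediction) * x  (change only on errors).

def train_perceptron(data, epochs=10, lr=0.1):
    w = [0.0, 0.0]   # one weight per input
    b = 0.0          # bias term
    for _ in range(epochs):
        for (x1, x2), target in data:
            prediction = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            error = target - prediction   # zero when the guess was right
            w[0] += lr * error * x1
            w[1] += lr * error * x2
            b += lr * error
    return w, b

def predict(w, b, x1, x2):
    return 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0

# AND truth table as (inputs, target) pairs
and_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(and_data)
print([predict(w, b, x1, x2) for (x1, x2), _ in and_data])  # -> [0, 0, 0, 1]
```

Because AND is linearly separable, the perceptron convergence theorem guarantees this loop stops making mistakes after finitely many updates; for non-separable data one would turn to the statistical methods surveyed in the readings below.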

Limitation: this is my contextual area, so the emphasis should not be on technical details but on building intuition for which method to use, where, how, and why.

*Written Requirement*

The written requirement for this area will consist of a 24-hour take-home exam
to be evaluated by Professor Deb K. Roy.

*Reading List*

Books:

M. Jordan and C. Bishop, Introduction to Graphical Models, MIT (Internal Manuscript).

M. Jordan (ed.), Learning in Graphical Models, MIT Press, 1998.

T. Mitchell, Machine Learning, McGraw Hill, 1997.

T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction (Springer Series in Statistics), Springer-Verlag, October 2001.

C. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, January 1996.

R. Duda, P. Hart, D. Stork, Pattern Classification, John Wiley & Sons, 2000.

F.V. Jensen, An Introduction to Bayesian Networks, Springer-Verlag, London, 1996.

N. Cristianini, J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-based Learning Methods, Cambridge University Press, August 2000.

T. Cover, J. Thomas, Elements of Information Theory, Wiley-Interscience, August 1991.

Theses:

T. Jebara, Discriminative, Generative and Imitative Learning, PhD Dissertation, MIT, Media Laboratory, December 2001.

S. Basu, Conversation Scene Analysis, PhD Dissertation, MIT, EE/CS Department, September 2002.

B. Schoner, Probabilistic Characterization and Synthesis of Complex Driven Systems, PhD Dissertation, MIT, Media Laboratory, September 2000.

S.D. Whitehead, Reinforcement Learning for the Adaptive Control of Perception and Action, PhD Dissertation, University of Rochester, Computer Science Dept., 1992.

Relevant publications:

A.K. Jain, M.N. Murty, P.J. Flynn, Data Clustering: A Review, ACM Computing Surveys, 31(3): 264-323, September 1999.

A.K. Jain, R.P.W. Duin, J.C. Mao, Statistical Pattern Recognition: A Review, IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(1): 4-37, January 2000.

L.R. Rabiner, A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition, Proceedings of the IEEE, Vol. 77, No. 2, February 1989.

C.J.C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 2(2): 121-167, June 1998.

A.K. Jain, J.C. Mao, K.M. Mohiuddin, Artificial Neural Networks: A Tutorial, IEEE Computer Special Issue on Neural Computing, March 1996.

R.S. Sutton, Learning to Predict by the Method of Temporal Differences, Machine Learning, 3: 9-44, 1988.

D. Heckerman, A Tutorial on Learning With Bayesian Networks, Technical Report MSR-TR-95-06, Microsoft Research, Redmond, Washington, 1995. Revised June 1996.