MAS 632 Conversational Computer Systems

Fall 2008

Chris Schmandt
E15-368a
617-253-5156
geek@media.mit.edu

Course secretary: Kristin Hall, kristin@media.mit.edu
See Kristin for all missed handouts (although we will try to avoid using paper).

course mailing list

The course mailing list is mas632@media.mit.edu.
It is assumed that all students are on this list.

course text

Schmandt, Conversational Computer Systems
This book may be found here. All my publications are found on the Speech + Mobility Group's web pages.

The syllabus of this class will be approximately the table of contents of the book. We will add more recent case studies from the literature and have more to say about topics in spatialized audio, speech skimming, non-speech audio, and computer-mediated voice communication.
Rather than a series of formal lectures, this course will be based on your reading for the week and on the questions and discussion it prompts in class. An exact week-by-week syllabus is therefore difficult to predict in advance, and this document will evolve during the term to reflect our actual rate of progress. During the last third of the semester, we will optionally cover additional topics in non-speech audio and user interfaces for mobile devices.

grades and policies

There will be 8 to 10 problem sets, which will be graded on a scale of 1 (awful) to 5 (great). There will be no exams or projects. Grades will be based on problem set performance, with a half-grade factor for class participation, including doing the readings.

Problem sets are due in class on the due date, and will be accepted only in hard copy (unless prior arrangements have been made). Problem sets will be accepted up until the day they are discussed in class, but submissions after the due date will incur a 1/2 point penalty (waived if the instructor has more than two ungraded problem sets not yet returned). You can use whatever word processor or text formatter you want, but except in special cases handwritten problem sets are not acceptable. Figures (flow charts, box drawings, etc.) may be hand drawn and labelled.

4 Sept 2008 - introduction and genres of conversation

handouts:
  1. Problem set 1, due 11 Sept.

reading assignment for 11 Sept.:
  • Chapanis, A. Interactive Human Communication. Scientific American, 232, (1975) pp. 36-42

  • Chalfonte B.L., Fish R.S., Kraut R.E.; Expressive Richness: A Comparison Of Speech and Text As Media For Revision

  • Isaacs E.A., Tang J.C.; What Video Can and Cannot Do For Collaboration: A Case Study

  • Schmandt C.; Voice Communication With Computers - Chapter 1

11 Sept 2008 - components of conversation and speech production

look at these: wide band and narrow band spectrograms

handouts:
  1. Problem set 2, due 18 Sept.

reading assignment for 16 Sept:
  • Arons, B. Techniques, Perception and Applications of Time-Compressed Speech, Proceedings, 1992 AVIOS Conference

  • Schmandt C; Voice Communication With Computers - Chapter 2

16 Sept 2008 - Speech production and hearing

play with this: Sagittal Section of the Head
look at this: Quileute alphabet

handouts:
  1. Problem Set 3, due 23 Sept

reading assignment for 23 Sept:
  • Schmandt C; Voice Communication With Computers - Chapter 3 (this is longer than the previous chapters - read with care)

23 Sept 2008 - Speech coding

handouts:
  1. Problem set 4, due 30 Sept

reading assignment for 30 Sept:
  • Arons, B. A review of the cocktail party effect, Journal of the American Voice I/O Society, July 1992

  • Mullins, A. and Schmandt, C. AudioStreamer: exploiting simultaneity for listening, CHI 1995

  • Kobayashi, M. and Schmandt, C. Dynamic Soundscape: mapping time to space for audio browsing. CHI 1997

  • Schmandt C; Voice Communication With Computers - Chapter 4, skim

30 Sept 2008 - Accessing recorded speech

reading assignment for 7 Oct.:
  • Spiegel, M. Difficulties with Names. Speech Technology Magazine June 2003

  • Pisoni D.B., Nusbaum H.C. & Greene B.G. (1985). Perception of synthetic speech generated by rule. Proceedings of the IEEE 73:1665-1676

  • J. Lai, D. Wood, and M. Considine, The effect of task conditions on the comprehensibility of synthetic speech. CHI 2000

  • L. Gong and J. Lai, Shall we mix synthetic speech and human speech?: impact on users' performance, perception, and attitude. CHI 2001

  • J. Lai, K. Cheng, P. Green and O. Tsimhoni, On the road and on the Web?: comprehension of synthetic and human speech while driving. CHI 2001

  • Schmandt C; Voice Communication With Computers - Chapters 5 and 6

7 Oct 2008 - Speech synthesis

Listen to the various speech synthesis samples.

reading assignment for 14 Oct.:
  • Schmandt C; Voice Communication With Computers - Chapter 7.

14 Oct 2008 - review of problem sets and speech recognition intro

reading assignment for 28 Oct.:
  • Rudnicky, A. and Hauptmann, A. Models for evaluating interaction protocols in speech recognition. Proceedings of CHI 1991, pp. 285-291

  • Suhm, B., et al. A comparative study of speech in the call center: natural language call routing vs. touch-tone menus. Proceedings of CHI 2002, pp. 283-290

  • What can I say? Evaluating a spoken language interface to email. Walker, M., et al. Proceedings of CHI 1998, pp. 582-589

  • Designing SpeechActs: issues in speech user interfaces. Yankelovich, N., et al. Proceedings of CHI 1995, pp. 369-376

  • MailCall: message presentation and navigation in a nonvisual environment. Marx, M. and Schmandt, C. Proceedings of CHI 1996, pp. 165-172

  • Schmandt C; Voice Communication With Computers - Chapter 8.

21 Oct 2008 - No class

28 Oct 2008 - Hidden Markov Models

reading assignment for 5 Nov:
  • Vemuri, S., et al. Improving speech playback using time-compression and speech recognition. Proceedings of CHI 2004, pp. 295-302

  • A. Ranjan et al. Searching in audio: the utility of transcripts, dichotic presentation, and time compression, CHI 2006

  • Whittaker, S., et al. SCANMail: a voicemail interface that makes speech browsable, readable, and searchable. Proceedings of CHI 2002, pp. 275-282

  • Vemuri, S., and Bender, W. Next-generation personal memory aids. BT Technology Journal, October 2004, pp. 125-138

  • Tucker, S. and Whittaker, S. Time is of the essence: an evaluation of temporal compression algorithms, CHI 2006

5 Nov 2008 - Applications of Recognition: retrieval in voice docs

reading assignment for 18 Nov:
  • Schmandt C; Voice Communication With Computers - Chapter 9.
  • Duncan, S. Some signals and rules for taking speaking turns in conversations. Journal of Personality and Social Psychology Vol 23, No. 2, pp 283-292, 1972

  • Erving Goffman, Forms of Talk, Chapter 1.

  • Grosz, B and Sidner, C. Attention, intentions, and the structure of discourse. Computational Linguistics, Vol. 12, No. 3, July-Sept. 1986 pp. 176-204

  • Clark, H. and Brennan, S. Grounding in Communication. chapter in Perspectives on Socially Shared Cognition, Resnick, L., Levine, J. and Teasley, S. eds.

18 Nov 2008 - Discourse

25 Nov 2008 - Identity, Community, and Participation

Papers to read for this class:

Week of Dec 1 2008 - Presence & Being There

Papers to read for this class:

Week of Dec 7 2008 - Managing Interruption

Papers to read for this class:

Note: ignore what follows this line

20 Nov 2008 - Non-speech audio and auditory "display"

27 Nov 2008 - No class, Thanksgiving

4 Dec 2008 - Probably no class (may be rescheduled)

  • Arons, B. SpeechSkimmer: A System for Interactively Skimming Recorded Speech, ACM Transactions on Computer-Human Interaction, March 1997

  • Roy, D. and Schmandt, C. NewsComm: a hand-held interface for interactive access to structured audio, Proceedings of CHI 1996

11 Dec 2008 - Physical interaction in handheld devices

Yes, I know the last official day of class is 10 Dec.

Extra readings: