MAS 632 Conversational Computer Systems

Fall 2008

Chris Schmandt
E15-368a
617-253-5156
geek@media.mit.edu

Course secretary: Kristin Hall, kristin@media.mit.edu
See Kristin for all missed handouts (although we will try to avoid using paper).

course mailing list

The course mailing list is mas632@media.mit.edu.
It is assumed that all students are on this list.

course text

Schmandt, Voice Communication With Computers: Conversational Systems
This book may be found here. All my publications are found on the Speech + Mobility Group's web pages.

The syllabus of this class will be approximately the table of contents of the book. We will add more recent case studies from the literature and have more to say about topics in spatialized audio, speech skimming, non-speech audio, and computer mediated voice communication.
Rather than a series of formal lectures, this course will be built around your reading for the week and the questions and discussion it prompts in class. Thus it is difficult to predict an exact week-by-week syllabus in advance. This document will evolve during the term to reflect our actual rate of progress. During the last third of the semester, we will optionally cover additional topics in non-speech audio and user interfaces for mobile devices.

grades and policies

There will be 8 to 10 problem sets, which will be graded on a scale of 1 (awful) to 5 (great). There will be no exams or projects. Grades will be based on problem set performance, with a half-grade factor for class participation, including doing the readings.

Problem sets are due in class on the due date, and will be accepted only in hard copy (unless prior arrangements have been made). Problem sets will be accepted up until the day they are discussed in class, but submissions after the due date will incur a 1/2 point penalty (waived if the instructor has more than two ungraded problem sets not yet returned). You can use whatever word processor or text formatter you want, but except in special cases handwritten problem sets are not acceptable. Figures (flow charts, box drawings, etc.) may be hand drawn and labelled.

4 Sept 2008 - introduction and genres of conversation

handouts:

Problem set 1, due 11 Sept.

reading assignment for 11 Sept.:

Chapanis, A. Interactive Human Communication. Scientific American, 232, (1975) pp. 36-42

Chalfonte B.L., Fish R.S., Kraut R.E.; Expressive Richness: A Comparison Of Speech and Text As Media For Revision

Isaacs E.A., Tang J.C.; What Video Can and Cannot Do For Collaboration: A Case Study

Schmandt C.; Voice Communication With Computers - Chapter 1

11 Sept 2008 - components of conversation and speech production

look at these: wide band and narrow band spectrograms
handouts:

Problem set 2, due 18 Sept.

reading assignment for 16 Sept:

Arons, B. Techniques, Perception and Applications of Time-Compressed Speech, Proceedings, 1992 AVIOS Conference

Schmandt C; Voice Communication With Computers - Chapter 2

16 Sept 2008 - Speech production and hearing

play with this: Sagittal Section of the Head
look at this: Quileute alphabet
handouts:

Problem Set 3, due 23 Sept

reading assignment for 23 Sept:

Schmandt C; Voice Communication With Computers - Chapter 3 (this is longer than the previous chapters - read with care)

23 Sept 2008 - Speech coding

Problem set 4, due 30 Sept

reading assignment for 30 Sept:

Arons, B. A review of the cocktail party effect, Journal of the American Voice I/O Society, July 1992

Mullins, M. and Schmandt, C. AudioStreamer: exploring simultaneity for listening, CHI 1995

Kobayashi, M. and Schmandt, C. Dynamic Soundscape: mapping time to space for audio browsing. CHI 1997

Schmandt C; Voice Communication With Computers - Chapter 4, skim

30 Sept 2008 - Accessing recorded speech

handouts:

reading assignment for 7 Oct.:

Spiegel, M. Difficulties with Names. Speech Technology Magazine June 2003

Pisoni D.B., Nusbaum H.C. & Greene B.G. (1985). Perception of synthetic speech generated by rule. Proceedings of the IEEE 73:1665-1676

J. Lai, D. Wood, and M. Considine, The effect of task conditions on the comprehensibility of synthetic speech. CHI 2000

L. Gong and J. Lai, Shall we mix synthetic speech and human speech?: impact on users' performance, perception, and attitude. CHI 2001

J. Lai, K. Cheng, P. Green and O. Tsimhoni, On the road and on the Web?: comprehension of synthetic and human speech while driving. CHI 2001

Schmandt C; Voice Communication With Computers - Chapters 5 and 6

7 Oct 2008 - Speech synthesis

Listen to the various speech synthesis samples.
reading assignment for 14 Oct.:

Schmandt C; Voice Communication With Computers - Chapter 7.

14 Oct 2008 - review of problem sets and speech recognition intro

reading assignment for 28 Oct.:

Rudnicky, A. and Hauptmann, A. Models for evaluating interaction protocols in speech recognition. Proceedings of CHI 1991, pp. 285-291

Suhm, B., et al. A comparative study of speech in the call center: natural language call routing vs. touch-tone menus. Proceedings of CHI 2002, pp. 283-290

What can I say? Evaluating a spoken language interface to email. Walker, M., et al. Proceedings of CHI 1998, pp. 582-589

Designing SpeechActs: issues in speech user interfaces. Yankelovich, N., et al. Proceedings of CHI 1995, pp. 369-376

MailCall: message presentation and navigation in a nonvisual environment. Marx, M. and Schmandt, C. Proceedings of CHI 1996, pp. 165 - 172

Schmandt C; Voice Communication With Computers - Chapter 8.

21 Oct 2008 - No class

28 Oct 2008 - Hidden Markov Models

reading assignment for 5 Nov:

Vemuri, S., et al. Improving speech playback using time-compression and speech recognition. Proceedings of CHI 2004, pp. 295-302

A. Ranjan et al. Searching in audio: the utility of transcripts, dichotic presentation, and time compression, CHI 2006

Whittaker, S., et al. SCANMail: a voicemail interface that makes speech browsable, readable, and searchable. Proceedings of CHI 2003, pp. 275-282

Vemuri, S., and Bender, W. Next-generation personal memory aids. BT Technology Journal, October 2004, pp. 125-138

Tucker, S. and Whittaker, S. Time is of the essence: an evaluation of temporal compression algorithms, CHI 2006

5 Nov 2008 - Applications of Recognition: retrieval in voice docs

reading assignment for 18 Nov:

Schmandt C; Voice Communication With Computers - Chapter 9.

Duncan, S. Some signals and rules for taking speaking turns in conversations. Journal of Personality and Social Psychology Vol 23, No. 2, pp 283-292, 1972

Erving Goffman, Forms of Talk, Chapter 1.

Grosz, B and Sidner, C. Attention, intentions, and the structure of discourse. Computational Linguistics, Vol. 12, No. 3, July-Sept. 1986 pp. 176-204

Clark, H. and Brennan, S. Grounding in Communication. chapter in Perspectives on Socially Shared Cognition, Resnick, L., Levine, J. and Teasley, S. eds.

18 Nov 2008 - Discourse

25 Nov 2008 - Identity, Community, and Participation

Papers to read for this class:

Nowak, K. and Rauh, C. The Influence of the Avatar on Online Perceptions of Anthropomorphism, Androgyny, Credibility, Homophily, and Attraction. Journal of Computer Mediated Communication.
Lampel, J. and Bhalla, A. The Role of Status Seeking in Online Communities: Giving the Gift of Experience. Journal of Computer Mediated Communication.
Ren, Y., Kraut, R. and Kiesler, S. Applying Common Identity and Bond Theory to Design of Online Communities. Organization Studies.
Davis, F. Fashion, Culture, and Identity. (Chapter 1).
Yahoo! Design Pattern Library (skim through the social section of this site. It's not earth-shattering, but it's a nice aggregation of common community techniques for representing identity and incenting participation that I think complement the papers in this section nicely from a very practical perspective.)

Week of Dec 1 2008 - Presence & Being There

Papers to read for this class:

Zhao, S. Towards a Taxonomy of Copresence. Presence, October 2003.
Casanueva, J. and Blake, E. The Effects of Group Collaboration on Presence in a Collaborative Virtual Environment. Eurographics Workshop on Virtual Environments, 2000.
Hollan, J. and Stornetta, S. Beyond Being There. Conference on Human Factors in Computing Systems 1992.
Smith, I and Hudson, S. Low disturbance audio for awareness and privacy in media spaces, ACM Multimedia, 1995. Note: the link to audio samples given in the paper is still good.
Neustaedter, C., Greenberg, S., and Boyle, M. Blur filtration fails to preserve privacy for home-based video conferencing. ACM Transactions on Computer-Human Interaction, March 2006

Week of Dec 7 2008 - Managing Interruption

Papers to read for this class:

Fogarty, J. et al. Predicting human interruptibility with sensors, ACM Transactions on Computer-Human Interaction, March 2005, pp. 119-146
Avrahami, D. and Hudson, S. Responsiveness in instant messaging: predictive models supporting interpersonal communication, CHI 2006
Horvitz, E., Koch, P., and Apacible, J. BusyBody: creating and fielding personal models of the cost of interruption CSCW 2004
Hsieh, et al. Can markets help? applying market mechanisms to improve synchronous communication, CSCW 2008

Note: ignore what follows this line

20 Nov 2008 - Non-speech audio and auditory "display"

27 Nov 2008 - No class, Thanksgiving

4 Dec 2008 - Probably no class (may be rescheduled)

Arons, B. SpeechSkimmer: A System for Interactively Skimming Recorded Speech, ACM Transactions on Computer-Human Interaction, March 1997

Roy, D. and Schmandt, C. NewsComm: a hand-held interface for interactive access to structured audio. Proceedings, CHI 1996

11 Dec 2008 - Physical interaction in handheld devices

Yes, I know the last official day of class is 10 Dec.
Extra readings: