MAS 632 Conversational Computer Systems
Fall 2008
Chris Schmandt
E15-368a
617-253-5156
geek@media.mit.edu
Course secretary: Kristin Hall, kristin@media.mit.edu
See Kristin for all missed handouts (although we will try to avoid
using paper).
course mailing list
The course mailing list is mas632@media.mit.edu.
It is assumed that all students are on this list.
course text
Schmandt, Conversational Computer Systems
This book may be found
here. All my publications are found on the Speech + Mobility Group's web pages.
The syllabus of this class will be approximately the table of contents of
the book. We will add more recent case studies from the literature and
have more to say about topics in spatialized audio, speech skimming,
non-speech audio, and computer mediated voice communication.
Rather than a series of formal lectures, this course will be based on your
reading for the week and on questions and discussion in class. Thus it is
difficult to predict an exact week-by-week syllabus in advance. This document
will evolve during the term to reflect our actual rate of progress. During
the last third of the semester, we will optionally cover additional topics in
non-speech audio and user interfaces for mobile devices.
grades and policies
There will be 8 to 10 problem sets, which will be graded on a scale of
1 (awful) to 5 (great). There will be no exams or projects. Grades will
be based on problem set performance, with a half-grade factor for class
participation, including doing the readings.
Problem sets are due in class on the due date, and will be accepted only in
hard copy (unless prior arrangements have been made). Problem sets will
be accepted up until the day they are discussed in class, but submissions
after the due date will incur a 1/2 point penalty (waived if the
instructor has more than two ungraded problem sets not yet returned). You
can use whatever word processor or text formatter you want, but except in
special cases handwritten problem sets are not acceptable. Figures (flow
charts, box drawings, etc.) may be hand drawn and labelled.
4 Sept 2008 - introduction and genres of conversation
handouts:
- Problem set 1, due 11 Sept.
reading assignment for 11 Sept.:
Chapanis, A. Interactive Human Communication. Scientific American, 232, (1975) pp. 36-42
Chalfonte B.L., Fish R.S., Kraut R.E.; Expressive Richness: A Comparison Of Speech and Text As Media For Revision
Isaacs E.A., Tang J.C.; What Video Can and Cannot Do For Collaboration: A Case Study
Schmandt C.; Voice Communication With Computers - Chapter 1
11 Sept 2008 - components of conversation and speech production
look at these:
wide band and narrow band spectrograms
handouts:
- Problem set 2, due 18 Sept.
reading assignment for 16 Sept:
Arons, B. Techniques, Perception and Applications of Time-Compressed Speech, Proceedings, 1992 AVIOS Conference
Schmandt C; Voice Communication With Computers - Chapter 2
16 Sept 2008 - Speech production and hearing
play with this:
Sagittal Section of the Head
look at this:
Quileute alphabet
handouts:
- Problem Set 3, due 23 Sept
reading assignment for 23 Sept:
Schmandt C; Voice Communication With Computers - Chapter 3 (this is longer than the previous chapters - read with care)
23 Sept 2008 - Speech coding
- Problem set 4, due 30 Sept
reading assignment for 30 Sept:
Arons, B. A review of the cocktail party effect, Journal of the American Voice I/O Society, July 1992
Mullins, M. and Schmandt, C. AudioStreamer: exploring simultaneity for listening, CHI 1995
Kobayashi, M, and Schmandt C. Dynamic Soundscape: mapping time to space for audio browsing. CHI 1997
Schmandt C; Voice Communication With Computers - Chapter 4, skim
30 Sept 2008 - Accessing recorded speech
handouts:
reading assignment for 7 Oct.:
Spiegel, M. Difficulties with Names. Speech Technology Magazine June 2003
Pisoni D.B., Nusbaum H.C. & Greene B.G. (1985). Perception of synthetic speech generated by rule. Proceedings of the IEEE 73:1665-1676
J. Lai, D. Wood, and M. Considine, The effect of task conditions on the comprehensibility of synthetic speech. CHI 2000
L. Gong and J. Lai, Shall we mix synthetic speech and human speech?: impact on users' performance, perception, and attitude. CHI 2001
J. Lai, K. Cheng, P. Green and O. Tsimhoni, On the road and on the Web?: comprehension of synthetic and human speech while driving. CHI 2001
Schmandt C; Voice Communication With Computers - Chapters 5 and 6
7 Oct 2008 - Speech synthesis
Listen to the various speech synthesis samples.
reading assignment for 14 Oct.:
Schmandt C; Voice Communication With Computers - Chapter 7.
14 Oct 2008 - review of problem sets and speech recognition intro
reading assignment for 28 Oct.:
Rudnicky, A. and Hauptmann, A. Models for evaluating interaction protocols in speech recognition. Proceedings of CHI 1991, pp. 285-291
Suhm, B., et al. A comparative study of speech in the call center: natural language call routing vs. touch-tone menus. Proceedings of CHI 2002, pp. 283-290
Walker, M., et al. What can I say? Evaluating a spoken language interface to email. Proceedings of CHI 1998, pp. 582-589
Yankelovich, N., et al. Designing SpeechActs: issues in speech user interfaces. Proceedings of CHI 1995, pp. 369-376
Marx, M. and Schmandt, C. MailCall: message presentation and navigation in a nonvisual environment. Proceedings of CHI 1996, pp. 165-172
Schmandt C; Voice Communication With Computers - Chapter 8.
21 Oct 2008 - No class
28 Oct 2008 Hidden Markov Models
reading assignment for 5 Nov:
Vemuri, S., et al. Improving speech playback using time-compression and speech recognition. Proceedings of CHI 2004, pp. 295-302
A. Ranjan et al. Searching in audio: the utility of transcripts, dichotic presentation, and time compression, CHI 2006
Whittaker, S., et al. SCANMail: a voicemail interface that makes speech browsable, readable, and searchable. Proceedings of CHI 2003, pp. 275-282
Vemuri, S., and Bender, W. Next-generation personal memory aids. BT Technology Journal, October 2004, pp. 125-138
Tucker, S. and Whittaker, S. Time is of the essence: an evaluation of temporal compression algorithms, CHI 2006
5 Nov 2008 Applications of Recognition: retrieval in voice docs
reading assignment for 18 Nov:
Schmandt C; Voice Communication With Computers - Chapter 9.
Duncan, S. Some signals and rules for taking speaking turns in conversations. Journal of Personality and Social Psychology Vol 23, No. 2, pp 283-292, 1972
Erving Goffman, Forms of Talk, Chapter 1.
Grosz, B and Sidner, C. Attention, intentions, and the structure of discourse. Computational Linguistics, Vol. 12, No. 3, July-Sept. 1986 pp. 176-204
Clark, H. and Brennan, S. Grounding in Communication. chapter in Perspectives on Socially Shared Cognition, Resnick, L., Levine, J. and Teasley, S. eds.
18 Nov 2008 Discourse
25 Nov 2008 Identity, Community, and Participation
Papers to read for this class:
- Nowak, K. and Rauh, C. The Influence of the Avatar on Online Perceptions of Anthropomorphism, Androgyny, Credibility, Homophily, and Attraction. Journal of Computer Mediated Communication.
- Lampel, J. and Bhalla, A. The Role of Status Seeking in Online Communities: Giving the Gift of Experience. Journal of Computer Mediated Communication.
- Ren, Y., Kraut, R., and Kiesler, S. Applying Common Identity and Bond Theory to Design of Online Communities. Organization Studies.
- Davis, F. Fashion, Culture, and Identity. (Chapter 1).
- Yahoo! Design Pattern Library (skim through the social section of this site. It's not earth-shattering, but it's a nice aggregation of common community techniques for representing identity and incenting participation that I think complement the papers in this section nicely from a very practical perspective.)
Week of Dec 1 2008 Presence & Being There
Papers to read for this class:
- Zhao, S. Towards a Taxonomy of Copresence. Presence, October 2003.
- Casanueva, J. and Blake, E. The Effects of Group Collaboration on Presence in a Collaborative Virtual Environment. Eurographics Workshop on Virtual Environments, 2000.
- Hollan, J. and Stornetta, S. Beyond Being There. Conference on Human Factors in Computing Systems 1992.
- Smith, I and Hudson, S. Low disturbance audio for awareness and privacy in media spaces, ACM Multimedia, 1995. Note: the link to audio samples given in the paper is still good.
- Neustaedter, C, Greenberg, S., and Boyle, M. Blur filtration fails to preserve privacy for home-based video conferencing. ACM Transactions on Computer Human Interaction, March 2006
Week of Dec 7 2008 Managing Interruption
Papers to read for this class:
- Fogarty, J., et al. Predicting human interruptibility with sensors, ACM Transactions on Computer-Human Interaction, March 2005, pp. 119-146
- Avrahami, D. and Hudson, S. Responsiveness in instant messaging: predictive models supporting interpersonal communication, CHI 2006
- Horvitz, E., Koch, P., and Apacible, J. BusyBody: creating and fielding personal models of the cost of interruption CSCW 2004
- Hsieh, et al. Can markets help? applying market mechanisms to improve synchronous communication, CSCW 2008
Note: ignore what follows this line
20 Nov 2008 Non-speech audio and auditory "display"
27 Nov 2008 No class, Thanksgiving
4 Dec 2008 Probably no class (may be rescheduled)
Arons, B. SpeechSkimmer: A System for Interactively Skimming Recorded Speech, ACM Transactions on Computer Human Interaction, March 1997
Roy, D. and Schmandt, C. NewsComm: a hand-held interface for interactive access to structured audio. Proceedings of CHI 1996
11 Dec 2008 Physical interaction in handheld devices
Yes, I know the last official day of class is 10 Dec.
Extra readings: