Simultaneous and Spatial Listening

People using wearable devices must primarily attend to events in their environment, yet need to be notified of background processes or messages. Speech and music in the background and peripheral auditory cues [Gaver89] can provide an awareness of messages or signify events without requiring the listener's full attention or disrupting a foreground activity. Audio easily fades into the background, but users are alerted when it changes [Cohen94]. Listeners can attend to multiple background processes via the audio channel as long as the sounds representing each process are distinguishable. This well-known cognitive phenomenon, called the "Cocktail Party Effect" [Handel89], suggests that humans can in fact monitor several audio streams simultaneously, selectively focusing on any one and placing the rest in the background.

A good model of the head-related transfer functions (HRTF) permits effective localization and externalization of sound sources [Wenzel92]. Yet the cognitive load of listening to simultaneous channels increases with the number of channels. Experiments show that increasing the number of channels beyond three causes a degradation in comprehension [Stifelman94]. Bregman claims that stream segregation is better when frequency separation is greater between sound streams [Bregman90]. Arons suggests that the effect of spatialization can be improved by allowing listeners to easily switch between channels (providing perceptual handles on each channel) and pull an audio stream into focus as well as allowing sufficient time to fully fuse the audio streams [Arons92].

A spatial sound system can provide a strong metaphor by placing individual voices in particular spatial locations. An effective spatial layout can also aid auditory memory. The AudioStreamer [Schmandt95] detects head movement towards spatialized audio-based news sources and increases the relative gain of the source, allowing simultaneous browsing and listening of several news articles. Kobayashi introduced a technique for browsing audio by allowing listeners to switch their attention between moving sound sources that play multiple portions of a single audio recording [Kobayashi97]. An audio landscape with directional sound sources and overlapping auditory streams (audio-braiding) can also provide a listening environment for browsing multiple audio sources easily [Maher97]. On a wearable device, spatial audio requires the use of headphones or shoulder-mounted directional speakers. In noisy environments, using spatial audio effectively imposes a greater cognitive load, yet it can still help segregate simultaneous audio streams. Here the exact location of a sound is less important, but it can provide cues about aspects of the message such as its category, urgency and time of arrival.

In Nomadic Radio, audio files are rendered in the spatial environment of the listener using a Java interface to the 3D RSX Audio API developed by Intel. The perceptual audio models used in 3D RSX are based on a set of head-related transfer function (HRTF) measurements of a KEMAR (electronic mannequin) by Bill Gardner at the Media Lab [Gardner95]. The measurements consist of the left and right ear impulse responses from a loudspeaker mounted 1.4 meters from the KEMAR. The HRTF model allows real-time rendering of several monophonic sound sources, positioned arbitrarily around the head and permits control of their elevation, azimuth, and distance cues.
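The basic idea behind HRTF-based rendering can be sketched as a pair of convolutions: the monophonic source is filtered once with the left-ear impulse response and once with the right-ear impulse response measured for the desired direction. The Python sketch below is only an illustration of that principle and not the RSX implementation; the function names and the three-sample impulse responses are hypothetical placeholders, not KEMAR data.

```python
def convolve(signal, impulse_response):
    """Direct-form convolution of two sample lists."""
    if not signal or not impulse_response:
        return []
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def render_binaural(mono, hrir_left, hrir_right):
    """Filter one mono source with a left/right impulse response pair.

    Returns (left_channel, right_channel) for headphone playback.
    """
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy impulse responses (made up for illustration) for a source on the
# listener's right: the right ear hears it immediately, the left ear
# hears it two samples later and at half the amplitude.
HRIR_LEFT = [0.0, 0.0, 0.5]
HRIR_RIGHT = [1.0, 0.0, 0.0]

left, right = render_binaural([1.0, 0.5], HRIR_LEFT, HRIR_RIGHT)
```

A real-time renderer would use measured responses that are far longer, pick or interpolate the pair nearest to the requested azimuth and elevation, and typically use fast convolution rather than the nested loop shown here.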

Designing an effective spatial layout for a diverse set of audio messages requires consideration of their content, priority and scalability. In Nomadic Radio, audio messages such as voicemail and news arrive at different times throughout the day, hence their date and time of arrival provide a unique parameter for spatial layout. This approach is used to position messages in chronological order around a listener's head. The listener can discern the approximate time of arrival from the general direction in which the message is heard. The message category also determines the distance of the messages from the listener, indicating the general importance of the category. For instance, all voice messages are positioned closer and news messages are placed further away.

Figure 3: Time-based spatial layout of hourly news (outer orbit) and voice messages (inner orbit). The cross-hair represents the center of the user's listening space, acting as a virtual ear to browse audio in the soundspace.

Each message arrives at a different point in time, hence its date and time of arrival provide a unique parameter for spatial layout. Thus a suitable approach is to use arrival time to position messages in chronological order around a listener's head (Figure 3). A spatial clock can place messages arriving at noon in front, those arriving at 3:00 PM on the right, and so on. A twelve-hour clock does not scale well for messages arriving throughout the day, since messages arriving after 12:00 PM will overlap with existing ones from the AM. Auditory cues played at the start of the message can indicate AM or PM, yet another solution is to use a twenty-four-hour clock, which can better represent messages for an entire day. Using such a metaphor, all messages arriving during the day occupy a unique position in the listening space.
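The clock metaphor can be made concrete with a small sketch that maps an arrival time to an azimuth and a category to a distance. This is a minimal illustration under stated assumptions, not the Nomadic Radio code: the function names, the distance values, and the choice to place hour 0 of the twenty-four-hour dial in front of the listener are all hypothetical.

```python
import math
from datetime import datetime

# Hypothetical distances (arbitrary units). The text only states that
# voice messages are placed closer to the listener than news.
CATEGORY_DISTANCE = {"voice": 1.0, "news": 2.0}


def clock_azimuth(arrival, hours_on_dial=24):
    """Azimuth in degrees, clockwise from straight ahead (90 = right).

    With hours_on_dial=12, noon is in front and 3:00 PM is on the right.
    With hours_on_dial=24, hour 0 is assumed to be in front.
    """
    hours = arrival.hour + arrival.minute / 60 + arrival.second / 3600
    return (hours % hours_on_dial) / hours_on_dial * 360


def message_position(arrival, category, hours_on_dial=24):
    """Return (x, y) with x pointing right and y pointing forward."""
    azimuth = math.radians(clock_azimuth(arrival, hours_on_dial))
    distance = CATEGORY_DISTANCE[category]
    return distance * math.sin(azimuth), distance * math.cos(azimuth)


three_am = datetime(1997, 1, 17, 3, 0)
three_pm = datetime(1997, 1, 17, 15, 0)

# On a twelve-hour dial the two messages collide at the same azimuth;
# on a twenty-four-hour dial each one gets its own direction.
collide = clock_azimuth(three_am, 12) == clock_azimuth(three_pm, 12)
distinct = clock_azimuth(three_am, 24) != clock_azimuth(three_pm, 24)
```

Keeping the azimuth and the distance as separate mappings means the dial size or the per-category distances can be changed independently of each other.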

The metaphor of radio is utilized to present personalized information as active broadcasts delivered within the user's listening environment. Several such broadcasts can be presented simultaneously as spatialized audio streams, to enable the listener to better segregate and browse multiple information sources. Spatial listening is utilized in three playback modes:

Broadcasting

When new messages arrive, they are broadcast to the listener from a specific spatial location. These messages are heard in the background and fade away if the user does not pay attention to them (i.e. activate them by a button press). This mode is based on the metaphor of traditional radio broadcasting, where listeners passively listen to news stories and only pay attention when a relevant article is heard.

Browsing

Browsing is an active form of listening where users can select a category and browse sequentially through all messages, playing each one as needed (using forward/back buttons on a wireless mouse). When a desirable message is heard, the user can stop and listen to the entire message in the foreground. This mode is similar to the metaphor of switching stations on a radio until a station playing desirable music is found.

Scanning

Sometimes listeners want to get a preview of all their messages quickly without manually selecting and playing each one. This is similar to the scan feature on modern radio tuners that allows users to alternately hear each station for a short duration. In Nomadic Radio, message scanning cycles through all messages by moving each one to the center of the listening space for a short duration of time, and fading it out as the next one starts to play. All messages are played sequentially in this manner, with some overlap as one message fades away and the next one begins to play. This simultaneity allows for more efficient browsing, while still presenting the important content of the messages (i.e., the identity of a caller or the main news headlines). The scanning algorithm ensures that the messages are quickly brought into perceptual focus by pulling them to the listener rapidly, yet the messages are pushed back slowly to provide an easy fading effect as the next one is heard. As the message is pulled in, its direction is maintained, allowing the user to retain a sense of message arrival time. This spatial continuity is important in discriminating and holding the auditory streams together [Arons92].
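The timing of such a scan can be sketched as follows. This is an illustrative sketch only: the function names and the pull, hold and push durations are assumptions, and the linear ramps are a simplification of whatever motion the actual system uses. Only the distance changes over time; the azimuth is left untouched, so the direction of each message is preserved.

```python
def scan_schedule(messages, pull_s=0.5, hold_s=3.0, push_s=2.0):
    """Return (message, start, fade_start, end) tuples, times in seconds.

    Each message starts at the moment the previous one begins to be
    pushed back, which produces the overlap described in the text.
    """
    schedule = []
    start = 0.0
    for message in messages:
        fade_start = start + pull_s + hold_s
        schedule.append((message, start, fade_start, fade_start + push_s))
        start = fade_start
    return schedule


def distance_at(t, start, home, pull_s=0.5, hold_s=3.0, push_s=2.0):
    """Distance of one message from the listener at time t.

    home is the resting distance of the message in the layout.
    """
    elapsed = t - start
    if elapsed <= 0:
        return home
    if elapsed < pull_s:                    # rapid pull to the center
        return home * (1 - elapsed / pull_s)
    if elapsed < pull_s + hold_s:           # held in focus
        return 0.0
    if elapsed < pull_s + hold_s + push_s:  # slow push back out
        return home * (elapsed - pull_s - hold_s) / push_s
    return home


plan = scan_schedule(["voice-1", "news-1"])
```

Choosing a push duration several times longer than the pull duration gives the asymmetry described above: a message snaps into focus quickly and recedes gradually while the next one is already playing.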

We must continue to explore alternative techniques for browsing and spatial audio design based on further evaluation of these methods. In some cases messages could be segregated based on urgency and category of the message or the physical location of people who wish to communicate with the user.


Auditory Perception

[Arnaud95] Nicolas Saint-Arnaud and Kris Popat. "Analysis and Synthesis of Sound Textures". IJCAI workshop on Computational Auditory Scene Analysis, August 1995.

[Bregman90] Bregman, Albert S. Auditory Scene Analysis: The Perceptual Organization of Sound. MIT Press, 1990.

[Handel89] Handel, S. Listening: An Introduction to the Perception of Auditory Events. MIT Press, 1989.

[Sawhney97c] Sawhney, Nitin. "Situational Awareness from Environmental Sounds", Project Report for Pattie Maes, MIT Media Lab, June 1997.

Auditory Interfaces and Non-speech Audio Display

[Cohen94] Cohen, J. Monitoring background activities. Auditory Display: Sonification, Audification, and Auditory Interfaces. Reading, MA: Addison-Wesley, 1994.

[Edwards94] Edwards, W. Keith, Elizabeth D. Mynatt, and Kathryn Stockton. "Providing Access to Graphical User Interfaces - Not Graphical Screens". ACM Proceedings on ASSETS '94, November 1994.

[Gaver89] William W. Gaver. The Sonic Finder: An interface that uses auditory icons. Human Computer Interaction, 4:67-94, 1989.

[Raman96] Raman, T. V. "Emacspeak --A Speech Interface". Proceedings of CHI '96, April 1996.

Spatial Audio Listening and Browsing

[Arons92] Barry Arons. "A Review of the Cocktail Party Effect". Journal of American Voice I/O Society, Vol. 12, July 1992.

[Gardner95] Gardner, W. G., and Martin, K. D. HRTF measurements of a KEMAR. Journal of the Acoustical Society of America, 97 (6), 1995, pp. 3907-3908.

[Kobayashi97] Kobayashi, Minoru and Chris Schmandt. "Dynamic Soundscape: Mapping Time to Space for Audio Browsing". Proceedings of CHI '97, March 1997.

[Maher97] Maher, Brenden. "Navigating a Spatialized Speech Environment through Simultaneous Listening and Tangible Interactions". M.S. Thesis, Media Arts and Sciences, MIT Media Lab, Fall 1997.

[Schmandt95] Schmandt, Chris and Atty Mullins. "AudioStreamer: Exploiting Simultaneity for Listening". Proceedings of CHI '95, pp. 218-219, May 1995.

[Wenzel92] Wenzel, E.M. Localization in virtual acoustic displays, Presence, 1, 80, 1992.


Nitin Sawhney
Last modified: Sat Jan 17 19:44:12 EST