Back Talk is Andrea's thesis project, through which she aims to create a co-present network of non-collocated friends while they engage in a common viewing activity (such as watching television).

Here is a video showing initial work.


Motivation
The communal role of the television in creating social experiences and memories in the "living room" has broken down with the introduction of on-demand content. The broadcast nature of television was instrumental in shaping these shared experiences by its inherent synchronicity and locality. However, people may still enjoy sharing such viewing experiences with friends and family. Back Talk attempts to re-create this social experience by fitting your social network into your television viewing experience using a cell phone as a primary backchannel between remote groups of people.

Approach
I leverage the cell phone in a dual role: (i) as a controller to access your network of friends and their status, and (ii) as the backchannel itself, providing an option for rich communication via voice or text. This is advantageous because it removes the need for new controllers or a keyboard.

The system includes the following features:
(i) An Engagement Meter: to convey the engagement of my remote peers. This combines Galvanic Skin Response (GSR) data, used to detect sudden emotional changes in response to the content being viewed, with a Laughter Detector.

(ii) Push-To-Talk: to broadcast voice messages to a select group of friends.

(iii) Reaction Shots: to capture pictures via a webcam mounted on the television. These reaction shots are triggered by sudden viewer reactions picked up by the Engagement Meter. They would be temporally associated with the video content in order to enrich asynchronous viewing experiences with the reactions of friends. This feature could create an augmented annotation layer over existing video content, thereby providing elements of a shared experience even asynchronously.
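As a rough illustration of how sudden changes in Galvanic Skin Response data might be flagged, the sketch below compares each sample against a rolling baseline and reports the indices where it jumps well above it. This is not the detector used in Back Talk: the function name `detect_arousal_events`, the window size, the threshold and the z-score approach are all assumptions made for illustration. The returned indices could serve as the moments at which a reaction shot is taken and associated with the video content.

```python
from collections import deque
from statistics import mean, pstdev


def detect_arousal_events(samples, window=20, threshold=3.0, min_spread=0.01):
    """Return indices of samples that rise sharply above the recent baseline.

    All parameters are hypothetical tuning values, not taken from Back Talk.
    """
    history = deque(maxlen=window)  # most recent `window` samples
    events = []
    for i, value in enumerate(samples):
        if len(history) == window:
            baseline = mean(history)
            # Floor the spread so a perfectly flat signal does not divide by zero.
            spread = max(pstdev(history), min_spread)
            if (value - baseline) / spread > threshold:
                events.append(i)
        history.append(value)
    return events


# A calm signal with one sudden jump at index 20.
signal = [1.0, 1.1] * 10 + [2.0] + [1.0, 1.1] * 5
print(detect_arousal_events(signal))
```

A rolling baseline is used here because skin conductance drifts slowly over time, so a fixed absolute threshold would likely misfire.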

System Snapshot

This is how a typical acoustic environment is set up around a viewer. The focus is on making the experience enjoyable for the listener with minimal intrusion. As highlighted in this depiction, a camera on the TV and the cell phone are part of the sensing module. I am experimenting with IR cameras and sensors to obtain corneal reflections; this would help identify the number of faces in front of the screen and detect whenever this number changes. The camera would also help with gaze detection, which would convey viewer engagement to remote buddies by detecting when I'm looking away from my screen.







The system has four major components.
The cell phone acts as the primary controller and interface to co-viewing buddies. Its interface is designed to look like a virtual couch, mimicking a typical setting where friends gather to watch a show together. I use the term "sonic avatar" for the representation of each remote friend. The controller allows a viewer to control how much audio she is transmitting as well as how much audio she would like to hear from remote friends.

A key component of BackTalk is its sensing module. At the simplest level, the system will detect whenever a viewer speaks (makes a comment) and will transmit it to remote friends. In addition, whatever is sensed beyond direct audio will be translated into audio cues.
So, what exactly is the system sensing?
(i) Engagement, via laughter detection and sudden arousal measured through Galvanic Skin Response.
(ii) Attentiveness to the program on screen, via gaze detection.
(iii) The number of faces in front of the television, via eyeball detection.
The idea of this module is to automate the process of "emoting" to remote friends.

Processing in BackTalk handles updates from members watching together in a virtual couch environment. This module translates what is sensed into audio cues. For example, the system aims to indicate the number of people watching (audio cue: crowd noise) and people coming and going (audio cues: footsteps, door opening and closing, click of the remote). The sound generation process takes input from two different sources: 1) spoken communication between friends and 2) what is sensed beyond speech.
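The translation from sensed data to audio cues could be sketched as follows. The cue names come from the examples above, but the function `cues_for_update`, the pairing of specific cues with arrivals versus departures, and the use of the face count as a crowd-noise level are assumptions made for illustration, not a description of the actual module.

```python
def cues_for_update(previous_faces, current_faces):
    """Map a change in the sensed number of faces to a list of (cue, level) pairs.

    Hypothetical mapping: which cue plays on arrival or departure is assumed.
    """
    cues = []
    if current_faces > previous_faces:
        # Someone joined the remote couch.
        cues.append(("door_opening", 1))
        cues.append(("footsteps", current_faces - previous_faces))
    elif current_faces < previous_faces:
        # Someone left the remote couch.
        cues.append(("footsteps", previous_faces - current_faces))
        cues.append(("door_closing", 1))
    if current_faces > 0:
        # Crowd noise scales with how many people are watching.
        cues.append(("crowd_noise", current_faces))
    return cues


print(cues_for_update(2, 3))
print(cues_for_update(1, 0))
```

Keeping the mapping in one small function means that new sensed signals (laughter, gaze) could be given their own cues without touching the sound generation itself.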

Finally, the output is the acoustic environment around the viewer. This setting will play audio cues in a spatially distributed fashion, using a set of stereo speakers on either side of the couch. In the BackTalk system there will be two classes of sound sources: natural sounds from users and synthetic sounds to indicate activity. All these different sound sources are mixed into a stereo signal, where location in one dimension is obtained by left/right panning of each sound source.
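A minimal sketch of mixing several mono sources into a stereo signal with left/right panning is shown below. The text above only states that panning is used; the constant-power pan law, the function names `pan_gains` and `mix_to_stereo`, and the convention that pan runs from -1 (hard left) to +1 (hard right) are assumptions made for illustration.

```python
import math


def pan_gains(pan):
    """Return (left_gain, right_gain) for a pan position in [-1, 1].

    Uses a constant-power pan law (an assumed choice): -1 is hard left,
    0 is center, +1 is hard right.
    """
    angle = (pan + 1.0) * math.pi / 4.0  # maps [-1, 1] to [0, pi/2]
    return math.cos(angle), math.sin(angle)


def mix_to_stereo(sources):
    """Mix a list of (mono_samples, pan) pairs into (left, right) sample lists."""
    length = max((len(samples) for samples, _ in sources), default=0)
    left = [0.0] * length
    right = [0.0] * length
    for samples, pan in sources:
        left_gain, right_gain = pan_gains(pan)
        for i, x in enumerate(samples):
            left[i] += left_gain * x
            right[i] += right_gain * x
    return left, right


# One friend's voice panned hard left, a synthetic cue panned hard right.
left, right = mix_to_stereo([([1.0, 1.0], -1.0), ([0.5], 1.0)])
print(left, right)
```

With this convention, each sonic avatar could be given a fixed pan value matching its seat on the virtual couch, so a friend's voice and activity cues always arrive from the same side.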