Back Talk is Andrea's thesis project, which aims to create a co-present network of non-collocated friends while they engage in a common viewing activity (such as watching television).
Here is a video showing initial work.
Motivation
The communal role of the television in creating social experiences and memories in the "living room" has
broken down with the introduction of on-demand content. The broadcast nature of television was instrumental in shaping
these shared experiences through its inherent synchronicity and locality. However, people may still enjoy sharing such viewing
experiences with friends and family.
Back Talk attempts to re-create this social experience by fitting your social network into your television viewing
experience using a cell phone as a primary backchannel between remote groups of people.
Approach
I use the cell phone in a dual role: (i) as a controller for accessing your network of friends and their status, and
(ii) as the backchannel itself, providing an option for rich communication via voice or text. This is advantageous because
it removes the need for new controllers or a keyboard.
System Snapshot
The system includes the following features:
(i) An Engagement Meter: conveys the engagement of my remote peers. It combines Galvanic Skin Response data, used to detect
sudden emotional changes in response to the content being viewed, with a Laughter Detector.
(ii) Push-To-Talk: broadcasts voice messages to a select group of friends.
(iii) Reaction Shots: captures pictures via a webcam fitted on the television. These reaction shots are triggered by
sudden viewer reactions picked up by the engagement meter. They would be temporally associated with the video content in order
to enrich asynchronous viewing experiences with the reactions of friends. This feature could create an augmented annotation layer
on top of existing video content, providing elements of a shared experience even asynchronously.
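The description above does not say how "sudden" changes in Galvanic Skin Response are detected. As a rough sketch of one plausible approach, the snippet below flags a sample as a spike when it rises well above a rolling baseline of recent samples; such a spike could then trigger a reaction shot. The function name, window size, and thresholds are illustrative assumptions, not part of Back Talk.

```python
from collections import deque
from statistics import mean, stdev


def detect_arousal_spikes(samples, window=20, threshold=3.0, min_rise=0.05):
    """Return indices of samples that jump well above the recent baseline.

    All parameters are hypothetical tuning knobs:
    window    - number of previous samples that form the baseline
    threshold - how many standard deviations count as "sudden"
    min_rise  - absolute floor, so a perfectly flat baseline does not
                turn tiny fluctuations into spikes
    """
    history = deque(maxlen=window)
    spikes = []
    for index, value in enumerate(samples):
        if len(history) == window:
            baseline = mean(history)
            spread = stdev(history)
            rise = value - baseline
            if rise > max(threshold * spread, min_rise):
                spikes.append(index)
        # The current sample joins the baseline only after it was tested.
        history.append(value)
    return spikes
```

Each returned index can be converted to a position in the video (index divided by the sensor's sampling rate) so that the reaction shot is stored against the moment that caused it.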
This is how a typical acoustic environment is set up around a viewer. The focus is on making the experience
enjoyable for the listener with minimal intrusion. As highlighted in this depiction, a camera on the TV and the cell phone are
part of the sensing module. I am experimenting with IR cameras and sensors to obtain corneal reflections - this would
help identify the number of faces in front of the screen and detect whenever this number changes. The camera would also help
with gaze detection. In this case, gaze detection would help convey viewer engagement to remote buddies by detecting
whether I'm looking away from my screen.
The system has four major components.
The cell phone acts as the primary controller and interface to co-viewing buddies. It is designed to appear like a virtual
couch. This mimics a typical setting where friends gather to watch a show together. I use the term "sonic avatar" for the
representation of each remote friend. The controller allows a viewer to control how much audio she is transmitting as
well as how much audio she would like to hear from remote friends.
A key component of BackTalk is its sensing module. At the simplest level, the system will detect whenever a viewer
speaks (makes a comment) and will transmit it to remote friends. In addition, sensing that goes beyond direct audio will
be translated to audio cues.
So, what exactly is the system sensing? It senses engagement via 1) Laughter Detection and 2) sudden arousal, measured through
Galvanic Skin Response; attentiveness to the program on-screen via gaze detection; and the number of faces in front of the
television via eyeball detection. The idea of this module is to automate the process of "emoting" to remote friends.
Processing in BackTalk includes updates from members watching together in a virtual couch environment. This module
translates what is sensed into audio cues. For example, the system aims to indicate the number of
people watching (audio cue: crowd noise) and people coming and going (audio cues: footsteps, door
opening and closing, click of the remote). The sound generation process takes two
different sources of input: 1) spoken communication between friends and 2) what is sensed beyond direct audio.
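To make the translation from sensed data to audio cues concrete, here is a minimal sketch that turns a change in the number of detected faces into a list of cues. The cue names come from the description above; which cue plays for arrivals versus departures, the gain values, and the `full_room` parameter are assumptions made for illustration only.

```python
def cues_for_update(previous_faces, current_faces, full_room=6):
    """Map a face-count update to (cue_name, gain) pairs.

    full_room is a hypothetical viewer count at which the crowd noise
    reaches full volume.
    """
    cues = []
    if current_faces != previous_faces:
        # Someone came or went: footsteps plus the matching door sound.
        cues.append(("footsteps", 1.0))
        arrived = current_faces > previous_faces
        cues.append(("door opening" if arrived else "door closing", 1.0))
    if current_faces > 0:
        # Crowd noise grows with the number of people watching.
        cues.append(("crowd noise", min(1.0, current_faces / full_room)))
    return cues
```

For instance, a remote couch going from two viewers to three would yield footsteps, a door opening, and crowd noise at half volume under these assumed settings.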
Finally, the output is the acoustic environment around the viewer. This setting will play audio cues in a spatially
distributed fashion, using a set of stereo speakers on either side of the couch.
In the BackTalk system there will be two classes of sound sources: natural sounds from users and
synthetic sounds to indicate activity. All these different sound sources are mixed into a stereo signal,
where location in one dimension is obtained by left/right panning of each sound source.
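The left/right panning described above can be sketched as follows. This example assumes a constant-power (sine/cosine) pan law, which is a common choice but is not stated to be the one BackTalk uses; the function names and the position convention (-1 for hard left, +1 for hard right) are likewise assumptions.

```python
import math


def pan_gains(position):
    """Return (left_gain, right_gain) for a position in [-1.0, 1.0].

    Constant-power law: left**2 + right**2 == 1 at every position, so a
    source keeps roughly the same loudness as it moves along the couch.
    """
    position = max(-1.0, min(1.0, position))
    angle = (position + 1.0) * math.pi / 4.0  # 0 .. pi/2
    return math.cos(angle), math.sin(angle)


def mix_to_stereo(sources):
    """Mix (samples, position) pairs into one stereo signal.

    Returns (left, right) sample lists as long as the longest source.
    """
    length = max((len(samples) for samples, _ in sources), default=0)
    left = [0.0] * length
    right = [0.0] * length
    for samples, position in sources:
        left_gain, right_gain = pan_gains(position)
        for i, sample in enumerate(samples):
            left[i] += left_gain * sample
            right[i] += right_gain * sample
    return left, right
```

In this sketch, a friend's voice (a natural sound) and a footsteps cue (a synthetic sound) would simply be two entries in `sources`, each with the position of the corresponding sonic avatar on the virtual couch.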