Computer-mediated Social Awareness & Perception in Everyday Environments

Questions and Comments from Orals Committee

Committee: Chris Schmandt, Mark Ackerman, Trevor Darrell

Nitin Sawhney
June 12, 2000

Note: This document is a quick transcription of the main points of discussion in the orals exam. My responses here are intentionally written up in a casual manner, to reflect the actual discussion where possible. I've commented where there needs to be more clarification on my part, but have not attempted to provide detailed or definite answers here. Towards the end I've summarized the main themes, which should serve as a resource for open issues that we can delve into for the generals as well as guiding on-going research.

Part I: Social Life of the Street

Mark: What's the role of Face in public expression?

I missed this early in the discussion, but Mark helped me clarify it a bit later. The notion of 'face' seems to be related in some sense to what Goffman calls "Audience Segregation", showing a different face to different people. Face probably embodies both "Given" and "Given-off" expressions. Mark says it's attributed to accidental public disclosure. In any public presentation of self, people will not show the same face to all, but one may notice a particular unintended face accidentally. This seems to be a central theme in Goffman's writing, although I was unable to find a definite reference to it in his book 'Presentation of Self' - I have to look closely and understand the implications better.

Geek: What's the difference between "Given" & "Given-Off" expression?

Given-Off is an expressive disposition consisting of more theatrical and contextual, presumably unintentional mannerisms; expressions that are generally considered not explicitly controlled by the person. Some examples would be helpful here to clarify this.

Mark: Difference between "Given-Off" and "non-verbal" expression?

Goffman doesn't make this distinction, but given-off expressions can occur even in verbal discourse - appearing to say certain words unintentionally or pausing / tonality in discussion. One probably has to extend Goffman's thinking here to make the distinction more concrete.

Mark: What makes the hallway a space where people feel less private?

It's not really less private … there's not a clear front-stage vs. backstage distinction in the garden workspace. But by introducing a camera and portal space there, we've changed the dynamics of the space; people are probably slightly more cautious in their use of that space in many ways. For example, I sometimes walk around the camera to prevent it from triggering articles, since I'm not interested in reading them. Must think more clearly about Whyte and Goffman's perspective here.

Geek: Why does a high proportion of women in a space make it any better?

Whyte found that the higher the proportion of women in a place, the more likely it is to be popular. Does it have to do with the fact that they consider the area safe or simply better designed for habitation? I don't think Whyte provides a good reason, except stating his observations.

Mark: It's not just that people tend to feel safer around others to explain their attraction to higher density areas!

Is there a human need to be in proximate social distance to others? Exchanging mutual gaze with familiar strangers? To feel welcome or provide a sense of belonging in a space? This seems to be an issue that's not well researched … or I simply haven't found a satisfying answer here.

Trevor: Cultural probes seem rather unscientific. Why would you use them over an ethnographic study? What would you hope to learn? Did anything surprise you from their findings?

Ethnographic studies tended to focus on "needs-based" analysis of elderly lives. Cultural probes found the elderly had diverse opinions, rich historical/cultural context and creative energies that could be tapped into. They suggested a design approach where one considers them an integral resource for the community rather than simply "needy individuals". It's a radical shift in thinking towards design. In my mind, the lesson here is that one ought to simply broaden what's conventionally done in ethnographic studies (i.e. participant observations and interviews) with such cultural probes (a form of self-expression or creative reporting on the participants' part) to get a richer understanding of the overall context.

 

Part II: Computer-mediated Awareness and Perception

Geek: How did the periodic visual images of people's offices in Portholes provide awareness that people were interruptible?

People tend to start recognizing and unconsciously remembering long-term patterns in other people's activity. Portholes tends to work as a form of passive awareness; when people take actions to initiate communication, it's no longer a form of awareness. Ian Smith's work takes snapshots of recent activity over time, allowing one to see when someone left their office. This may help more directly address awareness for figuring out when someone is more interruptible, but of course it's not at all privacy preserving (as Ian will admit).

Mark: What does Goffman have to say about the Trust, Reciprocity and Perceptible Feedback in Aware Portals?

I really didn't address this question well in the orals. This would require some clear analytic thinking (and a deeper reading of Goffman) … so I won't address it here for the moment. Mark indicated he would press me further on these questions in my generals.

Geek: What was the effect of content on people's interaction with the portal? How might you evaluate that?

The canned news content, although somewhat interesting to some, was not very conducive to creating an engaging source of interaction with the portal. Many people did not find it relevant. We would need to allow people to post content relevant to the needs in that physical space. It turns out that people often have discussions around shared cultural artifacts like TV programming. But it's impossible to track the effect of information posted on the portal, unless it is somehow staged as a crisis event of sorts (Mark's comment).

Trevor: The baby-monitoring scenario smells like surveillance; what has that got to do with social awareness?

Consider the role of trusted humans in the monitoring loop, i.e. design awareness systems for family members and nurses to observe different kinds of awareness cues, like general health/temporal rhythms of the infant vs. biometric signs and safety of the infant. Different levels of awareness representation and communication are needed. The features used will depend on the tasks/goals of monitoring. Overall one could consider emphasizing design that supports shared understanding, responsibility and communication between the family members and care-takers.

Trevor: Binary sensors don't provide an abstraction of social activity, it's when you generalize these patterns that you get such abstraction.

I completely agree. I had it backward in my slide! But Trevor points out an important distinction in the way one ought to think about capturing discrete patterns vs. abstracting out a generalized model of the activity.

Trevor: Difference between HMMs and Dynamic Bayesian Networks?

HMMs are a class of Bayesian Networks. But typically DBNs incorporate evidence from priors within their framework. But I'm not convinced that's the right distinction. I should study these approaches further.

Trevor: What's the "hidden" in HMMs?

I would say it's the states that encode the complexity of the temporal relationships in the data. One does not have access to the values of these states; we can only observe the transitions. My answer was not satisfying. I'll admit that, not having read about HMMs in a year, my understanding was brittle; I should read the Rabiner paper again.
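A small sketch may help pin down the "hidden" part. In the standard formulation (as in Rabiner's tutorial), the state sequence itself is never seen; what one observes are the outputs emitted from each state, and the forward algorithm sums over all possible state paths. The two states, the "motion"/"still" sensor readings, and every probability below are made up for illustration, loosely in the spirit of an office-awareness setting.

```python
# Toy two-state HMM. States, observation symbols, and all probabilities
# are hypothetical values chosen only for illustration.
states = ["in_office", "away"]
start = {"in_office": 0.6, "away": 0.4}
trans = {
    "in_office": {"in_office": 0.7, "away": 0.3},
    "away": {"in_office": 0.4, "away": 0.6},
}
emit = {
    "in_office": {"motion": 0.8, "still": 0.2},
    "away": {"motion": 0.1, "still": 0.9},
}


def forward(observations):
    """Forward algorithm.

    Returns the likelihood of the observation sequence and the posterior
    over the hidden state at the final time step.
    """
    # Initialization: start probability times emission of the first symbol.
    alpha = {s: start[s] * emit[s][observations[0]] for s in states}
    # Recursion: sum over every possible previous (hidden) state.
    for obs in observations[1:]:
        alpha = {
            s: emit[s][obs] * sum(alpha[p] * trans[p][s] for p in states)
            for s in states
        }
    likelihood = sum(alpha.values())
    posterior = {s: alpha[s] / likelihood for s in states}
    return likelihood, posterior


likelihood, posterior = forward(["motion", "motion", "still"])
print(likelihood, posterior)
```

The point of the sketch is that the code never reads a state value from the data; it only ever sees the emitted symbols and infers a distribution over states from them.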

Trevor: What makes a system Bayesian?

It's not just probabilistic. It's the fact that it incorporates priors in its decision making. I had trouble here as well; I should read Bayesian approaches in Chris Bishop's book to get some clarity.
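To make the role of priors concrete, here is a minimal sketch contrasting a decision based on the likelihood alone with one that also weighs in a prior via Bayes' rule. The crying-vs-noise hypotheses and all the numbers are assumptions invented for this illustration, not anything discussed in the exam.

```python
def posterior(prior, likelihood):
    """Bayes' rule over a finite set of hypotheses.

    P(h | x) is proportional to P(x | h) * P(h); dividing by the sum
    (the evidence) normalizes the result.
    """
    unnorm = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnorm.values())
    return {h: v / evidence for h, v in unnorm.items()}


# Hypothetical numbers: a sound is heard in the nursery.
likelihood = {"crying": 0.6, "noise": 0.4}  # P(sound | hypothesis)
prior = {"crying": 0.1, "noise": 0.9}       # belief before hearing the sound

# Purely likelihood-based choice ignores how rare crying is.
ml_choice = max(likelihood, key=likelihood.get)

# The Bayesian choice combines likelihood and prior.
post = posterior(prior, likelihood)
map_choice = max(post, key=post.get)

print(ml_choice, map_choice, post)
```

With these made-up numbers the two decisions disagree: the likelihood alone favors "crying", while the posterior favors "noise" because the prior for crying is low.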

Trevor: What are the limits of these techniques? How would you find a "pink elephant upside-down" in a scene using a Bayesian framework?

My answer: well, you would most likely create a representation that understands shape and color, and places the novel object somewhere in the neighborhood of similar attributes such as pink and elephant, but does not tell you that it's found such an object [Shimon Edelman's work seems to resonate here]. Trevor: The real problem is that a Bayesian framework would simply not have any priors for such a novel object, so it would set the likelihood of such a class to zero. Hence one would have to rethink how to use Bayesian techniques here. Perhaps that's a fundamental limitation of how such statistical methods are applied today.
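Trevor's point can be seen in a few lines: if a class has zero prior mass, its posterior stays at zero no matter how well the evidence fits it. The sketch below uses made-up class names and numbers. The second half applies additive smoothing to the prior, which is one common workaround that I am assuming here for illustration; it was not proposed in the discussion.

```python
def posterior(prior, likelihood):
    """Bayes' rule; classes missing from the prior get prior mass 0."""
    unnorm = {c: prior.get(c, 0.0) * likelihood[c] for c in likelihood}
    evidence = sum(unnorm.values())
    return {c: v / evidence for c, v in unnorm.items()}


def smooth(prior, classes, eps=0.01):
    """Additive smoothing (an assumed workaround): give every class a
    small extra mass eps, then renormalize so the prior sums to 1."""
    raw = {c: prior.get(c, 0.0) + eps for c in classes}
    total = sum(raw.values())
    return {c: v / total for c, v in raw.items()}


# Hypothetical classes and numbers. The novel class has no prior at all,
# even though the observed image fits it very well.
classes = ["grey_elephant", "pink_flamingo", "pink_elephant_upside_down"]
prior = {"grey_elephant": 0.5, "pink_flamingo": 0.5}
likelihood = {
    "grey_elephant": 0.01,
    "pink_flamingo": 0.02,
    "pink_elephant_upside_down": 0.9,
}

hard = posterior(prior, likelihood)
soft = posterior(smooth(prior, classes), likelihood)

print(hard["pink_elephant_upside_down"])
print(soft["pink_elephant_upside_down"])
```

Smoothing only helps when the novel class is already enumerated in the model; a truly unanticipated class would not even appear in the list, which is closer to the limitation Trevor was pointing at.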

Trevor: What’s different in the perceptual approaches used in conventional vision/robotics vs. those needed for social awareness? Isn't awareness just for visualization, the system doesn't need to make any decisions?

In social awareness, the analysis can be guided within a framework set up such that human intervention can play a key role in disambiguating what's perceived in a scene. Most current approaches primarily look at modeling the scene without introducing sufficient prior knowledge; however, for social awareness we can potentially set up a framework where we understand existing relationships and attributes of the real-world setting. This may allow meaningful learning or pattern recognition to occur within such a framework of rich prior (and evolving) knowledge guided by motivated human individuals. Indeed, awareness systems would also need to make decisions, such as alerting you to salient activity or opening channels of communication when a party was deemed less interruptible.

Trevor: You say that a form of audio and visual fusion is useful for perception related to social awareness. What other research efforts have dealt with this issue?

My thinking was that the robotics community deals with sensor fusion to some extent in much of their work, but probably not well posed for our problem domain. Trevor and I pointed out that folks (David Stork and others) doing speech and lip modeling have a better understanding of audio/visual integration.

 

Summary of key themes for further study: (need to formulate more specific goals)

I: How does literature in sociology predict behavior in computer-mediated settings?

Goffman - Understanding concepts like Face, Given-off. Would be useful to analyze his thinking in the context of a system like aware-portals. People react with prior assumptions while using a mediated space in the hallway, but then they adapt somehow. What role can we play as designers in making that adaptation a meaningful one that allows extended engagement?

Whyte - examine his work more closely. Read other work he's done and see if there are meaningful themes for understanding computer-mediated settings.

What mix of ethnographic study and cultural probes makes sense for analyzing social settings for design? How does one pose the results of cultural probes within a more scientific framework?

To understand these issues better, it may be worthwhile doing a small and limited field study - one that you are willing to throw away, but could still gain some insight from - say in the domain of the elderly or a complex emergency-care setting?

II: How would one design a media space between distributed settings that deals particularly with problems of spontaneous encounters, peripheral awareness (asynchronous) and privacy? How would it address issues in prior work? - Consider role of gaze, audio, and ways for initiating communication based on awareness/interruption. Can we extend aware-portals to do so, or consider a different physical/media design based on findings by Whyte and Goffman? What would be the role of shared community content that would motivate people to use it?

III: What are the unique challenges of representation and learning for socially aware perception?

Consider the different kinds of multi-modal features, representations, and learning approaches that might be meaningful for particular well-posed tasks in scene understanding related to audio/visual salience or activity abstraction. What can be the role of audio in visual scene understanding in these specific settings? Examples include baby monitoring, understanding elderly lifestyle, or pedestrian / crowd scenes. In particular, consider the affordances and limitations of learning approaches like HMMs and DBNs for modeling temporal events and providing useful evidence from knowledge or context in such settings.

These are broad questions; the only way to understand the issues better is to perform a perceptual learning experiment using real-world audio/visual data. What would be a good problem domain? I would have to find a way to label data and create training/test sets to measure performance. It is challenging to obtain data of human activity unless we have explicit recording in such settings. Even if people agree to it, it's not trivial to set up the collection and labeling of such data. Brian Clarkson walks around with a camera/mic all day and labels scene changes via a touch pad (some people have objections until they understand the purpose of the project). Can we use any pre-existing data for this purpose, and focus instead on representation and learning? What would be the task we want to focus on - scene summary & notification based on salience-detection?