Readings: Mediated Faces

I found the requirement of this week's assignment, to use faces in the design of a mediated communication interface, challenging, despite the fact that it is often argued that, second only to language, the face is the richest expressive channel. As Judith Donath's paper Mediated Faces describes, the richness and subtlety of facial expression is difficult to translate to an online forum using current technology.

After playing with several concepts, including controlling a gaze-based interface with sensors embedded in a Kermit the Frog puppet, and the subtleties of expression that some cartoonists are able to convey with only a few pen or brush marks (e.g. Manga), I finally settled on an interface designed to add some level of emotional feedback, and an indication of attentiveness, to text-based discussions. I liked the idea of keeping the user's 'real' face unknown, and the freedom this gives from initial judgements based on perceived gender, race, age, etc. I chose the visual metaphor of a collage: a series of different faces, each showing the user's current expression, is added to the screen as a discussion progresses.

Face Collage

My interface concept would try to extend the idea of extracting key facial points (such as Brennan's 186 'key points': the inner and outer corners of each eye, etc.) or lines. Such points and lines have been used for producing caricatures and face morphing, as discussed in Bruce and Young's In the Eye of the Beholder. I thought it might be possible to analyse video of a user and first establish an average position for each key point (if such an average exists) as a 'null' expression. The difference between this null expression and the current expression could then be compared against a pre-organised library of facial expressions. By matching the shape of the lines (normalised for overall face geometry) between the user's current expression and the collected face images, it may be possible to match the user's smile with pictures of smiling faces, an angry expression with angry pictures, and so on. Each time the process is triggered, a new face image would be picked from the library and placed on screen next to the text that prompted the response. Identifying which text the user is currently reading would require some level of eye tracking.
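The matching step could be sketched as a nearest-neighbour search over key-point displacements. The Python below is a minimal sketch under stated assumptions, not a tested design:

- It assumes some face tracker already supplies the same ordered 2D key points for the user and for every library face.
- It normalises each set of points for position and size only; head rotation is ignored.
- It subtracts each face's own neutral ('null') pose.
- It returns the library image whose change in expression is closest to the user's.

The four-point toy face, the image ids and all helper names are hypothetical.

```python
import math

Point = tuple[float, float]


def normalise(points: list[Point]) -> list[Point]:
    """Remove overall position and size: centre on the centroid, divide by RMS radius.
    Rotation is NOT removed in this sketch."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    centred = [(x - cx, y - cy) for x, y in points]
    scale = math.sqrt(sum(x * x + y * y for x, y in centred) / n) or 1.0
    return [(x / scale, y / scale) for x, y in centred]


def displacement(neutral: list[Point], current: list[Point]) -> list[Point]:
    """Offset of each key point from its position in the neutral ('null') expression."""
    a, b = normalise(neutral), normalise(current)
    return [(bx - ax, by - ay) for (ax, ay), (bx, by) in zip(a, b)]


def distance(d1: list[Point], d2: list[Point]) -> float:
    """Euclidean distance between two displacement vectors."""
    return math.sqrt(sum((x1 - x2) ** 2 + (y1 - y2) ** 2
                         for (x1, y1), (x2, y2) in zip(d1, d2)))


def best_match(user_disp: list[Point], library: dict[str, list[Point]]) -> str:
    """Return the id of the library image whose expression change is closest."""
    return min(library, key=lambda image_id: distance(user_disp, library[image_id]))


# Hypothetical toy face: left eye, right eye, left mouth corner, right mouth corner.
NEUTRAL = [(-1.0, 1.0), (1.0, 1.0), (-1.0, -1.0), (1.0, -1.0)]
SMILE = [(-1.0, 1.0), (1.0, 1.0), (-1.2, -0.8), (1.2, -0.8)]
FROWN = [(-1.0, 1.0), (1.0, 1.0), (-0.8, -1.2), (0.8, -1.2)]


def bigger_and_shifted(points: list[Point]) -> list[Point]:
    """A differently sized, differently placed library face with the same expression."""
    return [(2 * x + 10, 2 * y + 5) for x, y in points]


# Hypothetical library: image id -> that face's displacement from its own neutral pose.
LIBRARY = {
    "smile_017": displacement(bigger_and_shifted(NEUTRAL), bigger_and_shifted(SMILE)),
    "frown_042": displacement(bigger_and_shifted(NEUTRAL), bigger_and_shifted(FROWN)),
}

user_now = [(-1.0, 1.0), (1.0, 1.0), (-1.1, -0.9), (1.1, -0.9)]  # a slight smile
print(best_match(displacement(NEUTRAL, user_now), LIBRARY))  # smile_017
```

Each face is compared through its displacement from its own neutral pose, so a library face twice the size and placed elsewhere still matches geometrically. Whether such geometric matches are perceived as the same expression on a different face remains an open assumption.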
To limit face updates and prevent overcrowding of the screen, the system might have to update only on each large shift in expression (measured as deviation from the image currently displayed). Alternatively, if voice recognition is used for text entry rather than typing on a keyboard, an update could be posted for each audible, non-verbal response (such as "mmm", "huh" or "yeah"). Such sounds are intended to encourage the speaker, and would appear encouraging on screen. However, it is questionable whether users would initially make such sounds in this environment.

Discussion

The approach relies on the assumption that expressions can be interpolated from one face to another, and that matched expressions have the same perceived effect on different face structures. People may also find the continually changing identity, presented as a multitude of individual faces, unnatural, confusing or hard to follow. I thought about providing visual cues to identity in the form of colour labelling, possibly vertical lines running alongside the text, to which the faces are attached. The collage theme could be continued in the visual background and extended to real-life metaphors for 'sticking' on a new face (sticky tape, thumb tacks, etc.). Even the ageing of a post could take the form of creases, folded corners and yellowing (which returns to last week's seminar).

Depending on the difficulty of realising full eye tracking, I considered placing a 'chat only' screen to one side of the user's main screen. When the user turns to read the text, a camera placed above that screen could detect this more easily. The system would then interpret it as attention to the conversation, although this gives no clue as to whether the user is interested in current or past text. This allows gaze to be simplified to a binary condition: unlike listening to someone speak in face-to-face interaction, one must face the screen to read the text.
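The update rules and the binary treatment of gaze described above could be combined into a single gate. This is a minimal Python sketch with hypothetical parameters, and it rests on several assumptions:

- The yaw threshold, the change threshold and the class name are illustrative only.
- Head yaw is assumed to come from the camera above the chat screen.
- `heard` is assumed to be a word returned by a speech recogniser.
- The expression is reduced to a single score, standing in for the landmark-displacement distance, to keep the sketch short.

The expression measurement is passed in as a function, so it only runs when the user is actually facing the chat screen.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical tuning values; none are given in the text.
ATTENTION_YAW_DEG = 20.0     # "facing the chat screen" = head yaw within +/- 20 degrees
EXPRESSION_THRESHOLD = 0.3   # minimum change in expression score before a new face is posted
BACKCHANNEL_SOUNDS = {"mmm", "huh", "yeah"}


@dataclass
class FaceUpdater:
    shown: float = 0.0  # expression score of the face image currently on screen

    def step(self, head_yaw_deg: float, measure_expression: Callable[[], float],
             heard: Optional[str] = None) -> bool:
        """Return True if a new face image should be posted for this frame."""
        # Binary attention: the user is either facing the chat screen or not.
        if abs(head_yaw_deg) > ATTENTION_YAW_DEG:
            return False  # not reading, so skip expression analysis entirely
        expression = measure_expression()  # costly analysis runs only when attending
        large_shift = abs(expression - self.shown) >= EXPRESSION_THRESHOLD
        backchannel = heard is not None and heard.lower() in BACKCHANNEL_SOUNDS
        if large_shift or backchannel:
            self.shown = expression  # the new image becomes the reference for later changes
            return True
        return False


updater = FaceUpdater()
print(updater.step(45.0, lambda: 0.9))         # False: looking away
print(updater.step(5.0, lambda: 0.1))          # False: change too small
print(updater.step(5.0, lambda: 0.6))          # True: large shift in expression
print(updater.step(5.0, lambda: 0.65, "mmm"))  # True: non-verbal encouragement heard
```

Measuring deviation against the image currently shown, rather than against the previous frame, means a slow drift in expression still eventually triggers an update.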
The system described would need considerable software and real-time computation to extract facial expressions, match them against a picture library, track eye direction and possibly (as suggested above) even perform voice recognition. It is definitely a 'few-years-down-the-line' proposal. One benefit is low connection bandwidth, as only text and occasional still images are passed to other participants over a network (or perhaps just references to images in a shared library). Slight latency in image updates is also much less of an issue than it is in full video systems. It is also worth noting that, with video tracking, face analysis is only required once attention to the text has been detected and an update is needed. However, the system depends on additional hardware: a reasonably high-resolution video camera, and possibly a separate screen (which reduces the need for eye tracking, as described above). This would hinder its spread, and therefore its popularity and use on the web.

Final Remarks

To re-emphasise what I see as some of the advantages of the interface:
And some drawbacks: