Kinected Conference Scenario Sketch

By Lining Yao, Anthony DeVincenzi, Ramesh Raskar, and Hiroshi Ishii from the MIT Media Lab

What can we do if the screen in a videoconference room can turn into an interactive display? Using a Kinect camera and sound sensors, we explore how expanding a system's understanding of spatially calibrated depth and audio alongside a live video stream can generate semantically rich three-dimensional pixels containing information about their material properties and location. Four features are implemented: "Talking to Focus", "Freezing Former Frames", "Privacy Zone", and "Spatial Augmented Reality".

"Talking to Focus":
The system recognizes those currently speaking and places them within a blur-free range of the depth field. Contextual bubbles also display speaker information such as name, shared documents, and speaking time, all of which is vital information for a conference call.
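The depth-of-field effect can be illustrated with a small sketch (my own, not the published implementation): given the Kinect depth map and the active speaker's depth, each pixel gets a blur weight that is zero inside a blur-free band around the speaker and ramps up with distance from it. The band and ramp widths are assumed parameters.

```cpp
#include <vector>
#include <cmath>
#include <cstdint>
#include <algorithm>

// Per-pixel blur weight for "Talking to Focus" (illustrative sketch).
// Pixels whose depth lies within `focusBandMm` millimetres of the active
// speaker's depth stay sharp (weight 0); blur ramps up linearly over
// `rampMm` millimetres beyond the band, saturating at 1.
std::vector<float> blurWeights(const std::vector<uint16_t>& depthMm,
                               uint16_t speakerDepthMm,
                               uint16_t focusBandMm,
                               uint16_t rampMm) {
    std::vector<float> w(depthMm.size());
    for (size_t i = 0; i < depthMm.size(); ++i) {
        float d = std::abs(static_cast<float>(depthMm[i]) - speakerDepthMm);
        if (d <= focusBandMm)
            w[i] = 0.0f;  // inside the blur-free range around the speaker
        else
            w[i] = std::min(1.0f, (d - focusBandMm) / rampMm);  // gradual blur
    }
    return w;
}
```

In a real renderer these weights would drive a variable-radius Gaussian blur over the RGB frame.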

"Freezing Former Frames":
People might want to perform some tasks without being noticed by the other side. With a freezing gesture, users can freeze themselves into a still image for a short time while the rest of the screen continues as normal. This technique is useful for small tasks such as checking email, having a short side conversation, or temporarily leaving the room when doing so might appear rude as viewed from the remote location.
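The compositing step can be sketched as follows (a minimal illustration under my own assumptions, not the authors' code): while the freeze is active, pixels inside the frozen region come from a stored still image, and everything else passes through from the live frame.

```cpp
#include <vector>
#include <cstdint>

// "Freezing Former Frames" compositor sketch. `live` and `frozen` are
// single-channel frames of size width*height; `zone` is the frozen region.
struct Rect { int x, y, w, h; };

std::vector<uint8_t> compositeFreeze(const std::vector<uint8_t>& live,
                                     const std::vector<uint8_t>& frozen,
                                     int width, int height,
                                     const Rect& zone, bool freezeActive) {
    std::vector<uint8_t> out = live;  // start from the live frame
    if (!freezeActive) return out;
    for (int y = zone.y; y < zone.y + zone.h && y < height; ++y)
        for (int x = zone.x; x < zone.x + zone.w && x < width; ++x)
            out[y * width + x] = frozen[y * width + x];  // hold the still image
    return out;
}
```

In practice the frozen region would be the person's depth-segmented silhouette rather than a rectangle, but the per-pixel selection is the same.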

"Privacy Zone":
This application attempts to allow the user to render themselves, or a specified area, invisible with a gestural command. Because hiding is applied to pixels at a certain depth, the effect does not interrupt objects moving in the foreground. For example, one can hide a messy background from the other side, or hide another group of people who share the room because of space limitations.

"Spatial Augmented Reality":
The 3D position of any object can be determined by our system, and objects can be assigned an augmented status. In the following example, position and distance are read out in real time; people can also click an object on the screen (which is physically in another space) and see the augmented information remotely.
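Recovering an object's 3D position from a clicked pixel can be sketched with a standard pinhole back-projection (my illustration; the intrinsics fx, fy, cx, cy are assumed Kinect-like calibration values, not figures from the paper):

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

// Back-project an image pixel (u, v) with a known depth (metres) to a 3D
// point in camera coordinates, using a pinhole model with focal lengths
// fx, fy and principal point (cx, cy).
Vec3 backProject(int u, int v, float depthM,
                 float fx, float fy, float cx, float cy) {
    return { (u - cx) * depthM / fx,
             (v - cy) * depthM / fy,
             depthM };
}

// Straight-line distance from the camera, e.g. for the real-time readout.
float distanceTo(const Vec3& p) {
    return std::sqrt(p.x * p.x + p.y * p.y + p.z * p.z);
}
```

A click at pixel (u, v) looks up the depth map at that pixel, back-projects it, and attaches the augmented label at the resulting 3D point.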



"Technical Details":
Below is our setup. The system fundamentals consist of two networked locations, each containing a video screen for viewing the opposite space, a standard RGB digital web camera enhanced by a depth-sensing "3D camera" such as the Microsoft Kinect, and calibrated microphones for audio cues and localization. Computational processing is applied to each of the video streams, incorporating a number of custom algorithms for the perceptual manipulation of space. C++ and the openFrameworks library are used for video processing and effect rendering.
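One simple way the calibrated microphones can feed the video effects is by comparing per-microphone energy to pick the currently loudest channel; this is a simplified stand-in for the system's audio localization, sketched here under my own assumptions:

```cpp
#include <vector>
#include <cstddef>

// Pick the active audio channel by comparing calibrated microphone RMS
// levels (a simplified proxy for localizing the current speaker, whose
// position then drives effects such as "Talking to Focus").
size_t loudestMic(const std::vector<float>& rmsLevels) {
    size_t best = 0;
    for (size_t i = 1; i < rmsLevels.size(); ++i)
        if (rmsLevels[i] > rmsLevels[best]) best = i;
    return best;
}
```

A real system would also use inter-microphone time differences for finer localization; the argmax over calibrated levels is only the coarsest version of the idea.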

"Cartoon Storyboard":
The proliferation of broadband and high-speed Internet access has, in general, democratized the ability to engage in videoconferencing. However, current video systems do not meet their full potential, as they are restricted to a simple display of unintelligent 2D pixels. We present a system for enhancing distance-based communication by augmenting the traditional videoconferencing system with additional attributes beyond two-dimensional video.

An interactive paper appears in CSCW 2010 under the title "Kinected Conference: Augmenting Video Imaging with Calibrated Depth and Audio".
For more information, please contact: