At the core, I am a Human-Computer Interaction (HCI) research scientist, inventor, and visionary. I dream up, create, research, design, develop, and patent systems with fundamentally novel user interfaces in the domains of mobile communication and mobile computing, ubiquitous computing (ubicomp), artificial intelligence (AI), robotics (HRI), and augmented reality (AR). My strengths are "engineering creativity" and connecting the dots between research and emerging technologies to create radically new products and services. I have more than 15 years of experience in corporate and academic environments, including MIT, Samsung, Hewlett-Packard, and Harman International. I currently hold the position of Vice President of Future Experience at Harman. My most recent Curriculum Vitae (CV) can be found here.

Motivation: why, how, and what?

From a global historical perspective, mankind has only just begun creating technologies with the explicit intention to directly enhance our bodies and minds. Although we have built tools to enhance our motor skills and perception for many millennia already (think "hammer" and "eye glasses"), I am referring to the augmentation of higher-level mental skills. The augmentation technologies that we do have already, however, have impacted us deeply: for example, mobile phones extend our conversational range and allow us to talk to remote people almost anywhere, from anywhere. Mind you, although this seems like just an augmentation of our senses, it is actually an augmentation of our social interaction potential. This becomes clear when we look at today's mobile devices, which are generally intended to enhance our knowledge, telling us where we are, what other people think and see, and many other useful things. It is foreseeable, however, that even in the near future Human Augmentation technologies will go far beyond that, enhancing us in much more extreme ways: perceptually, cognitively, emotionally, and on many other levels. I am intent on pushing the envelope in this direction, and creating technologies to serve this purpose.

And I have been doing so for almost 25 years: in order to understand the problem well, I have studied the human psyche in depth for eight years, and then studied the engineering side at MIT for another eight years. I believe that in order to create technologies that immediately and explicitly enhance people, and allow us to interact with technology more intuitively, we need to combine deep engineering and psychology knowledge, and all shades in between. So, I have worked in the fields of mobile communication, augmented reality, virtual worlds, artificial intelligence and robotics, and created a series of working prototypes that show how I think we will interact with future technologies.

Positions held

From 1997 to 2005, I was at the MIT Media Lab as research assistant and lead for ten research projects in the domain of speech interfaces and mobile communication, conversational and communication agents, embodied agents, and wireless sensor networks. I supervised undergraduate researchers (software, robotics, circuit design). I was in close contact with the lab's 150+ industrial sponsors, and gave over 100 presentations, talks, and demonstrations.

From 2005 to 2010, I was project leader and principal researcher at the Samsung research lab in San Jose, CA. I was in charge of initiating and managing HCI research projects, and headed a small team of PhD researchers. My job was to envision, design, and develop fundamentally novel user interface concepts, based on original research in the domains of ubiquitous computing, mobile communication, artificial intelligence, robotics, virtual worlds, and augmented reality. One of my larger projects explored new ways to interact naturally and intuitively with 3D content (AR content, virtual worlds, games, etc.) on mobile and nomadic devices (from cellphones to tablets to laptops to unfolding display devices to portable projection devices and more). We built working prototypes that demonstrate key interaction methods and intelligent user interfaces for natural interaction. My patent applications serve as rough outlines for these projects.

From 2011 to 2012, I was with HP, as the director for future concepts and prototyping at Palm: I led, managed, and inspired teams of end-to-end prototyping and research engineers (hardware & software), UI prototyping engineers, and UI production developers (the latter only interim until August 2011) in the webOS Human Interface group. I created working systems of future interaction methods, filed for many patents, and contributed to strategic roadmaps across all of Palm/HP. My projects were in the fields of wand and pen input, mobile 3D interfaces, remote multi-touch interfaces, and more. The patent applications I filed with Palm, once they become public, will outline some of the projects we did there.

Since June 2012, I have been with Harman International as Vice President of Future Experience. My responsibilities include assembling and leading a team of advanced research and prototyping engineers in order to create working prototypes and IP of technologies that will enable radically new UX for future Harman products. I am also in the process of enhancing and future-proofing the current UX of all Harman International products, a huge task since Harman International does business in many different domains, from infotainment to lifestyle to professional audio. My office is in beautiful downtown Palo Alto, in a building that has housed many famous companies before us (e.g., PayPal). Going beyond the mobile focus I had at Samsung and HP, I now think deeply about future user experience in cars, and UX synergies between home, mobile, and automotive platforms. I am applying my expertise in conversational systems and spatial interfaces, but can also include the interactive spatial audio domain, since Harman is deeply involved in audio systems of every kind.

My functions

As an engineer, I am an inventor, builder, and implementer who uses software and hardware rapid prototyping tools to create systems from the lowest level (firmware, sensors, actuators) up to the highest-level applications with GUIs and networking capabilities. Systems I have built at MIT include palm-sized wireless animatronics with full-duplex audio, conversational agents interacting with multiple parties simultaneously, autonomously hovering micro helicopters, laser-based micro projectors, and a drawing tool with a built-in camera and an array of sensors to pick up and draw with the visual properties of everyday objects.

As an HCI researcher, I can isolate relevant scientific problems, tackle them systematically by using my extensive training as a psychologist, and then come up with novel theories and visions to solve them. Then I integrate theories and technologies into working prototypes and innovative systems, and verify their validity with rigorous user testing, be it with ethnographic methodologies or in experimental lab settings.

As a leader, I am able to assemble a team of world-class experts, and lead and advise them on all research, engineering, and HCI levels. I am able to inspire and enable the team members, bringing out the best in them, while at the same time keeping strategic requirements in mind. Over time, my teams have had up to 12 people, but I can create high-impact systems and prototypes with smaller teams as well.

Academics

I have earned three degrees: both a Master's and a PhD in Media Arts & Sciences from MIT, and an additional Master's in Psychology, Philosophy, and Computer Science (University of Bern, Switzerland).

I received my PhD in Media Arts & Sciences from MIT in June 2005. During my studies, I worked as an HCI research assistant at the MIT Media Lab. I was part of the Speech + Mobility Group (then called the "Speech Interface Group"), headed by Chris Schmandt, where we worked on speech interfaces and mobile communication devices. Although I was in a user interface group, my personal approach included software and robotic agents to enhance those interfaces. My PhD work (as well as my Master's thesis) illustrates that, and my qualifying exam domains reflect these ideas as well.

Portfolio

My representative industry projects focus on creating devices and systems which allow us to interact with mobile devices more naturally and intuitively. In a wider sense, I am working on Human Augmentation. I am both engineer and research director in this field.

My representative MIT projects—mainly my doctoral work—focus on adding human-style social intelligence to mobile communication agents that are embodied both in the robotic and software domain.

My past projects are diverse, and include projects such as autonomously levitating devices. Some projects are not in the engineering domain at all, but in the psychology realm (my other background).
Leadership
Harman International: Vice President, Future Experience (part of the Corporate Technology Group under CTO IP Park)
I have a dual leadership role: first, I lead the Future Experience (FX) team, which I founded in June 2012. The FX team does top-down, vision-driven research and engineering on UX. I lead and inspire the team members, initiate the group's projects, hire new members, patent, and interface with other R&D groups and the Technology Strategy team at Harman. The team's charter is to come up with novel UX that spans all areas of Harman and beyond, exploring synergies and new areas. My second leadership role is advancing the user experience across all of Harman International. In this role, I influence roadmaps and R&D across all of Harman, from the automotive to the consumer to the professional divisions. I am in contact with all HCI-related teams at Harman, and work on future-proofing the UX.

Year: 2012 - present
Status: ongoing
Domain: engineering, management
Type: research group, executive function
Position: vice president
HP Palm GBU (Palm was bought by HP in July 2010, renamed webOS, and made open source in 2012): Director, Future Concepts and Prototyping Team (part of the HI team at the webOS/Palm business unit of HP)
I founded the team in January 2011. My focus was on leading and inspiring the team members, directing the group's research agenda, hiring new members, interfacing with external groups, and patenting. The team's charter was to do risky and holistic HCI research and end-to-end prototyping (spanning software and hardware) that pushes the edges of HCI. Our projects targeted future product releases, approximately 3-5 years out from the current releases. The team consisted of Ph.D.-level researchers with diverse backgrounds, from 3D virtual environments to robotics to architecture to speech interfaces. (I have led other teams at HP, up to 12 people, doing UI development and software UI prototyping.) We interacted with engineers (software and hardware), designers, researchers (e.g., HP Labs), and planners (roadmapping and strategic planning). Our output consisted of working prototypes and patents of interaction methods that serve as groundwork for future releases. Our patenting efforts were significant, about one invention disclosure per week.

Year: 2011 - 2012
Status: concluded
Domain: engineering, management
Type: research group
Position: director
Team members: Seung Wook Kim, Davide Di Censo
Samsung Electronics: Project Lead and Team Lead, HCI Research Team (part of the Computer Science Lab at Samsung R&D)
I founded the team in 2008, and led it until my departure from Samsung at the end of 2010. The team size was between 3 and 5 members, with the staff researchers holding doctoral degrees in various fields, and interns from first-tier universities. My task was to initiate and execute strategic HCI projects, both in collaboration with other Samsung groups and with external groups. My duties included leading and inspiring the team members, setting the team's direction, creating strategic and feasible project plans, keeping the projects on track, hiring, and patenting. Our main accomplishments were working prototypes (see some projects below), evangelizing these prototypes to Samsung executives (up to the chairman, CEO, and CTO), patenting core technologies, and technical reports of our research.

Year: 2008 - 2010
Status: concluded
Domain: engineering, management
Type: research group
Position: project and group leader
Collaborators: Seung Wook Kim (2008-2010), Francisco Imai (2008-2009), Anton Treskunov (2009-2010), Han Joo Chae (intern 2009), Nachiket Gokhale (intern 2010)
Representative industry projects
Concept: iVi concept (2008)
Early demo: iVi prototype (2009)

iVi: Immersive Volumetric Interaction (Samsung R&D)
This is the master project that I created for the HCI research team. The core idea was to invent and create working prototypes of future consumer electronics devices, using new ways of interaction, such as spatial input (i.e., gestural and virtual touch interfaces) and spatial output (i.e., view dependent rendering, position dependent rendering, 3D displays). Platform focus was on nomadic and mobile devices (from cellphones to tablets to laptops), novel platforms (wearables, AR appliances), and some large display systems (i.e., 3D TV). We created dozens of prototypes (some described below) covered by a large number of patent applications. A very successful demo of spatial interaction (nVIS) was selected to be shown at the prestigious Samsung Tech Fair 2009. One of our systems with behind-display interaction (DRIVE) got selected for the Samsung Tech Fair 2010. Yet another one used hybrid inertial and vision sensors on mobile devices for position dependent rendering and navigating in virtual 3D environments (miVON).

Year: 2008 - 2010
Status: concluded
Domain: engineering, management
Type: portfolio management
Position: project and group leader
Collaborators: Seung Wook Kim (2008-2010), Francisco Imai (2008-2009), Anton Treskunov (2009-2010), Han Joo Chae (intern 2009), Nachiket Gokhale (intern 2010)
Video of interaction: This video shows a research prototype system that demonstrates a behind-display interaction method. Anaglyph-based 3D rendering (colored glasses) and face tracking for view-dependent rendering create the illusion of dice sitting on top of a physical box. A depth camera is pointed at the user's hands behind the display, creating a 3D model of the hand. Hand and virtual objects interact using a physics engine. The system allows users to interact seamlessly with both real and virtual objects. This video, together with a short paper, was presented at the MVA 2011 conference (12th IAPR Conference on Machine Vision Applications, Nara, Japan, June 13-15, 2011). Please note that the video itself is not 3D: to the glasses-wearing user, the virtual content appears behind the display, where (from his perspective) it sits on a physical box.
Concept: Concept illustration of reach-behind, in an AR game context.
Prototype: DRIVE prototype based on an optical see-through display panel.
Patent drawings: Some figures from the DRIVE patent application.
DRIVE: Direct Reach Into Virtual Environment (Samsung R&D)
This novel interaction method allows users to reach behind a display and manipulate virtual content (e.g., AR objects) with their bare hands in the volume behind the device. We designed and constructed multiple prototypes: some are video see-through (tablet with an inline camera and sensors in the back), some are optical see-through (transparent LCD panel, depth sensor behind the device, front-facing imager). The latter system featured (1) anaglyphic stereoscopic rendering (to make objects appear truly behind the device), (2) face tracking for view-dependent rendering (so that virtual content "sticks" to the real world), (3) hand tracking (for bare-hand manipulation), and (4) virtual physics effects (allowing completely intuitive interaction with 3D content). It was realized using OpenCV for vision processing (face tracking), Ogre3D for graphics rendering, a Samsung 22-inch transparent display panel (early prototype), and a PMD CamBoard depth camera for finger tracking (closer range than what a Kinect allows). This prototype was demonstrated internally to a large Samsung audience (Samsung Tech Fair 2010), and we filed patent applications. (Note that anaglyphic rendering was the only stereoscopic rendering method available with the transparent display. Future systems will likely be based on active shutter glasses, or parallax barrier or lenticular overlays. Also note that a smaller system does not have to be mounted to a frame, like our prototype, but can be handheld.)
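To give a flavor of the face-tracking part of such a pipeline, here is a minimal Python/OpenCV sketch (illustrative only, not the DRIVE code; the camera constants and the pinhole depth estimate are simplifying assumptions) that turns a detected face into an approximate head position, which a renderer could then use for view-dependent rendering:

```python
# Minimal sketch: estimate an approximate head position from a webcam face
# detection, for driving view-dependent rendering. The focal length and the
# assumed face width are illustrative constants, not values from DRIVE.
import cv2

FOCAL_PX = 600.0        # assumed focal length of the front-facing camera (pixels)
FACE_WIDTH_M = 0.16     # assumed average physical face width (meters)

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def head_position(frame):
    """Return (x, y, z) of the viewer's head in camera coordinates (meters),
    or None if no face is found."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])   # take the largest face
    cx, cy = x + w / 2.0, y + h / 2.0                    # face center in pixels
    z = FOCAL_PX * FACE_WIDTH_M / w                      # crude pinhole depth estimate
    px = (cx - frame.shape[1] / 2.0) * z / FOCAL_PX      # back-project to meters
    py = (cy - frame.shape[0] / 2.0) * z / FOCAL_PX
    return px, py, z

cap = cv2.VideoCapture(0)
ok, frame = cap.read()
if ok:
    print(head_position(frame))   # this estimate would feed the virtual camera
cap.release()
```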

Year: 2010
Status: concluded
Domain: engineering
Type: demo, paper (MVA 2011), paper (3DUI 2011), video, patent application
Position: project and group leader
Collaborators: Seung Wook Kim, Anton Treskunov
Video of demo: This video shows a research prototype that demonstrates new media browsing methods with direct bare-hand manipulation in a 3D space on a large stereoscopic display (e.g., 3D TV) with 3D spatial sound. This demo was created at the Samsung Electronics U.S. R&D Center in San Jose, California, in August 2010. (Note that this video uses monoscopic rendering. The actual demo renders 3D content stereoscopically, and the user wears active shutter glasses to experience the depth rendering. Similarly, the sound of this video is normal stereo sound, whereas the original demo uses 7.1 spatial sound.)
Prototype: Interactive 3DTV UI (2010)
System setup: System setup, with perceived media wall (first scene).
Spatial Gestures for 3DTV UI (Samsung R&D)
This project demonstrates new media browsing methods with direct bare hand manipulation in a 3D space on a large stereoscopic display (e.g., 3D TV) with 3D spatial sound. We developed a prototype on an ARM-based embedded Linux platform with OpenGL ES (visual rendering), OpenAL (spatial audio rendering), and ARToolKit (for hand tracking). The main contribution was to create multiple gesture interaction methods in a 3D spatial setting, and implement these interaction methods in a working prototype that includes remote spatial gestures, stereoscopic image rendering, and spatial sound rendering.
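As a small illustration of the selection logic behind such bare-hand browsing (a sketch only; the grid layout, units, and grab threshold are assumptions rather than the prototype's actual parameters), the following Python snippet maps a tracked 3D hand position onto a wall of media tiles floating in front of the display:

```python
# Sketch: select the media tile closest to a tracked hand position in 3D.
# Grid layout, units (meters), and the grab threshold are illustrative.
import numpy as np

def tile_positions(cols=5, rows=3, spacing=0.35, depth=0.8):
    """Return an (N, 3) array of tile centers, arranged as a media wall
    'depth' meters in front of the display plane."""
    xs = (np.arange(cols) - (cols - 1) / 2.0) * spacing
    ys = (np.arange(rows) - (rows - 1) / 2.0) * spacing
    return np.array([(x, y, depth) for y in ys for x in xs])

def pick_tile(hand_xyz, tiles, grab_radius=0.12):
    """Return the index of the tile the hand is 'touching', or None."""
    d = np.linalg.norm(tiles - np.asarray(hand_xyz), axis=1)
    i = int(np.argmin(d))
    return i if d[i] < grab_radius else None

tiles = tile_positions()
print(pick_tile((0.36, 0.0, 0.82), tiles))   # hand near the tile right of center
```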

Year: 2010
Status: concluded
Domain: engineering
Type: demo, video, report
Position: project and group leader
Collaborators: Seung Wook Kim, Anton Treskunov
Complete demo video: This video shows a series of nVIS research prototypes and interaction methods, all designed and implemented by the CSL HCI team at the Samsung Electronics U.S. R&D Center in San Jose, California. This video was presented at the exhibition at the ACM SIGCHI Conference on Human Factors in Computing Systems 2010 (CHI 2010). The HCI team members were Seung Wook Kim, Anton Treskunov, and Stefan Marti. This project was affiliated with the Samsung Advanced Institute of Technology (SAIT).
Curved displays: Close-up of some of the tiles: view-dependent rendering on a non-planar display space. Rendering with an asymmetric view frustum ensures that the user experiences the virtual environment seamlessly, regardless of the position and orientation of the display tiles.
Mobile & static: The system consists of static and even handheld (mobile) tiles, which render the 3D content spatially correctly from the user's perspective (position-dependent rendering). The handheld display is tracked in 6DOF to accomplish that, so the user can choose any view and perspective.
nVIS: Natural Virtual Immersive Space (Samsung R&D)
This project demonstrates novel ways to interact with volumetric 3D content on desktop and nomadic devices (e.g., laptop), using methods like curved display spaces, view-dependent rendering on non-planar displays, virtual spatial touch interaction methods, and more. We created a series of prototypes consisting of multiple display tiles simulating a curved display space (up to six tiles), rendering with an asymmetric view frustum (in OpenGL), vision-based 6DOF face tracking (OpenCV-based and faceAPI), and bare-hand manipulation of 3D content with IR-marker-based finger tracking. The system also shows a convergence feature, by dynamically combining a mobile device (e.g., cellphone or tablet) with the rest of the display space, while maintaining spatial visual continuity among all displays. One of the systems includes upper-torso movement detection for locomotion in virtual space. In addition to desktop-based systems, we created a prototype for public information display spaces, based on an array of 70-inch displays. All final systems were demonstrated internally at the Samsung Tech Fair 2009.
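The asymmetric view frustum is what keeps the virtual scene visually continuous across arbitrarily placed tiles: each tile gets an off-axis projection computed from the tracked eye position and that tile's physical corners. Below is a minimal numpy sketch of this standard computation (in the spirit of the well-known generalized perspective projection); it is illustrative, not the team's original OpenGL code, and the example coordinates are invented:

```python
# Sketch: off-axis ("asymmetric") frustum extents for one display tile, given
# the tracked eye position and the tile's corner points in world coordinates.
# The resulting (l, r, b, t) would be fed to glFrustum for that tile.
import numpy as np

def off_axis_frustum(eye, pa, pb, pc, near=0.1):
    """eye: tracked eye position; pa, pb, pc: the tile's lower-left,
    lower-right, and upper-left corners. Returns (l, r, b, t) at 'near'."""
    pa, pb, pc, eye = map(np.asarray, (pa, pb, pc, eye))
    vr = (pb - pa) / np.linalg.norm(pb - pa)      # screen-space right axis
    vu = (pc - pa) / np.linalg.norm(pc - pa)      # screen-space up axis
    vn = np.cross(vr, vu)                         # screen normal, towards the eye
    vn /= np.linalg.norm(vn)
    va, vb, vc = pa - eye, pb - eye, pc - eye     # vectors from eye to corners
    d = -np.dot(va, vn)                           # eye-to-screen distance
    l = np.dot(vr, va) * near / d
    r = np.dot(vr, vb) * near / d
    b = np.dot(vu, va) * near / d
    t = np.dot(vu, vc) * near / d
    return l, r, b, t

# Example: a 0.4 m x 0.3 m tile centered at the origin; an angled tile is
# handled identically, only its corner coordinates change.
eye = (0.1, 0.05, 0.6)                            # tracked head position (meters)
print(off_axis_frustum(eye, (-0.2, -0.15, 0), (0.2, -0.15, 0), (-0.2, 0.15, 0)))
```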

Year: 2008-2010
Status: concluded
Domain: engineering
Type: demos, videos, technical reports [CHI 2010 submission], patents
Position: project and group leader
Collaborators: Seung Wook Kim, Anton Treskunov
Concept (2008): This video shows a collection of novel concepts for mobile interaction with virtual environments such as games, augmented reality, and virtual worlds (e.g., Second Life). These interaction methods can be used for cellphones, tablets, and other handheld devices. The original concepts were created by the HCI team at the Samsung Electronics U.S. R&D Center in San Jose, California. The video was created in August 2008, and presented at the Virtual Worlds Expo of September 2008 in Los Angeles. This video was intended as an outline for the HCI team's research projects that resulted in working prototypes and pending patent applications. Some of the interaction concepts are now well known, but were shown first in this concept video.
Multiple devices: This concept illustration shows multiple devices rendering 3D content spatially correctly, depending on their respective positions in 6DOF space.
Egomotion sensing: The first part shows position-dependent rendering on a netbook with two sideways-facing imagers, each of them providing 2D optical flow data of the background. With this configuration, it is easy for the device to do egomotion detection, and disambiguate rotational movements from linear movements. The second part shows position-dependent rendering on a UMPC using only the back-facing imager. It provides 2D optical flow data of the background, which is used to determine pan and tilt of the device. Note that we used neither inertial nor magnetic sensors in either demo.
Phone demo: This video shows a software prototype for natural interaction with 3D content on a Samsung Omnia cellphone. It uses both the internal inertial sensors (accelerometers) and the camera for egomotion sensing. The camera is used for vision processing to determine the optical flow of the background in 2D. This allows the device to detect slow linear motion, which is not possible with inertial sensors. This prototype was created by the CSL HCI team at the Samsung Electronics U.S. R&D Center in San Jose, California, in summer 2009.
miVON: Mobile Immersive Virtual Outreach Navigator (Samsung R&D)
This project is about a novel method for interacting with 3D content on mobile platforms (e.g., cellphone, tablet, etc.), showing position-dependent rendering (PDR) of a 3D scene such as a game or virtual world. The system disambiguates shifting and rotating motions based on vision-based pose estimation. We developed various prototypes: a UMPC version using only optical flow for pose estimation and the Torque3D game engine, a netbook-based prototype that used up to four cameras to disambiguate imager-based 6DOF pose estimation, and a cellphone-based prototype that combined inertial and vision-based sensing for 6DOF egomotion detection (see videos on the left). Multiple patents were filed, and our code base was transferred to the relevant Samsung business units.
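As a rough illustration of the vision side (a sketch using OpenCV's dense optical flow, not the prototype's code; the threshold is invented), the mean background flow between consecutive camera frames can reveal slow linear motion that inertial sensors alone cannot detect:

```python
# Sketch: detect slow device motion from the mean dense optical flow of the
# camera image, complementing inertial sensors. The threshold is invented.
import cv2

def mean_background_flow(prev_gray, curr_gray):
    """Return the average (dx, dy) image flow in pixels between two frames."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    return float(flow[..., 0].mean()), float(flow[..., 1].mean())

cap = cv2.VideoCapture(0)
ok, prev = cap.read()
if ok:
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    for _ in range(100):                          # sample a few frames
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        dx, dy = mean_background_flow(prev_gray, gray)
        if abs(dx) > 0.3 or abs(dy) > 0.3:        # invented threshold (px/frame)
            print("slow pan/shift detected:", round(dx, 2), round(dy, 2))
        prev_gray = gray
cap.release()
```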

Year: 2008-2010
Status: concluded
Domain: engineering
Type: demo, video, paper [SPBA 2009 "Position dependent rendering of 3D content on mobile phones using gravity and imaging sensors"], patents
Position: project and group leader
Collaborators: Seung Wook Kim, Han Joo Chae, Nachiket Gokhale
System concept: PUPS system concept illustration, showing various use cases and necessary technology pieces.
Dome mockup: Dome umbrella with external projection and touch mockup.
Nubrella mockup: Nubrella with mockup map projection.
Patent drawings: Drawings for the patent application, showing multiple setup options.
PUPS: Portable Unfolding Projection Space (Samsung R&D)
This project is about mobile projection display systems, to be used as a platform for AR systems, games, and virtual worlds. The technology is based on "non-rigid semi-transparent collapsible portable projection surfaces," either wearable or backpack-mounted. It includes multiple wearable display segments and cameras (some inline with the projectors), and semi-transparent, tunable-opacity mobile projection surfaces which unfold origami-style, adjusting to the eye positions. There is a multitude of applications for this platform, from multiparty video conferencing (conference partners are projected around the user, and a surround sound system disambiguates the voices), to augmented reality applications (the display space is relatively stable with regard to the user, and AR content registers much better with the environment than on handheld or head-worn AR platforms), to cloaking and rear-view applications. The portable display space is ideal for touch interactions (the surface is at arm's length), and can track the user's hands (gestures) as well as face (for view-dependent rendering). This early-stage project focused on patenting and scoping of the engineering tasks, but did not go beyond that stage.

Year: 2008-2009
Status: concluded
Domain: engineering
Type: project plan, reports, patent application
Position: project and group leader
Collaborators: Seung Wook Kim, Francisco Imai
Pet robotics: Pet robotics is an engineering area, but also highly relevant for HCI and HRI. Our approach is to emphasize the lifelikeness of a pet robot by employing (among others) emotional expressivity (expressing emotion with non-verbal cues and non-speech audio), soft-body robotics (silent and sloppy actuator technologies; super-soft sensor skin), and biomimetic learning (cognitive architecture and learning methods inspired by real pets).
Animatronic mediator: Pet robotics makes most sense when the emotional attachment is combined with a utilitarian perspective. The purpose of an animatronic mediator is to add a familiar and physical front end to a device, eliminating the need to learn a device-specific GUI by using human-style interaction methods, both verbal and non-verbal. To achieve that, a handheld animatronic mediator (a UI peripheral to the cellphone) is needed which interprets the user's intentions and uses natural interaction methods (speech, eye contact, voice tone detection).
Pet Robotics and Animatronic Mediators (Samsung R&D)
Pets are shown to have highly positive (therapeutic) effects on humans, but are not viable for all people (allergies, the continuous care necessary, living space restrictions, etc.). From a consumer electronics perspective, there is an opportunity to create robotic pets with high realism and consumer friendliness to fill this gap, and to create high emotional attachment in the user. Our approach emphasizes increasing the lifelikeness of a pet robot by employing (among others) emotional expressivity (expressing emotion with non-verbal cues and non-speech audio), soft-body robotics (silent and sloppy actuator technologies; super-soft sensor skin), and biomimetic learning (cognitive architecture and learning methods inspired by real pets). My in-depth analysis of the field covered the hard technical problems on the way to a realistic artificial pet, the price segment problem (the gap between the toy and luxury segments), how to deal with the uncanny valley, and many other issues. This early-stage project focused on planning and project scoping, but did not enter the engineering phase. However, it is related to my dissertation field of Autonomous Interactive Intermediaries.

Year: 2008
Status: concluded
Domain: engineering
Type: report
Position: project leader
Patent drawing 1: Mobile search method by manually framing the target object with hands and fingers (using vision processing to detect these gestures).
Patent drawing 2: Various options for framing an object of interest.
Patent drawing 3: Mobile search method by finger pointing and snapping (using audio triangulation of the snapping sound) as an intuitive "object selection" method.
Googling Objects with Physical Browsing (Samsung R&D)
This project is about advanced methods of free-hand mobile searching. I developed two novel interaction methods for a localized, in-situ search. The underlying idea is that instead of searching for websites, people who are not sitting in front of a desktop computer may search for information on physical objects that they encounter in the world: places, buildings, landmarks, monuments, artifacts, products, in fact any kind of physical object. This "search in spatial context, right here and right now," or physical browsing, poses special restrictions on the user interface and search technology. I developed two novel core interaction methods, one based on manually framing the target object with hands and fingers (using vision processing to detect these gestures), the other based on finger pointing and snapping (using audio triangulation of the snapping sound) as an intuitive "object selection" method. The project yielded two granted patents.
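To sketch the acoustic side of the point-and-snap idea (illustrative only; the original system's microphone geometry and processing are not reproduced here), two microphones a known distance apart can estimate the bearing of the snap from its time difference of arrival:

```python
# Sketch: estimate the bearing of a finger snap from the time difference of
# arrival (TDOA) at two microphones. The signals below are synthetic; a real
# system would use recorded audio and more robust onset detection.
import numpy as np

SPEED_OF_SOUND = 343.0   # m/s
MIC_SPACING = 0.15       # assumed distance between the two microphones (m)
SAMPLE_RATE = 48000      # Hz

def snap_bearing(left, right):
    """Far-field bearing in degrees: 0 = straight ahead, negative = toward the
    left microphone, positive = toward the right microphone."""
    corr = np.correlate(left, right, mode="full")
    lag = int(np.argmax(corr)) - (len(right) - 1)   # samples; <0 if left hears it first
    dt = lag / SAMPLE_RATE
    s = np.clip(SPEED_OF_SOUND * dt / MIC_SPACING, -1.0, 1.0)
    return float(np.degrees(np.arcsin(s)))

# Synthetic test: a click arriving 10 samples earlier at the left microphone.
click = np.zeros(1024)
click[100] = 1.0
left = np.roll(click, -10)
right = click
print(round(snap_bearing(left, right), 1))   # roughly -28, i.e., toward the left
```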

Year: 2006-2007
Status: concluded
Domain: engineering
Type: report, patent 1 (point and snap), patent 2 (finger framing)
Position: project leader
Representative projects done at MIT

Autonomous Interactive Intermediaries
An Autonomous Interactive Intermediary is a software and robotic agent that helps the user manage her mobile communication devices by, for example, harvesting 'residual social intelligence' from nearby human and non-human sources. This project explores ways to make mobile communication devices socially intelligent, both in their internal reasoning and in how they interact with people, trying to avoid, e.g., that our advanced communication devices interrupt us at completely inappropriate times. My Intermediary prototype is embodied in two domains: as a dual conversational agent, it is able to converse with caller and callee at the same time, mediating between them, and possibly suggesting modality crossovers. As an animatronic device, it uses socially strong non-verbal cues like gaze, posture, and gestures to alert and interact with the user and co-located people in a subtle but public way. I have built working prototypes of Intermediaries, embodied as a parrot, a small bunny, and a squirrel, which is why some have called my project simply "Bunny Phone" or "Cellular Squirrel". However, it is more than just an interactive animatronic that listens to you and whispers into your ear: when a call comes in, it detects face-to-face conversations to determine social groupings (see the Conversation Finder project), may invite input ("vetoes") from the local others (see the Finger Ring project), consults memory of previous interactions stored in the location (the Room Memory project), and tries to assess the importance of the incoming communication by conversing with the caller (see the Issue Detection project). This is my main PhD thesis work. More...

Year: 2002 - 2005
Status: dormant (not active as industry project, but I keep working on it)
Domain: engineering
Type: system, prototypes, paper, another paper, short illustrative videos, dissertation, demo video [YouTube], patent 1, patent 2
Press: many articles
Position: lead researcher
Advisor: Chris Schmandt
PhD Thesis committee: Chris Schmandt, Cynthia Breazeal, Henry Lieberman
Collaborators: Matt Hoffman (undergraduate collaborator)

Mockup (top), prototype (middle), alignment example (bottom)
Conversation Finder
Conversation Finder is a system based on a decentralized network of body-worn wireless sensor nodes that independently try to determine with whom the user is in a face-to-face conversation. Conversational groupings are detected by looking at the alignment of speech, the way we take turns when we talk to each other. Each node has a microphone and sends out short radio messages when its user is talking, and in turn listens for messages from close-by nodes. Each node then aggregates this information and continuously updates a list of people it thinks its user is talking to. A node can be queried for this information, and if necessary can activate a user's Finger Ring (see the Finger Ring project). Depending on my conversational status, my phone might or might not interrupt me with an alert. This system is a component of my PhD work on Autonomous Interactive Intermediaries, a large research project in context-aware computer-mediated call control.
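A toy sketch of the underlying intuition (an illustrative heuristic with invented thresholds, not the nodes' actual firmware): two people in the same conversation alternate speech with little overlap, whereas unrelated speakers talk over each other at chance.

```python
# Sketch: decide whether two speakers share a conversation from their binary
# voice-activity streams, using speech overlap and turn-taking balance.
# Thresholds are invented for illustration.
import numpy as np

def same_conversation(a, b, max_overlap=0.15, min_share=0.2):
    """a, b: equal-length binary arrays (1 = speaking) over the same window."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    either = np.maximum(a, b).sum()          # time at least one of them talks
    if either == 0:
        return False                         # nobody spoke; no evidence
    overlap = np.minimum(a, b).sum() / either       # both talking at once
    share_a, share_b = a.sum() / either, b.sum() / either
    # Aligned turn-taking: little simultaneous speech, and both sides hold the floor.
    return overlap < max_overlap and min(share_a, share_b) > min_share

# A and B alternate (same conversation); A and C keep talking over each other.
A = [1, 1, 1, 0, 0, 0, 1, 1, 0, 0]
B = [0, 0, 0, 1, 1, 1, 0, 0, 1, 1]
C = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]
print(same_conversation(A, B), same_conversation(A, C))   # True False
```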

Year: 2002 - 2005
Status: dormant
Domain: engineering
Type: system, prototypes, papers, explanatory video 2003 (1 minute) [RealVideo]
Position: lead researcher
Advisor: Chris Schmandt
Collaborators: Quinn Mahoney (undergraduate collaborator 2002-2003), Jonathan Harris (undergraduate collaborator 2002)

Working prototype (top), wired rings used for user tests (middle, bottom)
Finger Ring, "Social Polling"
Finger Ring is a system in which a cell phone decides whether to ring by accepting votes from the others in a conversation with the called party. When a call comes in, the phone first determines who is in the user's conversation (see Conversation Finder project). It then vibrates all participants' wireless finger rings. Although the alerted people do not know if it is their own cellphones that are about to interrupt, each of them has the possibility to veto the call anonymously by touching his/her finger ring. If no one vetoes, the phone rings. Since no one knows which mobile communication device is about to interrupt, this system of 'social polling' fosters collective responsibility for controlling interruption by communication devices. I have found empirical evidence that significantly more vetoes occur during a collaborative group-focused setting than during a less group oriented setting. This system is a component of my PhD work on Autonomous Interactive Intermediaries, a large research project in context-aware computer-mediated call control.
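A schematic sketch of the polling step (illustrative; the real system uses wireless rings and radio messages, while an in-process queue stands in for them here): vibrate every participant's ring, wait a short veto window, and ring only if nobody vetoes.

```python
# Sketch of the social-polling step: alert all rings, collect anonymous vetoes
# for a short window, and only ring the phone if no veto arrives. The queue
# stands in for the wireless ring network; the window length is invented.
import queue
import threading

VETO_WINDOW_S = 3.0

def poll_conversation(rings, vetoes):
    """rings: callables that vibrate each participant's ring; vetoes: a channel
    on which any participant may anonymously drop a veto token."""
    for vibrate in rings:
        vibrate()
    try:
        vetoes.get(timeout=VETO_WINDOW_S)   # block until a veto or the window ends
        return "divert to voicemail"
    except queue.Empty:
        return "ring the phone"

# Example: three participants, one of whom vetoes after one second.
vetoes = queue.Queue()
rings = [lambda i=i: print(f"ring {i} vibrates") for i in range(3)]
threading.Timer(1.0, lambda: vetoes.put("veto")).start()
print(poll_conversation(rings, vetoes))     # -> divert to voicemail
```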

Year: 2002 - 2005
Status: dormant
Domain: engineering
Type: system, prototypes, paper
Position: lead researcher
Advisor: Chris Schmandt
Issue Detection
Issue Detection is a system that is able to assess in real time the relevance of a call to the user. Being part of a conversational agent that picks up the phone when the user is busy, it engages the caller in a conversation, using speech synthesis and speech recognition to get a rough idea of what the call might be about. It then compares the recognized words with what it knows about what is currently 'on the mind of the user.' The latter is harvested continuously in the background from sources like the user's most recent web searches, recently modified documents, and email threads, together with more long-term information mined from the user's personal web page. The mapping process has several options in addition to literal word mapping: it can do query extensions using WordNet as well as sources of commonsense knowledge. This system is a component of my PhD work on Autonomous Interactive Intermediaries, a large research project in context-aware computer-mediated call control.
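A minimal sketch of the mapping step (illustrative; the context terms, threshold, and the simple WordNet synonym expansion below only stand in for the richer query extension described above): score the overlap between the caller's recognized words and terms harvested from the user's recent activity.

```python
# Sketch: score how relevant a caller's speech-recognized words are to what is
# currently "on the user's mind", with a simple WordNet synonym expansion.
# Requires nltk with the 'wordnet' corpus; context terms are placeholders.
from nltk.corpus import wordnet as wn

def expand(word):
    """Return the word plus its WordNet synonyms (lowercased)."""
    names = {word.lower()}
    for syn in wn.synsets(word):
        names.update(l.lower().replace("_", " ") for l in syn.lemma_names())
    return names

def relevance(caller_words, user_context_terms):
    context = set()
    for term in user_context_terms:
        context |= expand(term)
    hits = [w for w in caller_words if w.lower() in context]
    return len(hits) / max(len(caller_words), 1), hits

caller = "calling about the budget draft for the robotics workshop".split()
context = ["budget", "workshop", "sensor", "deadline"]
score, hits = relevance(caller, context)
print(round(score, 2), hits)    # 0.22 ['budget', 'workshop'] -> probably relevant
```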

Year: 2002 - 2005
Status: dormant
Domain: engineering
Type: system
Position: lead researcher
Advisor: Chris Schmandt

Illustration of channel sequence
Active Messenger
Active Messenger (AM) is a personal software agent that forwards incoming text messages to the user's mobile and stationary communication devices such as cellular phones, text and voice pagers, fax, etc., possibly to several devices in turn, monitoring the reactions of the user and the success of the delivery. If necessary, email messages are transformed to fax messages or read to the user over the phone. AM is aware of which devices are available for each subscriber, which devices were used recently, and whether a message was received and read by the user, by exploiting back-channel information and by inferring from the user's communication behavior over time. The system treats filtering as a process rather than a routing problem. AM has been up and running since 1998, serving between 2 and 5 users, and has been refined over the last 5 years in a tight iterative design process. This project started out as my Master's thesis at the MIT Media Lab (finished 1999), but has continued to the present time. More...
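A schematic sketch of the escalation idea (illustrative only; channel names, waits, and the delivery check are placeholders for the back-channel inference AM actually performs): forward the message to one channel after another, and stop once delivery is confirmed.

```python
# Sketch of channel escalation: forward a message to one device after another
# until delivery is confirmed. Channel names, waits, and the confirmation
# check are placeholders for Active Messenger's back-channel inference.
import time

def deliver(message, channels, send, was_read):
    """channels: list of (name, seconds to wait before escalating).
    send(name, message) forwards the message; was_read(name) stands in for the
    evidence that the user actually saw it on that channel."""
    for name, wait_s in channels:
        send(name, message)
        time.sleep(wait_s)              # AM does this asynchronously, over minutes
        if was_read(name):
            return name                 # delivery confirmed, stop escalating
    return None                         # all channels exhausted

# Example with stub devices and short waits (real waits would be minutes):
channels = [("text pager", 0.1), ("cellphone SMS", 0.1), ("voice call", 0.0)]
log = []
confirmed = deliver("Meeting moved to 3pm", channels,
                    send=lambda ch, msg: log.append(f"{ch}: {msg}"),
                    was_read=lambda ch: ch == "cellphone SMS")
print(log, "->", confirmed)
```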

Year: 1998 - 2005
Status: system stable and still (!!) in continuous use
Domain: engineering
Type: system, thesis, paper (HCI), paper (IBM), tech report
Position: lead researcher
Advisor: Chris Schmandt
I/O Brush
I/O Brush is a new drawing tool to explore colors, textures, and movements found in everyday materials by "picking up" and drawing with them. I/O Brush looks like a regular physical paintbrush but has a small video camera with lights and touch sensors embedded inside. Outside of the drawing canvas, the brush can pick up the color, texture, and movement of a brushed surface. On the canvas, artists can draw with the special "ink" they just picked up from their immediate environment. I designed the electronics on the brush (sensors, etc.), the electronic 'glue' between the brush and the computers, and wrote the early software. This project is the PhD work of Kimiko Ryokai, and has been presented at many events, including a 2-year interactive exhibition at the Ars Electronica Center in Linz, Austria. More...


Year: 2003 - 2005
Status: active
Domain: engineering
Type: system, prototypes, paper, paper (design), video [MPEG (27MB)] [MOV (25MB)] [YouTube]
Position: collaborator
Collaborators: Kimiko Ryokai (lead researcher), Rob Figueiredo (undergraduate collaborator), Joshua Jen C. Monzon (undergraduate collaborator)
Advisor: Hiroshi Ishii
Past projects
Robotic F.A.C.E.
Robotic F.A.C.E., which stands for Facial Alerting in a Communication Environment, explored the use of a physical object in the form of a face as a means of user interaction, taking advantage of socially intuitive facial expressions. We have built an interface to an expressive robotic head (based on the mechanics of a commercial robotic toy) that allows the use of socially strong non-verbal facial cues to alert and notify. The head, which can be controlled via a serial protocol, is capable of expressing most basic emotions not only in a static way, but also as dynamic animation loops that vary some parameter, e.g., activity, over time. Although in later projects with animatronic components (Robotic P.A.C.E., Autonomous Interactive Intermediaries) I did not reverse engineer a toy interface anymore, the experience gained with this project was very valuable. More...

Year: 2003 - 2004
Status: done
Domain: engineering
Type: system
Position: lead researcher
Collaborators: Mark Newman (undergraduate collaborator)


Robotic P.A.C.E.
The goal of the Robotic P.A.C.E. project was to explore the use of a robotic embodiment in the form of a parrot, sitting on the user's shoulder, as a means of user interaction, taking advantage of socially intuitive non-verbal cues like gaze and postures. These are different from facial expressions (as explored in the Robotic F.A.C.E. project), but at least as important as them for grabbing attention and interrupting in a socially appropriate way. I have built an animatronic parrot (based on a hand puppet and commercially available R/C gear) that allows the use of strong non-verbal social cues like posture and gaze to alert and notify. The wireless parrot, which can be controlled from anywhere by connecting to a server via TCP which in turn connects to a hacked R/C long range transmitter, is capable of quite expressive head and wing movements. Robotic P.A.C.E. was a first embodiment for a communication agent that reasons and acts with social intelligence.

Year: 2003 - 2004
Status: done
Domain: engineering
Type: system, illustrative video (2 minutes) [ Quicktime 7,279kb] [YouTube]
Position: lead researcher

Tiny Projector
Mobile communication devices get smaller and smaller, but we would prefer the displays to get larger instead. The solution to this dilemma is to add projection capabilities to the mobile device. The basic idea behind TinyProjector was to create the smallest possible character projector that can be either integrated into mobile devices like cellphones, or linked wirelessly via protocols like Bluetooth. During this 2-year project, I built ten working prototypes; the latest one uses eight laser diodes and a servo-controlled mirror that "paints" characters onto any surface like a matrix printer. Because of the laser light, the projection is highly visible even in daylight and on dark backgrounds. More...
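A toy sketch of the character-scanning idea (the glyphs, the diode interface, and the timing are invented; the real device synchronized eight laser diodes with a servo-swept mirror): each character becomes a series of eight-pixel-tall columns, and the diode pattern is updated column by column as the mirror sweeps.

```python
# Sketch: strobe 8-pixel-tall columns of a character as a mirror sweeps, the
# way a matrix printer lays down dots. The 5x8 glyphs and the set_diodes()
# stand-in are invented for illustration; real hardware would drive the diodes.
import time

FONT = {  # invented 5x8 glyphs, one string per pixel row, '#' = laser on
    "H": ["#...#", "#...#", "#...#", "#####", "#...#", "#...#", "#...#", "....."],
    "I": [".###.", "..#..", "..#..", "..#..", "..#..", "..#..", ".###.", "....."],
}

def set_diodes(bits):
    """Stand-in for the hardware call that switches the eight laser diodes."""
    print("".join("#" if b else "." for b in bits))

def paint(text, column_period_s=0.0):
    """Emit one diode pattern per column while the mirror sweeps (simulated)."""
    for ch in text:
        for col in range(5):
            column = [row[col] == "#" for row in FONT[ch]]
            set_diodes(column)
            time.sleep(column_period_s)   # would be synchronized to mirror position
        set_diodes([False] * 8)           # blank gap between characters

paint("HI")
```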

Year: 2000 - 2002
Status: done
Domain: engineering
Type: prototypes, report
Position: lead researcher

2-way pager used for Knothole
Knothole
Knothole (KH) uses mobile devices such as cellphones and two-way pagers as mobile interfaces to our desktop computers, combining PDA functionality, communication, and Web access into a single device. Rather than put intelligence into the portable device, it relies on the wireless network to connect to services that enable access to multiple desktop databases, such as your calendar and address book, and external sources, such as news, weather, stock quotes, and traffic. In order to poll a piece of information, the user sends a small message to the KH server, which then collects the requested information from different sources and sends it back as a short text summary. Although development of KH finished in 1998, it is currently used by Active Messenger, which it enhances and with which it interacts seamlessly. More...

Year: 1997 - 1998
Status: system stable and in continuous use
Domain: engineering
Type: system, prototypes, paper (related)
Position: lead researcher
Advisor: Chris Schmandt

Early prototype (top), later designs (middle, bottom)
Free Flying Micro Platforms, "Zero-G Eye"
A Free Flying Micro Platform (FFMP) is a vision for a small autonomously hovering mobot with a wireless video camera that carries out requests for aerial photography missions. It would operate indoors and in obstacle-rich areas, where it avoids obstacles automatically. Early FFMPs would follow high-level spoken commands, like "Go up a little bit, turn left, follow me," and would try to evade capture. Later versions would understand complex spoken language such as "Give me a close-up of John Doe from an altitude of 3 feet" and would have refined situational awareness. The Zero-G Eye is a first implementation of an FFMP that was built to explore ways of creating an autonomously hovering small device. The sensor-actuator loop is working, but flight times were highly constrained because the lift-to-weight ratio was too low. Later prototypes are in different planning stages, and benefit from the experience gained with earlier devices. As a side note, I have been virtually obsessed with small hovering devices for a very long time, and have designed such devices since I was 10 years old. More...

Year: 1997 - 2001
Status: prototypes developed; project dormant
Domain: engineering
Type: prototypes, report 1, report 2, report 3, paper, paper (related)
Position: lead researcher
ASSOL (Adaptive Song Selector Or Locator)
The Adaptive Song Selector Or Locator (ASSOL) is an adaptive song selection system that dynamically generates play lists from MP3 collections of users that are present in a public space. When a user logs into a workstation, the ASSOL server is notified, and the background music that is currently played in this space is influenced by the presence of the new user and her musical preferences. Her preferences are simply extracted from her personal digital music collection, which can be stored anywhere on the network and are streamed from their original location. A first time user merely has to tell the ASSOL system where her music files are stored. From then on, the play lists are compiled dynamically, and adapt to all the users in a given area. In addition, the system has a Web interface that allows users to personalize certain songs to convey certain information and alert them without interrupting other people in the public space. More...
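An illustrative sketch of the playlist logic (not the original server code; the simple round-robin policy and file names are stand-ins for the original adaptive selection): whenever the set of logged-in users changes, the upcoming play list is recompiled from their collections.

```python
# Sketch: compile a shared play list by interleaving songs from the collections
# of users currently present in the space. The round-robin policy is a
# simplification; song paths are placeholders.
import itertools
import random

def compile_playlist(collections, length=10, seed=None):
    """collections: dict of user -> list of song paths for users now present."""
    rng = random.Random(seed)
    shuffled = {u: rng.sample(songs, len(songs)) for u, songs in collections.items()}
    turn = itertools.cycle(sorted(shuffled))           # users take turns
    playlist, cursor = [], {u: 0 for u in shuffled}
    while len(playlist) < length and any(cursor[u] < len(shuffled[u]) for u in shuffled):
        u = next(turn)
        if cursor[u] < len(shuffled[u]):
            playlist.append((u, shuffled[u][cursor[u]]))
            cursor[u] += 1
    return playlist

present = {"alice": ["a1.mp3", "a2.mp3", "a3.mp3"], "bob": ["b1.mp3", "b2.mp3"]}
for owner, song in compile_playlist(present, length=5, seed=0):
    print(owner, song)
```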

Year: 2000
Status: done
Domain: engineering
Type: system, prototype, report
Position: researcher
Collaborators: Kwan Hong Lee (co-researcher)
OpenSource SpeechSynth
The OpenSource SpeechSynth (OSSS) is a purely Web-based text-to-speech synthesizer for minority languages for which no commercial speech synthesizer software is available, e.g., Swiss German. It is based on a collaborative approach where many people contribute a little, so that everybody can profit from the accumulated resources. Its Web interface allows visitors both to upload sound files (words) and to synthesize existing text. The speech synthesizing method used in this project is word based, which means that the smallest sound units are words. Sentences are obtained by waveform concatenation of word sounds. Due to the word concatenation approach, the OSSS works with any conceivable human language. It currently lists 90 languages, but users can easily add a new language if they wish, and then start adding word sounds. During the four years the OSSS has been online, it has been tested by many Web visitors, specifically by the Lojban community. More...
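A minimal sketch of the concatenation step (assuming each word is stored as a WAV file and all files share the same audio format; the directory layout and file names are placeholders, not the OSSS storage scheme): look up each word's recording and append its raw frames.

```python
# Sketch: word-based concatenative synthesis by joining per-word recordings.
# Assumes word_dir/<word>.wav files that all share the same sample format.
import wave

def synthesize(sentence, word_dir, out_path="sentence.wav"):
    """Concatenate the per-word recordings of 'sentence' into one WAV file."""
    out = None
    for w in sentence.lower().split():
        clip = wave.open(f"{word_dir}/{w}.wav", "rb")
        if out is None:
            out = wave.open(out_path, "wb")
            out.setparams(clip.getparams())   # copy the format of the first word
        out.writeframes(clip.readframes(clip.getnframes()))
        clip.close()
    if out is not None:
        out.close()
    return out_path

# Example (requires recordings such as recordings/hello.wav, recordings/world.wav):
# synthesize("hello world", "recordings")
```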

Year: 2000 - 2001
Status: done; up and running
Domain: engineering
Type: system, report
Position: lead researcher

Different weather conditions
WeatherTank
WeatherTank is a tangible interface that looks like a tabletop-sized vivarium or diorama, and uses everyday weather metaphors to present information from a variety of domains, e.g., "a storm is brewing" for increasingly stormy weather, indicating upcoming hectic activities in the stock exchange market. WeatherTank represents such well-known weather metaphors with desktop-sized but real wind, clouds, waves, and rain, allowing users to not only see, but also feel information, taking advantage of the skills we have developed through a lifetime of physical-world interaction. A prototype was built that included propellers for wind, cloud machines, wave and rain generators, and a color-changing lamp as the sun, mounted on a rod that can be used to move the sun along an arc over the tank, allowing the user to manipulate the time of day.

Year: 2001
Status: done
Domain: engineering
Type: system, report (short), report, unedited demo video (18.5 minutes) [RealVideo] [YouTube]
Position: researcher
Collaborators: Deva Seetharam (co-researcher)

Screenshot
Impressionist visualization of online communication
This system provides an intuitive, non-textual representation of online discussion. In the context of a chat forum, all textual information of each participant is transformed into a continuous stream of video. The semantic content of the text messages is mapped onto a sequence of videos and pictures. The mapping is realized on the side of the receiver, because a simple text line like "I love cats" means different things to different people. Some would associate this with an ad for cat food; others would be more negative because they dislike the mentality of cats, and would therefore see pictures like a dog chasing a cat. For this purpose, each participant has a personal database of semantic descriptions of pictures and videos. If the participant scans the messages of a group, this textual information is automatically transformed into user-specific streams of video. These video snippets have purely connotative meanings. I have built a proof-of-concept system with live video streams. More...

Year: 1998
Status: done
Domain: engineering, art installation
Type: system, report
Position: lead researcher

Example for system reasoning (top), game example (bottom)
Daboo
Daboo is a real-time computer system for the automatic generation of text in the specific context of the word-guessing game Taboo. To achieve the game's goal (let the user guess a word as fast as possible without using certain taboo words), our system uses sophisticated algorithms to model user knowledge and to semantically interpret the user input. The former is very important to gradually enhance the performance of the system by adapting to the user's strongest "context" of knowledge. The latter helps bridge the gap between a guess and the actual word to guess by creating a semantic relationship between the two. For this purpose we rely on the semantic inheritance tree of WordNet. Daboo acts effectively as the clue-giving party of a Taboo session by interactively generating textual descriptions in real time. More...
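A toy sketch of the clue-generation idea (illustrative only; Daboo's user modeling and semantic interpretation go far beyond this): walk WordNet's semantic tree around the target word and emit related words that are not on the taboo list.

```python
# Sketch: generate Taboo-style clue words for a target by walking WordNet's
# hypernym/hyponym (is-a) relations and filtering out the taboo words.
# Requires nltk with the 'wordnet' corpus; Daboo itself is far richer.
from nltk.corpus import wordnet as wn

def clue_words(target, taboo, max_clues=5):
    banned = {t.lower() for t in taboo} | {target.lower()}
    clues = []
    for syn in wn.synsets(target):
        for related in syn.hypernyms() + syn.hyponyms():
            for lemma in related.lemma_names():
                word = lemma.replace("_", " ").lower()
                if word not in banned and word not in clues:
                    clues.append(word)
                if len(clues) >= max_clues:
                    return clues
    return clues

print(clue_words("dog", taboo=["puppy", "bark", "pet", "cat", "leash"]))
```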

Year: 1997
Status: done
Domain: engineering
Type: system, report
Position: researcher
Collaborators: Keith Emnett (co-researcher)
Psychological Impact of Modern Communication Technologies
In this two-year study, I examined in detail both the communicative behavior in general and the use of communication technologies of eight subjects, using extensive problem-centered interviews. From the interview summaries, a general criterion for media separation was extracted, which allows the systematic separation of all media into two groups: on one side the verbal-vocal, realtime-interactive, and non-time-buffered media like telephone, intercom, and face-to-face communication; on the other side the text-based, asynchronous, and time-buffering media like letter, telefax, and email. The two media answering machine and online chat (realtime communication via computer monitor and keyboard) occupy exceptional positions because they cannot be assigned to either group. Therefore, these two media were examined in detail. By analyzing them under the aspects of both a semiotic-ecological approach and a privacy regulation model, important characteristics and phenomena of their use can be explained, and future trends predicted. This work was part of my first Master's thesis in Psychology.

Year: 1993
Status: done
Domain: psychology
Type: study, thesis, paper
Position: researcher
Advisor: Urs Fuhrer
Influence of Video Clip on Perception of Music
This study explored the question whether music is perceived and rated differently when presented alone versus when presented together with a promotional video clip. Thirty-six subjects filled out semantic differentials after having listened to three different songs, each under one of the following conditions: the song was presented without video; the song was presented with the corresponding promotional video clip; the song was presented with a random video. We found that subjects instructed to evaluate music do this in similar ways, independent of the presence of a matching or mismatching video clip. However, presenting a song together with a corresponding video clip decreases the possibility for the listeners/viewers to interpret the music in their own way. Furthermore, our data suggests that it is difficult and perhaps arbitrary to rate an incompatible or unmotivated music-video mix, and that an appropriate video clip makes the "meaning" of the song more unequivocal.

Year: 1989
Status: done
Domain: psychology
Type: study, report
Position: researcher
Advisor: Alfred Lang
Collaborators: Fränzi Jeker und Christoph Arn (co-researchers)
Physiological Reactions to Modern Music
This half-year study was motivated by our subjective experiences of a striking physiological reaction ("goose pimples") when listening to some modern pop and rock songs. We hypothesized that this physiological reaction, possibly caused by adrenaline, is the most important factor that determines whether a song pleases an audience: it can lead to massive sales of records and high rankings on music charts. This project explored whether an accumulation of such "pleasing spots" can be found when listening to specific songs. Our experimental results from 33 subjects show the existence of such accumulations; additionally, we examined whether we were able to predict these accumulations ahead of the experiment. Our results also show that we were able to predict them intuitively, though not all of them and not without errors. We can only make assumptions about the criteria for the passages that attracted attention as accumulations of conjectured physiological reactions; four factors appear relevant: an increase of musical density (for example, chorus passages); musical intensification of melody or harmony; an increase of rhythm; and an increase of volume.

Year: 1986
Status: done
Domain: psychology
Type: study, report
Position: researcher
Advisor: Alfred Lang
Collaborators: Sibylle Perler, Christine Thoma, Robert Müller, and Markus Fehlmann (all co-researchers)
© 1997 - 2012 by Stefan Marti & MIT Media Lab. Send comments to: . Last updated: July 27, 2014