Voice Mapping: Learning to speak like David Letterman?

Nitin Sawhney, Michael Casey and Matt Brand

Mitsubishi Electric Research Labs, Cambridge, MA
June-August '99

Abstract

The goal of this work is to develop a set of techniques by which the voice of one speaker (the cue) can be transformed to sound like that of a different speaker (the target). This requires analyzing and extracting a set of acoustic features that not only describe the salient characteristics of the target speaker, but also permit reconstruction for natural-sounding synthesis. We use a combination of LPC analysis, vocal tract modeling, and harmonic+noise coding to represent the dynamics of phonemes and the expressive qualities of speech.
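As a rough illustration of the LPC analysis step mentioned above, the sketch below estimates linear-prediction coefficients for one frame using the autocorrelation method with the Levinson-Durbin recursion. The function name lpc_coefficients, the prediction order, and the synthetic test signal are assumptions made for this example; the project does not state which LPC formulation, order, or windowing it used.

```python
import numpy as np


def lpc_coefficients(frame, order):
    """Estimate LPC coefficients for one frame (illustrative sketch).

    Returns (a, err), where a[0] == 1 and the inverse filter is
    A(z) = sum_k a[k] * z**-k, and err is the residual energy.
    Uses the autocorrelation method with Levinson-Durbin recursion.
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])

    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0:
            break  # silent or degenerate frame: keep remaining coefficients at 0
        # Reflection coefficient for this recursion step.
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)
    return a, err


if __name__ == "__main__":
    # Synthetic AR(2) signal generated with A(z) = 1 - 1.3 z^-1 + 0.6 z^-2,
    # so the estimate should approximate those coefficients.
    rng = np.random.default_rng(0)
    noise = rng.standard_normal(4000)
    signal = np.zeros_like(noise)
    for t in range(2, len(signal)):
        signal[t] = 1.3 * signal[t - 1] - 0.6 * signal[t - 2] + noise[t]
    a, err = lpc_coefficients(signal, order=2)
    print(a, err)
```

For real speech, each frame would normally be windowed first and analyzed with a higher order; those choices are left out here to keep the sketch short.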

The dynamics of the target speaker are modeled with HMMs, and a mapping to the feature trajectories of the cue speaker is then learned. Once trained, such a model could allow the target voice to be synthesized while being driven by a different speaker or cue signal.
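To make the HMM idea concrete, here is a minimal sketch of Viterbi decoding for an HMM with one-dimensional Gaussian emissions: given a feature trajectory, it recovers the most likely sequence of hidden states. The function name viterbi_gaussian and every parameter value are hypothetical, training (for example with Baum-Welch) is omitted, and this is not presented as the project's actual model or mapping procedure.

```python
import numpy as np


def viterbi_gaussian(obs, means, variances, trans, start):
    """Most likely state path for a 1-D Gaussian-emission HMM (log domain).

    obs: (T,) observations; means, variances: (S,) per-state parameters;
    trans: (S, S) transition probabilities; start: (S,) initial probabilities.
    """
    obs = np.asarray(obs, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)

    # log N(obs[t]; means[s], variances[s]) for every frame t and state s.
    log_b = -0.5 * (np.log(2.0 * np.pi * variances)
                    + (obs[:, None] - means) ** 2 / variances)
    log_a = np.log(trans)
    n_frames, n_states = log_b.shape

    delta = np.log(start) + log_b[0]          # best log score ending in each state
    back = np.zeros((n_frames, n_states), dtype=int)
    for t in range(1, n_frames):
        scores = delta[:, None] + log_a       # scores[i, j]: move from i to j
        back[t] = np.argmax(scores, axis=0)   # best predecessor of each state j
        delta = scores[back[t], np.arange(n_states)] + log_b[t]

    # Trace the best path backwards from the best final state.
    path = np.empty(n_frames, dtype=int)
    path[-1] = int(np.argmax(delta))
    for t in range(n_frames - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path


if __name__ == "__main__":
    # Hypothetical two-state model: state 0 centered at 0, state 1 at 5.
    obs = [0.1, -0.2, 0.0, 5.1, 4.9, 5.2]
    path = viterbi_gaussian(obs, means=[0.0, 5.0], variances=[1.0, 1.0],
                            trans=[[0.9, 0.1], [0.1, 0.9]], start=[0.5, 0.5])
    print(path)
```

In a voice-mapping setting, a decoded state sequence of this kind could in principle be used to select target-speaker features frame by frame, but how the mapping is actually learned is not specified here.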

Applications of such work include personalization of text-to-speech synthesis, entertainment (karaoke and film dubbing), improved recognition of distorted speech, and providing a natural voice to the speech-impaired.

Presentation at MERL (Aug 19, 1999)

Preliminary research - no results published yet.

Notes on Literature Review & References



nitin@media.mit.edu
Sun Nov 21 19:00:29 EST 1999