Mitsubishi Electric Research Labs, Cambridge, MA
June-August '99
Abstract
The goal of this work is to develop a set of techniques by which the voice of one speaker (cue) can be transformed to sound like that of a different speaker (target). This requires analysis and extraction of a set of acoustic features that not only describe the salient characteristics of a target speaker, but also permit reconstruction for natural-sounding synthesis. We use a combination of LPC analysis, vocal tract modeling, and harmonic+noise coding to represent the dynamics of phonemes and the expressive qualities in speech.
The dynamics of the target speaker are modeled via HMMs, and a mapping to the trajectories of the cue speaker is then learned. Once trained, such acoustic features could allow the target speaker's voice to be synthesized, driven by a different speaker or cue signal.
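As an illustration of the LPC analysis step named above, here is a minimal sketch of one common formulation: the autocorrelation method solved with the Levinson-Durbin recursion. It is not the implementation used in this project, which is not described here. All function names (autocorrelation, levinson_durbin, lpc, all_pole_impulse_response) are hypothetical, and the demo signal is a synthetic second-order all-pole filter rather than real speech.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one analysis frame."""
    n = len(frame)
    return [
        sum(frame[i] * frame[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations for the given autocorrelation.

    Returns (a, err) with a[0] == 1, so that the prediction is
    x[n] ~= -sum(a[k] * x[n - k] for k in 1..order),
    and err is the final prediction-error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for this model order.
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        # Update the lower-order coefficients, then append the new one.
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


def lpc(frame, order):
    """LPC coefficients of a frame (assumption: no window is applied here)."""
    r = autocorrelation(frame, order)
    if r[0] == 0.0:
        raise ValueError("cannot analyse a silent frame")
    return levinson_durbin(r, order)


def all_pole_impulse_response(a, n):
    """First n samples of the impulse response of the filter 1 / A(z)."""
    h = []
    for i in range(n):
        x = 1.0 if i == 0 else 0.0
        y = x - sum(a[k] * h[i - k] for k in range(1, len(a)) if i - k >= 0)
        h.append(y)
    return h


if __name__ == "__main__":
    # Synthetic test signal (an assumption for illustration, not speech data).
    true_a = [1.0, -1.3, 0.6]
    frame = all_pole_impulse_response(true_a, 200)
    est_a, err = lpc(frame, order=2)
    print("estimated coefficients:", est_a)
    print("prediction error energy:", err)
```

In practice each speech frame would typically be windowed (for example with a Hamming window) before analysis, and the model order chosen according to the sampling rate; both choices are left out of this sketch.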
Applications of such work include personalization of text-to-speech synthesis, entertainment (karaoke and film dubbing), improved recognition of distorted speech, and providing a natural voice to the speech-impaired.
Presentation at MERL (Aug 19, 1999)
Preliminary research - no results published yet.
Notes on Literature Review & References
N I + I N
nitin@media.mit.edu
Sun Nov 21 19:00:29 EST 1999