Acoustic to articulatory mapping with deep neural network
Published in: Multimedia Tools and Applications, 2015-11, Vol. 74 (22), pp. 9889-9907
Main Authors:
Format: Article
Language: English
Subjects:
Summary: Synthetic talking avatars have been shown to be very useful in human-computer interaction. In this paper, we discuss the problem of acoustic-to-articulatory mapping and explore different kinds of models to describe the mapping function: the general linear model (GLM), the Gaussian mixture model (GMM), the artificial neural network (ANN), and the deep neural network (DNN). Taking advantage of the fact that a neural network's prediction stage can be completed in a very short time (i.e., in real time), we develop a real-time speech-driven talking avatar system based on a DNN. The input of the system is acoustic speech and the output is articulatory movements, synchronized with the input speech, rendered on a three-dimensional avatar. Several experiments are conducted to compare the performance of GLM, GMM, ANN, and DNN on the well-known acoustic-articulatory English speech corpus MNGU0. Experimental results demonstrate that the proposed acoustic-to-articulatory mapping method with a DNN achieves the best performance.
ISSN: 1380-7501, 1573-7721
DOI: 10.1007/s11042-014-2183-z
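
The summary describes a frame-wise regression from acoustic features to articulatory trajectories, with the DNN chosen because its prediction stage is fast enough to drive the avatar in real time. The sketch below illustrates how such a mapping could be set up; it is a minimal illustration, not the authors' implementation, and the framework (PyTorch), feature dimensions, context window, and layer sizes are assumptions rather than details taken from the paper.

```python
# Minimal sketch of a DNN mapping windows of acoustic features to
# articulator positions, in the spirit of the paper's acoustic-to-
# articulatory regression. All dimensions below are illustrative
# assumptions; the paper evaluates on the MNGU0 EMA corpus.
import torch
import torch.nn as nn

N_ACOUSTIC = 40      # assumed acoustic feature dimension per frame
CONTEXT = 11         # assumed number of context frames stacked as input
N_ARTIC = 12         # assumed articulatory channels (e.g. EMA x/y coords)

# Feed-forward regression network: stacked fully connected layers ending
# in a linear output so predicted articulator positions are unconstrained.
model = nn.Sequential(
    nn.Linear(N_ACOUSTIC * CONTEXT, 512),
    nn.ReLU(),
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, N_ARTIC),
)

loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One training step on a random batch standing in for (acoustic, EMA) pairs.
acoustic = torch.randn(32, N_ACOUSTIC * CONTEXT)   # stacked context frames
articulation = torch.randn(32, N_ARTIC)            # target articulator positions
optimizer.zero_grad()
loss = loss_fn(model(acoustic), articulation)
loss.backward()
optimizer.step()

# At inference time a single forward pass per frame is cheap, which is what
# makes real-time speech-driven animation of the avatar feasible.
with torch.no_grad():
    live_frame = torch.randn(1, N_ACOUSTIC * CONTEXT)
    trajectory = model(live_frame)   # predicted articulator positions
```

In an actual system, the random tensors would be replaced by MNGU0-style (acoustic, EMA) pairs for training, and at run time one forward pass per incoming frame yields the articulator positions that drive the three-dimensional avatar.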