Acoustic to articulatory mapping with deep neural network

Bibliographic Details
Published in: Multimedia Tools and Applications, 2015-11, Vol. 74 (22), p. 9889-9907
Main Authors: Wu, Zhiyong; Zhao, Kai; Wu, Xixin; Lan, Xinyu; Meng, Helen
Format: Article
Language: English
Description
Summary: Synthetic talking avatars have been demonstrated to be very useful in human-computer interaction. In this paper, we discuss the problem of acoustic-to-articulatory mapping and explore different kinds of models to describe the mapping function. We evaluate the general linear model (GLM), Gaussian mixture model (GMM), artificial neural network (ANN) and deep neural network (DNN) for this problem. Taking advantage of the fact that a neural network's prediction stage can be completed in a very short time (i.e., in real time), we develop a real-time speech-driven talking avatar system based on the DNN. The input of the system is acoustic speech and the output is articulatory movements, synchronized with the input speech, on a three-dimensional avatar. Several experiments are conducted to compare the performance of the GLM, GMM, ANN and DNN on the well-known acoustic-articulatory English speech corpus MNGU0. Experimental results demonstrate that the proposed DNN-based acoustic-to-articulatory mapping method achieves the best performance.
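
As a rough illustration of the frame-level regression task the summary describes, the sketch below wires up a feed-forward DNN in PyTorch that maps windowed acoustic features to EMA articulator coordinates and trains it with a mean-squared-error loss. The feature dimensionality, layer widths and 12-dimensional articulatory output are illustrative assumptions, not the configuration used in the paper.

# Minimal sketch (not the authors' exact architecture) of an
# acoustic-to-articulatory regressor: a feed-forward DNN mapping a window
# of acoustic features to EMA coordinates, trained with MSE.
# All dimensions and layer sizes below are assumed for illustration.
import torch
import torch.nn as nn

ACOUSTIC_DIM = 40 * 11   # e.g. 40 spectral features with +/-5 context frames (assumed)
ARTIC_DIM = 12           # e.g. 6 EMA coils x (x, y) coordinates as in MNGU0 (assumed)

model = nn.Sequential(
    nn.Linear(ACOUSTIC_DIM, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, ARTIC_DIM),    # linear output layer for regression
)

criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(acoustic_batch: torch.Tensor, artic_batch: torch.Tensor) -> float:
    """One gradient step on a (batch, ACOUSTIC_DIM) -> (batch, ARTIC_DIM) regression."""
    optimizer.zero_grad()
    pred = model(acoustic_batch)
    loss = criterion(pred, artic_batch)
    loss.backward()
    optimizer.step()
    return loss.item()

# At synthesis time the forward pass is only a few matrix multiplications,
# which is why frame-by-frame prediction can run in real time and drive
# the avatar's articulators in sync with the incoming speech.
with torch.no_grad():
    example_frames = torch.randn(100, ACOUSTIC_DIM)   # 100 dummy acoustic frames
    predicted_ema = model(example_frames)              # (100, ARTIC_DIM) trajectories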
ISSN: 1380-7501
1573-7721
DOI: 10.1007/s11042-014-2183-z