Synthesizing speech acoustics from head and face motion
Published in: The Journal of the Acoustical Society of America, 2005-04, Vol. 117 (4_Supplement), p. 2542
Main Authors: , , ,
Format: Article
Language: English
Summary: This work outlines a quantitative analysis of the relation between speech acoustics and the head and face motions that occur simultaneously with them [A. V. Barbosa, Ph.D. thesis, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil, 2004]. 2-D motion data are obtained by means of a video camera. An algorithm has been developed for tracking markers on the speaker’s face across the acquired video sequence [A. V. Barbosa, E. Vatikiotis-Bateson, and A. Daffertshofer, in Proceedings of the 8th ICSLP Interspeech 2004, Korea, 2004]. The motion domain is represented by the 2-D marker trajectories, whereas line spectrum pair (LSP) coefficients and the fundamental frequency F0 represent the speech acoustics domain. Mathematical models are trained to estimate the acoustic parameters (LSPs + F0) from the motion parameters (2-D marker positions). The estimated acoustic parameters are then used to synthesize the acoustic speech signal. Cross-domain analysis is performed for undecomposed (i.e., full head + face) and decomposed (i.e., separated head and face) normalized 2-D motions. Syntheses from each method are being evaluated through intelligibility tests and qualitative comparison of the original and synthesized utterances.
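The abstract does not specify the form of the trained cross-domain models. As a minimal illustrative sketch, one could fit a least-squares affine map from per-frame 2-D marker positions to the acoustic parameter vector (LSP coefficients + F0); the marker count, LSP order, and synthetic data below are all assumptions for illustration, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

n_frames = 500
n_markers = 12   # assumed number of face markers (not from the paper)
n_lsp = 16       # assumed LSP order (not from the paper)

# Motion features: x/y position of each marker per video frame
# (synthetic stand-in data for this sketch).
X = rng.standard_normal((n_frames, 2 * n_markers))

# Acoustic targets: LSP coefficients plus F0 per frame, generated here
# from a random linear map plus noise so the fit has a known structure.
true_W = rng.standard_normal((2 * n_markers, n_lsp + 1))
Y = X @ true_W + 0.01 * rng.standard_normal((n_frames, n_lsp + 1))

# Train: least-squares solution W minimizing ||X W - Y||^2.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Estimate acoustic parameters from motion, then measure the fit.
Y_hat = X @ W
rmse = float(np.sqrt(np.mean((Y_hat - Y) ** 2)))
print(f"per-frame RMSE: {rmse:.4f}")
```

In a real pipeline the estimated LSPs would be converted back to filter coefficients and combined with the estimated F0 to drive a speech synthesizer; here the RMSE simply confirms the mapping was recovered.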
ISSN: 0001-4966, 1520-8524
DOI: 10.1121/1.4788446