Speaker independence in automated lip-sync for audio–video communication

Bibliographic Details
Published in: Computer networks (Amsterdam, Netherlands : 1999), 1998-11, Vol.30 (20), p.1975-1980
Main Authors: McAllister, David F., Rodman, Robert D., Bitzer, Donald L., Freeman, Andrew S.
Format: Article
Language: English
Description
Summary: By analyzing the absolute value of the Fourier transform of a speaker's voice signal, we can predict the position of the mouth for English vowel sounds without using text, speech recognition, or mechanical or other sensing devices attached to the speaker's mouth. This capability can considerably reduce the time required for mouth animation. We expect it eventually to be competitive with the speech- and text-driven solutions that are becoming popular, while requiring much less interaction from the user and no knowledge of phonetic spelling. We discuss the problems of producing a speaker-independent algorithm; the goal is to avoid having to measure mouth movements from video for each speaker's training sounds. We have discovered that eliminating variation due to pitch yields moments that depend on mouth shape but not on the speaker. This implies that careful construction of predictor surfaces can produce speaker-independent prediction of mouth motion for English vowels.
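The summary refers to moments taken from the magnitude of the Fourier transform. As a rough illustration only, the Python sketch below computes a magnitude spectrum for one frame of samples and its first two moments over frequency. The function names, the choice of moments, and their normalization are assumptions made for this example; the summary does not say how the authors define their moments, how they remove pitch variation, or how the predictor surfaces are built, and none of that is attempted here.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Return |DFT| of a real-valued frame for bins 0..N/2.

    Uses a naive O(N^2) DFT so the sketch needs only the standard library.
    """
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(frame)
        )
        mags.append(abs(acc))
    return mags


def spectral_moments(mags, sample_rate, n):
    """Return (first moment in Hz, second central moment in Hz^2).

    Assumption: the magnitude spectrum is treated as an unnormalized
    distribution over frequency. This is a hypothetical definition,
    not necessarily the one used in the article.
    """
    total = sum(mags)
    if total == 0:
        raise ValueError("frame is silent; moments are undefined")
    freqs = [k * sample_rate / n for k in range(len(mags))]
    m1 = sum(f * a for f, a in zip(freqs, mags)) / total
    m2 = sum((f - m1) ** 2 * a for f, a in zip(freqs, mags)) / total
    return m1, m2


# Usage: a synthetic 1000 Hz tone sampled at 8000 Hz (not real speech).
sample_rate = 8000
frame = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(64)]
mags = magnitude_spectrum(frame)
centroid_hz, spread_hz2 = spectral_moments(mags, sample_rate, len(frame))
```

For a pure tone that falls exactly on a DFT bin, the first moment sits at the tone's frequency and the second central moment is close to zero. A real vowel frame would spread energy across many bins, which is where moments of this kind become informative.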
ISSN:0169-7552
1389-1286
1872-7069
DOI:10.1016/S0169-7552(98)00216-5