Speaker independence in automated lip-sync for audio–video communication
Published in: Computer Networks (Amsterdam, Netherlands: 1999), 1998-11, Vol. 30 (20), p. 1975-1980
Main Authors: , , ,
Format: Article
Language: English
Summary: By analyzing the absolute value of the Fourier transform of a speaker's voice signal we can predict the position of the mouth for English vowel sounds. This is without the use of text, speech recognition or mechanical or other sensing devices attached to the speaker's mouth. This capability can reduce the time required for mouth animation considerably. We expect it to be competitive eventually with the speech/text driven solutions which are becoming popular. Our technique would require much less interaction from the user and no knowledge of phonetic spelling. We discuss the problems of producing an algorithm that is speaker independent. The goal is to avoid having to measure mouth movements off video for each speaker's training sounds. We have discovered that eliminating variation due to pitch yields moments which are mouth shape dependent but not speaker dependent. This implies that careful construction of predictor surfaces can produce speaker independent prediction of mouth motion for English vowels.
ISSN: 0169-7552, 1389-1286, 1872-7069
DOI: 10.1016/S0169-7552(98)00216-5
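The summary describes predicting mouth shape from moments of the magnitude spectrum of a voice frame, with amplitude and pitch variation factored out. The sketch below is an illustrative reconstruction only, not the authors' published algorithm: it normalizes the magnitude spectrum of one audio frame to a probability distribution (removing overall amplitude) and computes its first few central moments over frequency. The function name, sample rate, and frame length are assumptions for the example; the paper's specific pitch-elimination step is not reproduced.

```python
import numpy as np

def spectral_moments(frame, sample_rate=8000, n_moments=3):
    """Illustrative sketch: moments of |FFT| of a short voice frame.

    Normalizes the magnitude spectrum to a probability distribution,
    which discards overall amplitude, then returns the spectral
    centroid followed by higher central moments over frequency.
    This is a hypothetical reconstruction, not the paper's method.
    """
    windowed = frame * np.hanning(len(frame))      # taper to reduce leakage
    mag = np.abs(np.fft.rfft(windowed))            # |FFT|, positive frequencies
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    p = mag / mag.sum()                            # amplitude-independent weights
    centroid = float((freqs * p).sum())            # first moment (Hz)
    moments = [centroid]
    for k in range(2, n_moments + 1):
        moments.append(float((((freqs - centroid) ** k) * p).sum()))
    return moments

# Usage: a synthetic 200 Hz vowel-like tone at an assumed 8 kHz rate
t = np.arange(256) / 8000.0
frame = np.sin(2 * np.pi * 200 * t)
print(spectral_moments(frame))
```

In a full system along the lines the summary suggests, such moments would be the inputs to the "predictor surfaces" mapping spectral shape to mouth position, trained once rather than per speaker.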