Talking Face Generation for Impression Conversion Considering Speech Semantics

Bibliographic Details
Main Authors: Mizuno, Saki, Hojo, Nobukatsu, Shinoda, Kazutoshi, Suzuki, Keita, Ihori, Mana, Sato, Hiroshi, Tanaka, Tomohiro, Kawata, Naotaka, Kobashikawa, Satoshi, Masumura, Ryo
Format: Conference Proceeding
Language: English
Description
Summary: This study investigates a talking face generation method that converts a speaker's video so that it conveys a target impression, such as "favorable" or "considerate". Such an impression conversion method needs to consider the semantics of the input speech, because they affect the impression given by a speaker's video along with the facial expression. Conventional emotional talking face generation methods use speech information to synchronize lip movements with the speech in the output video. However, they cannot take speech semantics into account because their speech representations contain only phonetic information. To solve this problem, we propose a facial expression conversion model that uses a semantic vector obtained from BERT embeddings of the speech recognition results for the input speech. We first constructed an audio-visual dataset with impression labels assigned to each utterance. Evaluation on this dataset showed that the proposed method could improve the estimation accuracy of the facial expressions of the target video.
ISSN: 2379-190X
DOI: 10.1109/ICASSP48485.2024.10446947
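
Note: The summary mentions a semantic vector obtained from BERT embeddings of speech recognition results. The record does not say how the token embeddings are reduced to a single vector, so the following is only a minimal sketch of one common option, masked mean pooling. The function name `mean_pool`, the toy embedding values, and the choice of mean pooling are all assumptions made for illustration and are not taken from the paper.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]],
              attention_mask: List[int]) -> List[float]:
    """Average the embeddings of non-padding tokens into one vector.

    Hypothetical helper: the paper's actual pooling method is not stated
    in this record.
    """
    if len(token_embeddings) != len(attention_mask):
        raise ValueError("one mask entry is required per token")
    # Keep only tokens whose mask value is 1 (real tokens, not padding).
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m == 1]
    if not kept:
        raise ValueError("at least one unmasked token is required")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy 3-dimensional values standing in for per-token embeddings of a
# recognized transcript; real BERT embeddings are much larger.
toy_embeddings = [
    [1.0, 0.0, 2.0],
    [3.0, 2.0, 0.0],
    [2.0, 4.0, 1.0],
    [9.0, 9.0, 9.0],  # padding slot, excluded by the mask below
]
toy_mask = [1, 1, 1, 0]

semantic_vector = mean_pool(toy_embeddings, toy_mask)
print(semantic_vector)  # [2.0, 2.0, 1.0]
```

In a real pipeline the toy values would be replaced by the hidden states a BERT encoder produces for the transcript, and the pooled vector would be passed to the facial expression conversion model as an additional input.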