Ordinal information based facial expression intensity estimation for emotional interaction: a novel semi-supervised deep learning approach

Bibliographic Details
Published in:Computing 2024-04, Vol.106 (4), p.1121-1138
Main Authors: Xu, Ruyi, Han, Jiaxu, Chen, Jingying
Format: Article
Language:English
Description
Summary:Emotional understanding and expression play a critical role in social interaction. To analyze children’s emotional interaction automatically, this study focuses on developing a novel network architecture and a reliable algorithm for expression intensity estimation to measure children’s facial expression responses to emotional stimuli. Variation in facial expression intensity provides temporal dynamic information about facial behavior, which is critical to interpreting the meaning of an expression. To avoid laborious manual annotation of expression intensity, existing unsupervised methods attempt to identify relative intensity using ordinal information within a facial expression sequence; however, they fail to estimate absolute intensity accurately. Moreover, appropriate features are needed to represent the continuous appearance changes caused by expression intensity, so as to improve the model’s ability to distinguish subtle differences in expression. This study therefore presents a novel semi-supervised method to estimate expression intensity using salient deep learning features. First, the facial expression is represented by the difference between the convolutional neural network backbone's responses to the target expression and its corresponding neutral expression, with the goal of suppressing the effects of expression-unrelated features on expression intensity estimation. Then, pairwise data constructed with ordinal information is input into a Siamese network with a combined hinge loss that guides learning of the relative intensity of unlabeled pairwise frames, the absolute intensity of a few labeled key frames, and the intensity range of most unlabeled frames.
The average Pearson correlation coefficient, intraclass correlation coefficient, and mean absolute error of the proposed method are 0.7683, 0.7405, and 0.1698 on the extended Cohn-Kanade dataset (CK+), and 0.7804, 0.6684, and 0.1864 on the Binghamton University 4D Facial Expression Dataset, results superior to the state of the art. A cross-dataset experiment indicates that the proposed method is promising for the analysis of children’s emotional interactions.
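The combined hinge loss described in the summary can be illustrated with a minimal sketch. The following is not the authors' implementation; it assumes an L2 penalty for the labeled key frames, a margin of 0.1 for the ordinal term, and an intensity range of [0, 1], all of which are illustrative choices:

```python
import numpy as np

def difference_features(target_feat, neutral_feat):
    """Represent an expression as the difference between backbone responses
    to the target frame and its corresponding neutral frame, suppressing
    identity and other expression-unrelated appearance cues."""
    return target_feat - neutral_feat

def combined_hinge_loss(s_hi, s_lo, s_labeled=None, y_labeled=None,
                        s_unlabeled=None, margin=0.1, lo=0.0, hi=1.0):
    """Sketch of a combined loss with three terms (hypothetical weights of 1):
    (1) ordinal hinge on unlabeled pairs: the frame known (from sequence
        order) to be more intense should score at least `margin` higher;
    (2) absolute intensity regression on a few labeled key frames;
    (3) range hinge keeping unlabeled predictions inside [lo, hi]."""
    loss = np.maximum(0.0, margin - (s_hi - s_lo)).mean()     # ordinal term
    if s_labeled is not None:
        loss += np.mean((s_labeled - y_labeled) ** 2)         # absolute term
    if s_unlabeled is not None:
        loss += np.mean(np.maximum(0.0, s_unlabeled - hi) +   # range term
                        np.maximum(0.0, lo - s_unlabeled))
    return loss
```

With correctly ordered pairs, exact key-frame predictions, and in-range unlabeled scores, all three terms vanish; a violated ordering or out-of-range score contributes a linear penalty, which is what lets the unlabeled ordinal pairs shape the intensity scale between the few labeled anchors.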
ISSN:0010-485X
1436-5057
DOI:10.1007/s00607-022-01140-y