Are 3D Face Shapes Expressive Enough for Recognising Continuous Emotions and Action Unit Intensities?

Bibliographic Details
Published in:	IEEE transactions on affective computing 2024-04, Vol.15 (2), p.535-548
Main Authors:	Tellamekala, Mani Kumar; Sumer, Omer; Schuller, Bjorn W.; Andre, Elisabeth; Giesbrecht, Timo; Valstar, Michel
Format:	Article
Language:	English
Subjects:	3D morphable models; action unit intensity estimation; Arousal; Computational modeling; dimensional affect recognition; Emotions; Estimation; Face recognition; Facial expression analysis; Feature recognition; Gold; Shape; Shape recognition; Solid modeling; Task analysis; Three-dimensional displays

Description
Summary:	Recognising continuous emotions and action unit (AU) intensities from face videos requires a spatial and temporal understanding of expression dynamics. Existing works primarily rely on 2D face appearance features to extract such dynamics. This work focuses on a promising alternative based on parametric 3D face alignment models, which disentangle different factors of variation, including expression-induced shape variations. We aim to understand how expressive 3D face shapes are in estimating valence-arousal and AU intensities compared to the state-of-the-art 2D appearance-based models. We benchmark five recent 3D face models: ExpNet, 3DDFA-V2, RingNet, DECA, and EMOCA. In valence-arousal estimation, expression features of 3D face models consistently surpassed previous works and yielded an average concordance correlation of 0.745 and 0.574 on the SEWA and AVEC 2019 CES corpora, respectively. We also study how 3D face shapes performed on AU intensity estimation on the BP4D and DISFA datasets, and report that 3D face features were on par with 2D appearance features in recognising AUs 4, 6, 10, 12, and 25, but not the entire set of AUs. To understand this discrepancy, we conduct a correspondence analysis between valence-arousal and AUs, which points out that accurate prediction of valence-arousal may require knowledge of only a few AUs.
ISSN:	1949-3045
DOI:	10.1109/TAFFC.2023.3280530
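
The summary reports valence-arousal results as an average concordance correlation. As a rough illustration of that metric, the sketch below computes Lin's concordance correlation coefficient, 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²), between a ground-truth and a predicted sequence. The record does not say which variant or averaging scheme the article uses, so the population (biased) variance form and the function name `concordance_correlation` are assumptions made here for illustration only.

```python
import numpy as np


def concordance_correlation(y_true, y_pred):
    """Lin's concordance correlation coefficient (population form).

    Illustrative helper, not taken from the article: it assumes two
    equal-length 1D sequences and uses biased (1/N) variance estimates.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)

    mean_true, mean_pred = y_true.mean(), y_pred.mean()
    var_true, var_pred = y_true.var(), y_pred.var()

    # Population covariance between the two sequences.
    cov = np.mean((y_true - mean_true) * (y_pred - mean_pred))

    # Unlike Pearson correlation, the denominator penalises differences
    # in scale and a shift between the two means.
    return 2.0 * cov / (var_true + var_pred + (mean_true - mean_pred) ** 2)
```

Unlike Pearson correlation, a prediction that follows the ground truth perfectly but is offset by a constant scores below 1 under this measure.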