Fusing Structure and Appearance Features in Facial Expression Recognition Transformer

Bibliographic Details
Main Authors: Meng, Siwei; Shi, Wuzhen
Format: Conference Proceeding
Language: English
Description
Summary: Facial expression recognition (FER) methods are fundamental to many human-computer interaction scenarios. Although deep learning-based models have made substantial progress in FER, they focus primarily on capturing facial appearance features and neglect the importance of structure features, which encode the overall shape and the structural details of key facial regions. We propose a Structure and Appearance Feature Cross-fusion Transformer (SAFCT) network that leverages both structure and appearance features. Specifically, we introduce a gradient-based structure feature that simultaneously captures the overall face shape and local variations of facial organs. For appearance, we extract both global features and landmark-guided local features to capture global texture and local details. We then employ a structure-dominated cross-fusion transformer to integrate these three facial features. Extensive experiments on widely used FER datasets show that SAFCT achieves state-of-the-art recognition performance.
ISSN: 2379-190X
DOI: 10.1109/ICASSP48485.2024.10447031
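The summary names three feature streams (a gradient-based structure feature, global appearance features, and landmark-guided local appearance features) fused by a structure-dominated cross-fusion transformer. The NumPy sketch below illustrates that data flow under loudly stated assumptions; it is not the authors' SAFCT implementation. The gradient operator (`np.gradient` magnitude), the 8x8 patch size, the landmark coordinates, the random projection weights, and the reading of "structure-dominated" as "structure tokens act as queries in a single cross-attention layer" are all illustrative choices, and every function name here is hypothetical.

```python
import numpy as np

def gradient_structure_map(img):
    """Hypothetical structure feature: per-pixel gradient magnitude.
    Assumption: central differences via np.gradient stand in for the paper's operator."""
    gy, gx = np.gradient(img.astype(float))
    return np.hypot(gx, gy)

def patchify(x, p):
    """Split a 2-D map into non-overlapping p x p patches, one flattened token per patch.
    Rows/columns that do not fill a whole patch are cropped off."""
    h, w = x.shape
    x = x[: h - h % p, : w - w % p]
    h, w = x.shape
    return x.reshape(h // p, p, w // p, p).transpose(0, 2, 1, 3).reshape(-1, p * p)

def landmark_crops(img, landmarks, p):
    """Hypothetical landmark-guided local features: a p x p crop centred on each
    (row, col) landmark, with edge padding so crops near the border stay p x p."""
    half = p // 2
    padded = np.pad(img, half, mode="edge")
    return np.stack([padded[r:r + p, c:c + p].ravel() for r, c in landmarks])

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q_tokens, kv_tokens, wq, wk, wv):
    """Single-head scaled dot-product cross-attention."""
    q, k, v = q_tokens @ wq, kv_tokens @ wk, kv_tokens @ wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return attn @ v

def structure_dominated_fusion(struct_tok, global_tok, local_tok, params):
    """Assumed reading of 'structure-dominated': structure tokens are the queries,
    global + local appearance tokens are the keys/values, plus a residual on structure."""
    appearance = np.concatenate([global_tok, local_tok], axis=0)
    return struct_tok + cross_attention(struct_tok, appearance, *params)

# Toy run on a random 32x32 "face"; all values below are illustrative only.
rng = np.random.default_rng(0)
img = rng.random((32, 32))
p, d = 8, 64                                     # patch size and token dim (p * p)
struct_tok = patchify(gradient_structure_map(img), p)   # 16 x 64
global_tok = patchify(img, p)                            # 16 x 64
landmarks = [(8, 10), (8, 22), (20, 16), (26, 16)]       # hypothetical eyes/nose/mouth
local_tok = landmark_crops(img, landmarks, p)            # 4 x 64
params = tuple(rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
out = structure_dominated_fusion(struct_tok, global_tok, local_tok, params)
print(out.shape)
```

Letting the structure stream supply the queries means each output token is a structure token enriched with whatever appearance evidence it attends to, which is one plausible way to make structure "dominate" the fusion; a real model would add learned embeddings, multiple heads and layers, and a classification head.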