Fusing Structure and Appearance Features in Facial Expression Recognition Transformer
Main Authors: | , |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | |
Online Access: | Request full text |
Summary: | Facial expression recognition (FER) methods are fundamental to many human-computer interaction scenarios. Although deep learning-based models have made substantial progress in FER, they focus primarily on facial appearance features and neglect structure features, which describe the overall shape and the structural details of key facial regions. We propose a Structure and Appearance Feature Cross-fusion Transformer (SAFCT) network that leverages both structure and appearance features. Specifically, we introduce a gradient-based structure feature that simultaneously captures the overall face shape and local organ variations. For appearance, we extract both global features and landmark-guided local features to capture global texture and local details. We then employ a structure-dominated cross-fusion transformer to integrate these three facial features. Extensive experiments on widely used FER datasets show that SAFCT achieves state-of-the-art recognition performance. |
---|---|
ISSN: | 2379-190X |
DOI: | 10.1109/ICASSP48485.2024.10447031 |
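
The gradient-based structure feature mentioned in the summary can be illustrated with a minimal sketch. The summary does not say which gradient operator SAFCT uses, so the Sobel kernels here are an assumption, and the function name `sobel_gradient_magnitude` is hypothetical. The sketch only shows how image gradients highlight shape boundaries; it is not the paper's implementation.

```python
import math

# Sobel kernels (an assumption: the summary does not name the gradient operator).
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_gradient_magnitude(image):
    """Return the gradient magnitude of a 2D grayscale image (list of rows).

    Border pixels are left at 0.0 for simplicity.
    """
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            gx = gy = 0.0
            # Accumulate the horizontal and vertical responses over the 3x3 window.
            for dr in range(3):
                for dc in range(3):
                    pixel = image[r + dr - 1][c + dc - 1]
                    gx += SOBEL_X[dr][dc] * pixel
                    gy += SOBEL_Y[dr][dc] * pixel
            out[r][c] = math.hypot(gx, gy)
    return out


# Toy 4x4 "image" with a vertical edge between columns 1 and 2.
toy = [[0, 0, 1, 1] for _ in range(4)]
structure_map = sobel_gradient_magnitude(toy)
```

On the toy input, the interior pixels next to the edge get a nonzero response while a uniform image yields all zeros, which is why a gradient map responds to face contours and organ boundaries rather than to texture-free regions.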