A novel facial expression recognition model based on harnessing complementary features in multi-scale network with attention fusion
Published in: Image and Vision Computing, 2024-09, Vol. 149, p. 105183, Article 105183
Main Authors: , ,
Format: Article
Language: English
Summary: This paper presents a novel method for facial expression recognition using the proposed feature complementation and multi-scale attention model with attention fusion (FCMSA-AF). The model consists of four main components: a shallow feature extractor module, a parallel two-branch multi-scale attention module (MSA), a feature complementing module (FCM), and an attention fusion and classification module. The MSA module cascades multi-scale attention blocks along two paths to learn diverse features: the upper and lower paths use left and right multi-scale blocks to extract and aggregate features at different receptive fields, while the attention networks in the MSA focus on salient local regions to extract features at a granular level. The FCM exploits the correlation between the feature maps of the two paths to make the multi-scale attention features complementary to each other. Finally, the complementary features are fused through an attention network to form an informative holistic feature that captures subtle, visually varying regions in similar classes. Complementary and informative features are thus used in classification to minimize information loss and capture the discriminating finer aspects of facial expressions. Experimental evaluation of the proposed model on the AffectNet and CK+ datasets achieves accuracies of 64.59% and 98.98%, respectively, outperforming some state-of-the-art methods.
• Deeper and wider model extracting diverse features at the granular level.
• Feature subsets at the left and right channels contain richer scale information.
• The correlation between two parallel paths avoids similar feature learning.
• Attention fusion learns subtly varying facial regions.
• Multi-feature classification module avoids any loss of information.
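The feature-complementation and attention-fusion steps described in the summary can be sketched in a simplified form. This is an illustrative NumPy sketch under stated assumptions, not the authors' implementation: the `(1 - correlation)` channel weighting for the FCM and the pooled channel-attention softmax for the fusion are assumed stand-ins for the paper's actual networks.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def complement_features(fa, fb, eps=1e-8):
    """Hypothetical FCM sketch: compute per-channel correlation between the
    two paths' feature maps and up-weight decorrelated channels, so each
    path emphasizes information the other lacks."""
    c = fa.shape[0]                       # fa, fb: (C, H, W)
    a = fa.reshape(c, -1) - fa.reshape(c, -1).mean(axis=1, keepdims=True)
    b = fb.reshape(c, -1) - fb.reshape(c, -1).mean(axis=1, keepdims=True)
    corr = (a * b).sum(axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + eps)
    w = 1.0 - np.abs(corr)                # large when channels are decorrelated
    return fa * (1 + w)[:, None, None], fb * (1 + w)[:, None, None]

def attention_fuse(fa, fb):
    """Hypothetical fusion sketch: global-average-pool each path, take a
    softmax over the two paths per channel, and blend the feature maps."""
    ga, gb = fa.mean(axis=(1, 2)), fb.mean(axis=(1, 2))   # (C,) each
    att = softmax(np.stack([ga, gb]), axis=0)             # (2, C)
    return att[0][:, None, None] * fa + att[1][:, None, None] * fb

rng = np.random.default_rng(0)
fa = rng.standard_normal((8, 4, 4))       # upper-path features
fb = rng.standard_normal((8, 4, 4))       # lower-path features
fa_c, fb_c = complement_features(fa, fb)
fused = attention_fuse(fa_c, fb_c)        # holistic feature, shape (8, 4, 4)
print(fused.shape)
```

The fused map would then feed the classification module; in the paper this role is played by learned attention networks rather than the fixed pooling used here.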
ISSN: 0262-8856
DOI: 10.1016/j.imavis.2024.105183