A novel facial expression recognition model based on harnessing complementary features in multi-scale network with attention fusion
Published in: Image and Vision Computing, 2024-09, Vol. 149, p. 105183, Article 105183
Main Authors: , ,
Format: Article
Language: English
Summary: This paper presents a novel method for facial expression recognition using the proposed feature complementation and multi-scale attention model with attention fusion (FCMSA-AF). The model consists of four main components: a shallow feature extractor module, a parallel two-branch multi-scale attention module (MSA), a feature complementing module (FCM), and an attention fusion and classification module. The MSA module cascades multi-scale attention blocks along two paths to learn diverse features: the upper and lower paths use left and right multi-scale blocks to extract and aggregate features at different receptive fields, while the attention networks in the MSA focus on salient local regions to extract features at a granular level. The FCM exploits the correlation between the feature maps of the two paths to make the multi-scale attention features complementary to each other. Finally, the complementary features are fused through an attention network to form an informative holistic feature that captures subtle, visually varying regions in similar classes. Complementary and informative features are thus used in classification to minimize information loss and capture the discriminating finer aspects of facial expressions. Experimental evaluation of the proposed model on the AffectNet and CK+ datasets achieves accuracies of 64.59% and 98.98%, respectively, outperforming some state-of-the-art methods.
• Deeper and wider model extracting diverse features at the granular level.
• Feature subsets at the left and right channels contain richer scale information.
• The correlation between two parallel paths avoids similar feature learning.
• Attention fusion learns subtly varying facial regions.
• Multi-feature classification module avoids any loss of information.
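The feature-complementation and attention-fusion steps described in the summary can be sketched in a simplified form. This is an illustrative NumPy sketch under stated assumptions, not the authors' implementation: the `(1 - correlation)` channel weighting for the FCM and the pooled channel-attention softmax for the fusion are assumed stand-ins for the paper's actual networks.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def complement_features(fa, fb, eps=1e-8):
    """Hypothetical FCM sketch: compute per-channel correlation between the
    two paths' feature maps and up-weight decorrelated channels, so each
    path emphasizes information the other lacks."""
    c = fa.shape[0]                       # fa, fb: (C, H, W)
    a = fa.reshape(c, -1) - fa.reshape(c, -1).mean(axis=1, keepdims=True)
    b = fb.reshape(c, -1) - fb.reshape(c, -1).mean(axis=1, keepdims=True)
    corr = (a * b).sum(axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + eps)
    w = 1.0 - np.abs(corr)                # large when channels are decorrelated
    return fa * (1 + w)[:, None, None], fb * (1 + w)[:, None, None]

def attention_fuse(fa, fb):
    """Hypothetical fusion sketch: global-average-pool each path, take a
    softmax over the two paths per channel, and blend the feature maps."""
    ga, gb = fa.mean(axis=(1, 2)), fb.mean(axis=(1, 2))   # (C,) each
    att = softmax(np.stack([ga, gb]), axis=0)             # (2, C)
    return att[0][:, None, None] * fa + att[1][:, None, None] * fb

rng = np.random.default_rng(0)
fa = rng.standard_normal((8, 4, 4))       # upper-path features
fb = rng.standard_normal((8, 4, 4))       # lower-path features
fa_c, fb_c = complement_features(fa, fb)
fused = attention_fuse(fa_c, fb_c)        # holistic feature, shape (8, 4, 4)
print(fused.shape)
```

The fused map would then feed the classification module; in the paper this role is played by learned attention networks rather than the fixed pooling used here.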
ISSN: 0262-8856
DOI: 10.1016/j.imavis.2024.105183