
Adaptively Enhancing Facial Expression Crucial Regions via a Local Non-local Joint Network

Bibliographic Details
Published in: International Journal of Automation and Computing, 2024-04, Vol. 21 (2), pp. 331-348
Main Authors: Shi, Guanghui, Mao, Shasha, Gou, Shuiping, Yan, Dandan, Jiao, Licheng, Xiong, Lin
Format: Article
Language: English
Summary: Facial expression recognition (FER) remains challenging due to the small inter-class discrepancy in facial expression data. Given the significance of facial crucial regions for FER, many existing studies exploit prior information from annotated facial crucial points to improve FER performance. However, manually annotating facial crucial points is complicated and time-consuming, especially for vast numbers of wild expression images. Motivated by this, this paper proposes a local non-local joint network that adaptively enhances facial crucial regions during feature learning for FER. The proposed method comprises two parts built on facial local and non-local information: an ensemble of multiple local networks extracts features corresponding to multiple facial local regions, while a non-local attention network estimates the significance of each local region. In particular, the attention weights obtained by the non-local network are fed back into the local part, providing interactive feedback between facial global and local information. Notably, the non-local weights of the local regions are gradually updated during training, with higher weights assigned to more crucial regions. Moreover, a U-Net is employed to extract features that integrate the deep semantic information and the low-level detail information of expression images. Finally, experimental results show that the proposed method achieves more competitive performance than several state-of-the-art methods on five benchmark datasets.
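To make the described idea concrete, below is a minimal PyTorch sketch of a local non-local joint design: an ensemble of per-region local branches whose features are reweighted by region-significance scores from a non-local-style branch. The fixed 2x2 region grid, all layer sizes, the seven-class output, and the `LocalNonLocalJoint` module itself are illustrative assumptions, not the authors' published architecture; the scorer here is a simple stand-in for the paper's non-local attention network.

```python
# Hypothetical sketch of the local non-local joint idea from the abstract.
# Region split, layer sizes, and wiring are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalNonLocalJoint(nn.Module):
    """Ensemble of local branches over 4 spatial regions, reweighted by a
    non-local-style significance branch (toy stand-in for the paper's design)."""
    def __init__(self, in_ch=64, num_regions=4, feat_dim=128, num_classes=7):
        super().__init__()
        assert num_regions == 4  # this sketch hardcodes a 2x2 region grid
        # One small conv branch per local region (assumed design).
        self.local_branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(in_ch, feat_dim, 3, padding=1),
                nn.ReLU(inplace=True),
                nn.AdaptiveAvgPool2d(1),
            )
            for _ in range(num_regions)
        )
        # Scores the significance of each region from the pooled global
        # feature; a stand-in for the paper's non-local attention network.
        self.nonlocal_scorer = nn.Sequential(
            nn.Linear(in_ch, feat_dim), nn.ReLU(inplace=True),
            nn.Linear(feat_dim, num_regions),
        )
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, x):
        b, c, h, w = x.shape
        # Split the feature map into a 2x2 grid of local regions
        # (assumption; the paper may define crucial regions differently).
        regions = [x[:, :, i*h//2:(i+1)*h//2, j*w//2:(j+1)*w//2]
                   for i in range(2) for j in range(2)]
        local_feats = torch.stack(
            [br(r).flatten(1) for br, r in zip(self.local_branches, regions)],
            dim=1)                                       # (b, 4, feat_dim)
        # Non-local weights over regions, fed back to reweight the local
        # features so more crucial regions receive higher weights.
        weights = F.softmax(self.nonlocal_scorer(x.mean(dim=(2, 3))), dim=1)
        fused = (weights.unsqueeze(-1) * local_feats).sum(dim=1)
        return self.classifier(fused), weights

if __name__ == "__main__":
    # E.g., feature maps from a U-Net-style backbone: batch 2, 64 ch, 32x32.
    model = LocalNonLocalJoint()
    logits, w = model(torch.randn(2, 64, 32, 32))
    print(logits.shape, w.shape)  # torch.Size([2, 7]) torch.Size([2, 4])
```

Returning the weights alongside the logits mirrors the abstract's point that the region weights are interpretable: during training they can be inspected to see which facial regions the network deems crucial.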
ISSN: 2731-538X; 1476-8186; 2731-5398; 1751-8520
DOI: 10.1007/s11633-023-1417-9