Multistage Polymerization Network for Multiperson Pose Estimation

Bibliographic Details
Published in: Journal of Sensors, 2021-12, Vol. 2021 (1)
Main Authors: Bai, Yu-Fei; Zhang, Hong-Bo; Lei, Qing; Du, Ji-Xiang
Format: Article
Language: English
Description
Summary: Multiperson pose estimation is an important and complex problem in computer vision. It is commonly framed as human skeleton joint detection and, in recent years, has been solved with joint heat map regression networks. The key to achieving accurate pose estimation is learning robust and discriminative feature maps. Although current methods have made significant progress through interlayer fusion and intralevel fusion of feature maps, few works combine the two. In this paper, we propose a multistage polymerization network (MPN) for multiperson pose estimation. The MPN continuously learns rich underlying spatial information by fusing features within layers. It also adds hierarchical connections between feature maps at the same resolution for interlayer fusion, reusing low-level spatial information and refining high-level semantic information to obtain an accurate keypoint representation. In addition, we observe a lack of connection between the low-level and high-level information in the output. To address this, we propose an effective shuffled attention mechanism (SAM). The shuffle promotes cross-channel information exchange between pyramid feature maps, while the attention makes a trade-off between the low-level and high-level representations of the output features. As a result, the relationship between the spatial and channel dimensions of the feature map is further enhanced. The proposed method is evaluated on public datasets, and experimental results show that it performs better than current methods.
ISSN: 1687-725X, 1687-7268
DOI: 10.1155/2021/1484218
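
Note: The summary describes a shuffle that exchanges information across channels and an attention step that trades off low-level against high-level features. The sketch below is only an illustration of those two generic ideas, not the authors' SAM: it uses the common grouped channel-shuffle permutation and a single sigmoid gate, both of which are assumptions, and the names `channel_shuffle` and `gated_blend` are hypothetical.

```python
import math


def channel_shuffle(channels, groups):
    """Interleave channels across groups (assumed grouped-shuffle scheme).

    Equivalent to viewing the channel list as a (groups, per_group) grid,
    transposing it, and flattening, so neighbouring output channels come
    from different input groups.
    """
    n = len(channels)
    if groups <= 0 or n % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = n // groups
    return [
        channels[g * per_group + i]
        for i in range(per_group)
        for g in range(groups)
    ]


def gated_blend(low, high, gate_logit):
    """Blend low-level and high-level features with one sigmoid gate.

    Hypothetical stand-in for an attention weight: a gate near 1 favours
    the low-level features, a gate near 0 favours the high-level ones.
    """
    if len(low) != len(high):
        raise ValueError("feature vectors must have the same length")
    a = 1.0 / (1.0 + math.exp(-gate_logit))
    return [a * lo + (1.0 - a) * hi for lo, hi in zip(low, high)]
```

With six channels split into two groups, `channel_shuffle(list(range(6)), 2)` interleaves the groups as `[0, 3, 1, 4, 2, 5]`. In a real network the gate would be learned per channel or per position rather than passed in as a scalar.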