Loading…
Rethinking the Person Localization for Single-Stage Multi-Person Pose Estimation
Single-stage models for multi-person pose estimation have garnered significant attention due to their streamlined approach in generating person position localization and body structure perception in a single pass. These two parts, however, are processed individually by existing methods, leading to s...
Saved in:
Published in: | IEEE transactions on multimedia 2024, Vol.26, p.1436-1447 |
---|---|
Main Authors: | , , , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Citations: | Items that this one cites Items that cite this one |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | Single-stage models for multi-person pose estimation have garnered significant attention due to their streamlined approach in generating person position localization and body structure perception in a single pass. These two parts, however, are processed individually by existing methods, leading to suboptimal results, e.g., candidates with high confidences for person localization while poor structure estimations. To this end, we propose a simple yet effective approach, namely Structure-guided Person Localization (SPL), jointly leveraging the advantages of the two aspects to solve the multi-person pose estimation problem, with two complementary novelties. First, we propose to incorporate body structure perception to guide person position localization, consequently, we introduce the Structure-guided Center Learning (SCL) to unify the quality of the body structure perception in the displacement map with the confidence of the person existence in the center map, thus achieving more accurate keypoint position localization results even with extreme poses. Second, to facilitate the end-to-end training of SPL, we propose the efficient Agency-based Scale-adaptive Learning (ASL). Specifically, we predict an agency map of the same size as the center map, which focuses on the foreground area and can adaptively adjust the scale size for each central area with the body structure perception confidence. Comprehensive experiments on challenging benchmarks including COCO and CrowdPose clearly verify the superiority of our framework, which achieves new state-of-the-art single-stage multi-person pose estimation results. Specifically, SPL obtains 72.1 AP scores and 69.5 AP scores in COCO test-dev2017 and CrowdPose test set, respectively. |
---|---|
ISSN: | 1520-9210 1941-0077 |
DOI: | 10.1109/TMM.2023.3282139 |