Cofopose: Conditional 2D Pose Estimation with Transformers

Bibliographic Details
Published in:	Sensors (Basel, Switzerland), 2022-09, Vol.22 (18), p.6821
Main Authors:	Aidoo, Evans; Wang, Xun; Liu, Zhenguang; Tenagyei, Edwin Kwadwo; Owusu-Agyemang, Kwabena; Kodjiku, Seth Larweh; Ejianya, Victor Nonso; Aggrey, Esther Stacy E. B.
Format:	Article
Language:	English
Subjects:	Artificial intelligence; Blurring; Computer vision; conditional DETR; convolutional neural network (CNN); Datasets; detection; DETR; Encoders-Decoders; human pose estimation; Image processing; Machine vision; Neural networks; Pose estimation; Queries; Technology application; Testing

Description
Summary:	Human pose estimation has long been a fundamental problem in computer vision and artificial intelligence. Prominent among 2D human pose estimation (HPE) methods are the regression-based approaches, which have been shown to achieve excellent results. However, the ground-truth labels are usually inherently ambiguous in challenging cases such as motion blur, occlusion, and truncation, leading to poor performance measurement and lower accuracy. In this paper, we propose Cofopose, a two-stage approach consisting of person-detection and keypoint-detection transformers for 2D human pose estimation. Cofopose is composed of conditional cross-attention, a conditional DEtection TRansformer (conditional DETR), and an encoder-decoder in the transformer framework, which together allow it to perform both person and keypoint detection. In a significant departure from other approaches, we use conditional cross-attention and a fine-tuned conditional DETR for person detection, and transformer encoder-decoders for keypoint detection. Cofopose was extensively evaluated on two benchmark datasets, MS COCO and MPII, achieving improved performance by significant margins over existing state-of-the-art frameworks.
ISSN:	1424-8220
DOI:	10.3390/s22186821