Contextualised learning-free three-dimensional body pose estimation from two-dimensional body features in monocular images

Bibliographic Details
Published in: IET Computer Vision, 2016-06, Vol. 10 (4), p. 299-306
Main Authors: Unzueta, Luis; Aranjuelo, Nerea; Goenetxea, Jon; Rodriguez, Mikel; Linaza, Maria Teresa
Format: Article
Language: English
Description
Summary: In this study, the authors present a learning-free method for inferring kinematically plausible three-dimensional (3D) human body poses contextualised in a predefined 3D world, given a set of 2D body features extracted from monocular images. This contextualisation has the advantage of providing further semantic information about the observed scene. Their method consists of two main steps. First, the camera parameters are obtained by adjusting the reference floor of the predefined 3D world to four key-points in the image. Then, the person's body part lengths and pose are estimated by fitting a parametrised multi-body 3D kinematic model to 2D image body features, which can be located by state-of-the-art body part detectors. The fitting is carried out by a hierarchical optimisation procedure, in which the model's scale variations are considered first and the body part lengths are then refined. At each iteration, tentative poses are inferred by combining efficient perspective-n-point camera pose estimation with constrained viewpoint-dependent inverse kinematics. Experimental results show that their method achieves good accuracy relative to state-of-the-art alternatives, without the need to learn 2D/3D mapping models from training data. Their method is efficient enough to be integrated into video soft sensing systems.
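The first step in the summary, adjusting the reference floor to four key-points in the image, can be illustrated with a plane-to-image homography. The sketch below is not the authors' implementation: it only shows the standard four-point direct linear solution for a floor plane, and every name and coordinate in it (floor_homography, project, the sample points) is an assumption made for illustration.

```python
# Illustrative sketch only: floor-plane-to-image homography from four key-points.
# Function names and sample coordinates are hypothetical, not taken from the paper.

def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate configuration (collinear points?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def floor_homography(floor_pts, image_pts):
    """Return the 3x3 matrix H (with H[2][2] = 1) mapping floor (X, Y) to image (u, v)."""
    a, b = [], []
    for (x, y), (u, v) in zip(floor_pts, image_pts):
        # u = (h0*x + h1*y + h2) / (h6*x + h7*y + 1), and likewise for v
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def project(hmat, x, y):
    """Project a floor point (X, Y) into the image using the homography."""
    w = hmat[2][0] * x + hmat[2][1] * y + hmat[2][2]
    u = (hmat[0][0] * x + hmat[0][1] * y + hmat[0][2]) / w
    v = (hmat[1][0] * x + hmat[1][1] * y + hmat[1][2]) / w
    return u, v


# Hypothetical data: a unit square on the floor and its four image key-points.
floor = [(0, 0), (1, 0), (1, 1), (0, 1)]
image = [(100, 400), (500, 420), (420, 200), (180, 190)]
H = floor_homography(floor, image)
```

For a floor at Z = 0 and a camera with known intrinsics K, such a homography equals K [r1 r2 t] up to scale, which is one common way to recover a camera pose from a planar reference; whether the paper proceeds this way is not stated in the summary.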
ISSN: 1751-9632, 1751-9640
DOI: 10.1049/iet-cvi.2015.0283