3D Pose Tracking With Multitemplate Warping and SIFT Correspondences

Bibliographic Details
Published in: IEEE Transactions on Circuits and Systems for Video Technology, 2016-11, Vol. 26 (11), p. 2043-2055
Main Authors: Shu Chen, Luming Liang, Wenzhang Liang, Hassan Foroosh
Format: Article
Language: English
Description
Summary: Template warping is a popular technique in vision-based 3D motion tracking and 3D pose estimation because it can be applied to monocular video sequences. However, the method suffers from two major limitations that hamper its use in practice. First, it requires the camera to be calibrated beforehand. Second, it may fail to provide good results if the inter-frame displacements are too large. To overcome the first problem, we propose to estimate the unknown focal length of the camera from several initial frames by an iterative optimization process. To alleviate the second problem, we propose a tracking method that combines the complementary information provided by dense optical flow and tracked scale-invariant feature transform (SIFT) features. While optical flow works well for small displacements and provides accurate local information, tracked SIFT features are better at handling larger displacements or global transformations. To combine these two sources of information, we introduce a forgetting factor to bootstrap the 3D pose estimates provided by SIFT features, and refine the final results using optical flow. Experiments are performed on three public datasets: the Biwi Head Pose dataset, the BU dataset, and the McGill Faces dataset. The results show that the proposed solution is more accurate than baseline methods that rely solely on either template warping or SIFT features. In addition, because it circumvents the need for camera calibration, the approach can be applied in a wider variety of scenarios, providing a more flexible solution to the problem than existing methods.
ISSN: 1051-8215, 1558-2205
DOI: 10.1109/TCSVT.2015.2452782