
Skeleton-based structured early activity prediction

Bibliographic Details
Published in: Multimedia Tools and Applications, 2021-06, Vol. 80 (15), pp. 23023-23049
Main Authors: Arzani, Mohammad M., Fathy, Mahmood, Azirani, Ahmad A., Adeli, Ehsan
Format: Article
Language: English
Description
Summary: To communicate with people, robots and vision-based interactive systems often need to understand human activities before those activities are performed completely. Such early prediction helps them take appropriate near-future steps toward a realistic interactive session with humans. However, predicting activities in advance is a challenging task, because some activities are simple while others are complex and composed of several smaller atomic sub-activities. In this paper, we propose a method capable of early prediction of simple and complex human activities by formulating the problem as a structured prediction task using probabilistic graphical models (PGMs). We use skeletons captured from low-cost depth sensors as high-level descriptions of the human body. Using 3D skeletons makes our method robust to environmental factors. Our proposed model is a fully observed PGM coupled with a clustering scheme that removes the model's dependence on the number-of-middle-states hyperparameter. We test our method on three popular datasets, CAD-60, UT-Kinect, and Florence 3D, and obtain accuracies of 97.6%, 100%, and 96.11%, respectively. These datasets cover both simple and complex activities. When only half of the clip is observed, we achieve 93.33% and 96.9% accuracy on the CAD-60 and UT-Kinect datasets, respectively.
ISSN: 1380-7501, 1573-7721
DOI: 10.1007/s11042-020-08875-w
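
The summary above describes formulating early activity prediction as a fully observed probabilistic graphical model over clustered 3D skeleton poses. As a rough illustration only, and not the authors' actual model, the sketch below assumes each frame's skeleton has already been quantized into a discrete pose-cluster ID (e.g., by k-means) and scores a partially observed clip against simple per-activity chain models; the class names, cluster count, and toy data are hypothetical.

# Minimal sketch of early activity prediction over clustered skeleton poses.
# Assumes each frame's 3D skeleton has been quantized into one of K pose
# clusters; the activity names, cluster count, and training data below are
# illustrative placeholders, not the paper's actual model or datasets.
import numpy as np


class ChainActivityModel:
    """Fully observed chain model: per-activity start and transition
    probabilities over discrete pose-cluster IDs, estimated by counting."""

    def __init__(self, n_clusters, activities):
        self.n_clusters = n_clusters
        self.activities = activities
        # Laplace-smoothed count tables per activity class.
        self.start = {a: np.ones(n_clusters) for a in activities}
        self.trans = {a: np.ones((n_clusters, n_clusters)) for a in activities}

    def fit(self, sequences, labels):
        """sequences: list of int arrays of pose-cluster IDs; labels: activity names."""
        for seq, lab in zip(sequences, labels):
            self.start[lab][seq[0]] += 1
            for prev, cur in zip(seq[:-1], seq[1:]):
                self.trans[lab][prev, cur] += 1
        for a in self.activities:
            self.start[a] /= self.start[a].sum()
            self.trans[a] /= self.trans[a].sum(axis=1, keepdims=True)

    def predict(self, partial_seq):
        """Score a (possibly partial) clip and return the most likely activity."""
        scores = {}
        for a in self.activities:
            logp = np.log(self.start[a][partial_seq[0]])
            for prev, cur in zip(partial_seq[:-1], partial_seq[1:]):
                logp += np.log(self.trans[a][prev, cur])
            scores[a] = logp
        return max(scores, key=scores.get)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    model = ChainActivityModel(n_clusters=8, activities=["drink", "wave"])
    # Toy training data: random pose-cluster sequences per activity (hypothetical).
    seqs = [rng.integers(0, 8, size=30) for _ in range(20)]
    labs = ["drink" if i < 10 else "wave" for i in range(20)]
    model.fit(seqs, labs)
    # Early prediction: classify from only the first half of a clip.
    print(model.predict(seqs[0][:15]))

Because the chain is fully observed, its parameters can be estimated by simple counting with no latent-state inference, which is one reading of the abstract's "fully observed PGM"; the real method additionally uses a clustering scheme to avoid fixing the number of middle states by hand.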