Comparing alternatives for capturing dynamic information in Bag-of-Visual-Features approaches applied to human actions recognition

Bibliographic Details
Main Authors: Lopes, A.P.B., Oliveira, R.S., de Almeida, J.M., de A Araujo, A.
Format: Conference Proceeding
Language: English
Description
Summary: Bag-of-Visual-Features (BoVF) representations have achieved great success in object recognition, mainly because of their robustness to several kinds of variation and to occlusion. Recently, a number of BoVF approaches have also been proposed for recognizing human actions in videos. One important issue that arises when using BoVF for videos is how to take dynamic information into account, and most proposals rely on 3D extensions of 2D visual descriptors for this. However, we envision alternative approaches based on 2D descriptors applied to the spatio-temporal video planes, instead of to the purely spatial planes traditionally explored by previous work. Thus, in this paper, we address the following question: what is the cost-effectiveness of a BoVF approach built from such 2D descriptors when compared to one based on the state-of-the-art 3D Spatio-Temporal Interest Points (STIPs) descriptor? We evaluate the recognition rate and time complexity of alternative 2D descriptors applied to different sets of spatio-temporal planes, and of the state-of-the-art STIPs descriptor. Experimental results show that, with proper settings, 2D descriptors can yield the same recognition results as those provided by STIPs, but at a significantly higher time complexity.
DOI: 10.1109/MMSP.2009.5293303