Comparing alternatives for capturing dynamic information in Bag-of-Visual-Features approaches applied to human actions recognition
Format: Conference Proceeding
Language: English
Summary: Bag-of-Visual-Features (BoVF) representations have achieved great success when used for object recognition, mainly because of their robustness to several kinds of variations and to occlusion. Recently, a number of BoVF approaches have also been proposed for the recognition of human actions in videos. One important issue that arises when using BoVF for videos is how to take dynamic information into account, and most proposals rely on 3D extensions of 2D visual descriptors for this. However, we envision alternative approaches based on 2D descriptors applied to the spatio-temporal video planes, instead of to the purely spatial planes traditionally explored by previous work. Thus, in this paper, we address the following question: what is the cost-effectiveness of a BoVF approach built from such 2D descriptors when compared to one based on the state-of-the-art 3D Spatio-Temporal Interest Points (STIPs) descriptor? We evaluate the recognition rate and time complexity of alternative 2D descriptors applied to different sets of spatio-temporal planes, as well as of the state-of-the-art STIPs. Experimental results show that, with proper settings, 2D descriptors can yield the same recognition results as those provided by STIPs, but at a significantly higher time complexity.
DOI: 10.1109/MMSP.2009.5293303
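As a rough illustration of the idea described in the summary, and not the authors' exact pipeline, the sketch below slices a video volume into xy, xt, and yt planes, computes a 2D descriptor on each plane, quantizes the descriptors against a pre-trained visual vocabulary, and accumulates a bag-of-visual-features histogram. The toy gradient-orientation descriptor, the sampling step, and the random vocabulary are stand-ins chosen only to keep the example self-contained; a real implementation would use an established 2D descriptor (e.g., SIFT-like features) and a vocabulary learned by clustering.

```python
import numpy as np

def patch_descriptor(plane, n_bins=8):
    """Toy 2D descriptor: gradient-orientation histogram over one plane.
    Stands in for a real 2D descriptor; assumption for illustration only."""
    gy, gx = np.gradient(plane.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % (2 * np.pi)
    hist, _ = np.histogram(ang, bins=n_bins, range=(0, 2 * np.pi), weights=mag)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

def bovf_histogram(video, vocabulary, step=8):
    """Build a BoVF histogram from 2D descriptors computed on spatio-temporal
    planes (xy frames, xt and yt slices) of a video volume.

    video      : ndarray of shape (T, H, W), grayscale frames
    vocabulary : ndarray of shape (K, D), pre-trained visual words
    """
    T, H, W = video.shape
    descriptors = []

    # xy planes: ordinary frames, sampled every `step` frames
    for t in range(0, T, step):
        descriptors.append(patch_descriptor(video[t]))
    # xt planes: one slice per sampled row, capturing horizontal motion over time
    for y in range(0, H, step):
        descriptors.append(patch_descriptor(video[:, y, :]))
    # yt planes: one slice per sampled column, capturing vertical motion over time
    for x in range(0, W, step):
        descriptors.append(patch_descriptor(video[:, :, x]))

    # Vector quantization: assign each descriptor to its nearest visual word
    desc = np.stack(descriptors)                                   # (N, D)
    dists = np.linalg.norm(desc[:, None, :] - vocabulary[None, :, :], axis=2)
    words = dists.argmin(axis=1)                                   # (N,)

    # Normalized word-count histogram is the video-level BoVF representation
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()

if __name__ == "__main__":
    # Usage sketch: random data stands in for a real video and a learned vocabulary
    rng = np.random.default_rng(0)
    video = rng.random((64, 120, 160))         # (T, H, W) toy video
    vocab = rng.random((50, 8))                # 50 visual words of dimension 8
    print(bovf_histogram(video, vocab).shape)  # -> (50,)
```

The resulting fixed-length histograms can be fed to any standard classifier; the comparison in the paper is between such 2D-plane descriptors and the 3D STIPs descriptor, both evaluated on recognition rate and time complexity.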