
Relation Attention for Temporal Action Localization

Bibliographic Details
Published in: IEEE Transactions on Multimedia, 2020-10, Vol. 22 (10), p. 2723-2733
Main Authors: Chen, Peihao, Gan, Chuang, Shen, Guangyao, Huang, Wenbing, Zeng, Runhao, Tan, Mingkui
Format: Article
Language: English
Summary: Temporal action localization aims to accurately localize and recognize all possible action instances from an untrimmed video automatically. Most existing methods perform this task by first generating a set of proposals and then recognizing each independently. However, due to the complex structures and large content variations in action instances, recognizing them individually can be difficult. Fortunately, some proposals often share information regarding one specific action. Such information, which is ignored in existing methods, can be used to boost recognition performance. In this paper, we propose a novel mechanism, called relation attention, to exploit informative relations among proposals based on their appearance or optical flow features. Specifically, we propose a relation attention module to enhance representation power by capturing useful information from other proposals. This module does not change the dimensions of the original input and output and does not rely on any specific proposal generation methods or feature extraction backbone networks. Experimental results show that the proposed relation attention mechanism improves performance significantly on both Thumos14 and ActivityNet1.3 datasets compared to existing architectures. For example, relying on Structured Segment Networks (SSN), the proposed relation attention module helps to increase the mAP from 41.4 to 43.7 on the Thumos14 dataset and outperforms the state-of-the-art results.
ISSN: 1520-9210, 1941-0077
DOI: 10.1109/TMM.2019.2959977
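
Note: The summary above describes the relation attention module only at a high level (it aggregates information across proposals and leaves the input and output dimensions unchanged). The sketch below is an illustrative guess at how such a dimension-preserving, plug-in module could be wired up, assuming a standard scaled dot-product self-attention over per-proposal features with a residual connection. The class name RelationAttention, the single-head formulation, and the 1024-d feature size in the usage example are assumptions for illustration, not the authors' published implementation.

# Hypothetical sketch, not the paper's code: self-attention over proposal
# features (appearance or optical flow), shape (N, D) in and (N, D) out.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RelationAttention(nn.Module):
    """Aggregates information across action proposals.

    Keeps the (num_proposals, feature_dim) shape unchanged, so it can be
    inserted after any proposal feature extractor without other changes.
    """

    def __init__(self, feature_dim: int):
        super().__init__()
        self.query = nn.Linear(feature_dim, feature_dim)
        self.key = nn.Linear(feature_dim, feature_dim)
        self.value = nn.Linear(feature_dim, feature_dim)
        self.scale = feature_dim ** 0.5

    def forward(self, proposals: torch.Tensor) -> torch.Tensor:
        # proposals: (N, D), one row of features per proposal
        q, k, v = self.query(proposals), self.key(proposals), self.value(proposals)
        relation = F.softmax(q @ k.t() / self.scale, dim=-1)  # (N, N) pairwise weights
        enhanced = relation @ v                                # gather info from related proposals
        return proposals + enhanced                            # residual preserves the original features


if __name__ == "__main__":
    feats = torch.randn(32, 1024)            # e.g., 32 proposals with 1024-d features (assumed sizes)
    out = RelationAttention(1024)(feats)
    print(out.shape)                          # torch.Size([32, 1024])

The residual connection is one way to realize the stated property that the module enhances, rather than replaces, each proposal's representation, which is what would allow it to be dropped into an existing pipeline such as SSN without altering the proposal generator or the feature backbone.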