Spatio-Temporal Transformer for Online Video Understanding
Published in: Journal of Physics: Conference Series, 2022-01, Vol. 2171(1), p. 012020
Main Authors:
Format: Article
Language: English
Summary: Leading methods in online video understanding extract useful information from the spatial and temporal dimensions of an input video, but they suffer from two problems: (1) they extract only local video information and cannot relate it to important features of the temporal context in the video; (2) although some methods can quickly process the information in each frame, their efficiency over the whole video is poor, so they cannot be applied to online video understanding tasks. This article introduces a Transformer-based network that considers both spatial and temporal content while processing each video quickly. Our approach can efficiently handle up to 170 videos, with hundreds of frames per second, for action classification, and runs 10 to 90 times faster than existing methods on action classification datasets.
ISSN: 1742-6588, 1742-6596
DOI: 10.1088/1742-6596/2171/1/012020
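The abstract describes a Transformer that attends over both the spatial and the temporal dimensions of a video. As the paper's exact architecture is not given in this record, the sketch below is only a minimal illustration of one common way to build such a block (divided temporal/spatial self-attention in PyTorch); the class name `SpatioTemporalBlock`, the token shape, and all dimensions are illustrative assumptions, not the authors' design.

```python
# Illustrative sketch only: a divided space/time attention block in PyTorch.
# This is an assumption about how a spatio-temporal Transformer block can look,
# not the architecture described in the paper.
import torch
import torch.nn as nn


class SpatioTemporalBlock(nn.Module):
    """Temporal self-attention across frames, then spatial self-attention
    within each frame, followed by a position-wise MLP."""

    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.temporal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.spatial_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.norm3 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x):
        # x: (batch, frames, patches, dim) -- patch tokens for each frame
        b, t, p, d = x.shape

        # Temporal attention: each spatial patch attends across frames.
        xt = x.permute(0, 2, 1, 3).reshape(b * p, t, d)
        h = self.norm1(xt)
        xt = xt + self.temporal_attn(h, h, h, need_weights=False)[0]
        x = xt.reshape(b, p, t, d).permute(0, 2, 1, 3)

        # Spatial attention: patches within the same frame attend to each other.
        xs = x.reshape(b * t, p, d)
        h = self.norm2(xs)
        xs = xs + self.spatial_attn(h, h, h, need_weights=False)[0]
        x = xs.reshape(b, t, p, d)

        # Position-wise feed-forward network with residual connection.
        return x + self.mlp(self.norm3(x))


if __name__ == "__main__":
    block = SpatioTemporalBlock()
    clip = torch.randn(2, 8, 49, 256)  # 2 clips, 8 frames, 7x7 patches, 256-dim tokens
    print(block(clip).shape)  # torch.Size([2, 8, 49, 256])
```

Factorizing attention into a temporal pass followed by a spatial pass keeps the cost linear in the number of frames times patches, which is one way such models reach the per-video throughput the abstract reports.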