Loading…
Video summarization via spatio-temporal deep architecture
Video summarization has unprecedented importance to help us overview current ever-growing amount of video collections. In this paper, we propose a novel dynamic video summarization model based on deep learning architecture. We are the first to solve the imbalanced class distribution problem in video...
Saved in:
Published in: | Neurocomputing (Amsterdam) 2019-03, Vol.332, p.224-235 |
---|---|
Main Authors: | , , |
Format: | Article |
Language: | English |
Subjects: | |
Citations: | Items that this one cites Items that cite this one |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | Video summarization has unprecedented importance to help us overview current ever-growing amount of video collections. In this paper, we propose a novel dynamic video summarization model based on deep learning architecture. We are the first to solve the imbalanced class distribution problem in video summarization. The over-sampling algorithm is used to balance the class distribution on training data. The novel two-stream deep architecture with the cost-sensitive learning is proposed to handle the class imbalance problem in feature learning. In the spatial stream, RGB images are used to represent the appearance of video frames, and in the temporal stream, multi-frame motion vectors with deep learning framework is firstly introduced to represent and extract temporal information of the input video. The proposed method is evaluated on two standard video summarization datasets and a standard emotional dataset. Empirical validations for video summarization demonstrate that our model achieves performance improvement over the existing and state-of-the-art methods. Moreover, the proposed method is able to highlight the video content with the active level of arousal in affective computing task. In addition, the proposed frame-based model has another advantage. It can automatically preserve the connection between consecutive frames. Although the summary is constructed based on the frame level, the final summary is comprised of informative and continuous segments instead of individual separate frames. |
---|---|
ISSN: | 0925-2312 1872-8286 |
DOI: | 10.1016/j.neucom.2018.12.040 |