Loading…

A Bag-of-Importance Model With Locality-Constrained Coding Based Feature Learning for Video Summarization

Video summarization helps users obtain quick comprehension of video content. Recently, some studies have utilized local features to represent each video frame and formulate video summarization as a coverage problem of local features. However, the importance of individual local features has not been...

Full description

Saved in:
Bibliographic Details
Published in:IEEE transactions on multimedia 2014-10, Vol.16 (6), p.1497-1509
Main Authors: Shiyang Lu, Zhiyong Wang, Tao Mei, Genliang Guan, Feng, David Dagan
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Video summarization helps users obtain quick comprehension of video content. Recently, some studies have utilized local features to represent each video frame and formulate video summarization as a coverage problem of local features. However, the importance of individual local features has not been exploited. In this paper, we propose a novel Bag-of-Importance (BoI) model for static video summarization by identifying the frames with important local features as keyframes, which is one of the first studies formulating video summarization at local feature level, instead of at global feature level. That is, by representing each frame with local features, a video is characterized with a bag of local features weighted with individual importance scores and the frames with more important local features are more representative, where the representativeness of each frame is the aggregation of the weighted importance of the local features contained in the frame. In addition, we propose to learn a transformation from a raw local feature to a more powerful sparse nonlinear representation for deriving the importance score of each local feature, rather than directly utilize the hand-crafted visual features like most of the existing approaches. Specifically, we first employ locality-constrained linear coding (LCC) to project each local feature into a sparse transformed space. LCC is able to take advantage of the manifold geometric structure of the high dimensional feature space and form the manifold of the low dimensional transformed space with the coordinates of a set of anchor points. Then we calculate the l2 norm of each anchor point as the importance score of each local feature which is projected to the anchor point. Finally, the distribution of the importance scores of all the local features in a video is obtained as the BoI representation of the video. We further differentiate the importance of local features with a spatial weighting template by taking the perceptual difference among spatial regions of a frame into account. As a result, our proposed video summarization approach is able to exploit both the inter-frame and intra-frame properties of feature representations and identify keyframes capturing both the dominant content and discriminative details within a video. Experimental results on three video datasets across various genres demonstrate that the proposed approach clearly outperforms several state-of-the-art methods.
ISSN:1520-9210
1941-0077
DOI:10.1109/TMM.2014.2319778