Reconstructive network under contrastive graph rewards for video summarization

Bibliographic Details
Published in:Expert systems with applications 2024-09, Vol.250, p.123860, Article 123860
Main Authors: Wu, Guangli, Song, Shanshan, Wang, Xingyue, Zhang, Jing
Format: Article
Language:English
Description
Video summarization aims to condense video content by extracting pivotal frames or shots. Most existing methods focus on maximizing the intersection between the predicted summary and the ground truth, overlooking whether users can infer the content of the original video from the summary. Additionally, these approaches rely heavily on annotated data, which limits their applicability. We therefore propose a reconstructive network under contrastive graph rewards for video summarization, comprising a summary generator and a video reconstructor. The summary generator employs graph contrastive learning to distill essential video information and generate the summary. The video reconstructor, in turn, uses reinforcement learning within an unsupervised training framework to optimize the summary generator, addressing the shortage of annotated video data in summarization tasks. Through a reconstruction loss, our approach ensures that the predicted summary encapsulates the main video content and inter-shot dependencies. Notably, we devise a mutual-information-maximization reconstruction reward function that preserves the information shared between the summary and the original video, helping users comprehend the original video content. We conduct extensive experiments on the TVSum and SumMe datasets, where our network achieves F1 scores of 58.8% and 48.0%, respectively. Experimental results validate the superiority of our method over state-of-the-art unsupervised and many supervised video summarization techniques.
•Introduces a reconstructive network under contrastive graph rewards.
•Utilizes graph contrastive learning on node and edge attributes.
•Designs a reconstruction reward function that maximizes mutual information.
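The abstract does not spell out the exact form of the mutual-information reconstruction reward. As an illustration only, here is a minimal sketch of one plausible proxy, the mean per-frame cosine similarity between the original and reconstructed frame features, rescaled to [0, 1]; the function name, argument shapes, and the choice of cosine similarity as an MI proxy are all assumptions, not the authors' implementation:

```python
import numpy as np

def reconstruction_reward(original: np.ndarray, reconstructed: np.ndarray) -> float:
    """Illustrative reconstruction reward (NOT the paper's exact formula).

    Both inputs are (n_frames, feature_dim) arrays: `original` holds the
    frame features of the source video, `reconstructed` the features the
    video reconstructor recovers from the generated summary. The reward is
    high when the reconstruction retains the information of the original,
    serving as a rough proxy for maximizing their shared information.
    """
    # L2-normalize each frame feature vector (epsilon avoids division by zero).
    o = original / (np.linalg.norm(original, axis=1, keepdims=True) + 1e-8)
    r = reconstructed / (np.linalg.norm(reconstructed, axis=1, keepdims=True) + 1e-8)
    # Mean per-frame cosine similarity, mapped from [-1, 1] to [0, 1].
    return float((np.sum(o * r, axis=1).mean() + 1.0) / 2.0)
```

In a reinforcement-learning loop, a scalar reward like this would be fed back to the summary generator as the return for the sampled summary, which is consistent with the unsupervised training framework the abstract describes.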
ISSN:0957-4174
DOI:10.1016/j.eswa.2024.123860