Reconstructive network under contrastive graph rewards for video summarization
Published in: Expert Systems with Applications, 2024-09, Vol. 250, Article 123860
Format: Article
Language: English
Summary: Video summarization aims to condense video content by extracting pivotal frames or shots. Most existing methods focus on maximizing the intersection between the predicted summary and the ground truth, overlooking whether users can infer the content of the original video from the summary. Additionally, these approaches rely heavily on annotated data, which limits their applicability. We therefore propose a reconstructive network under contrastive graph rewards for video summarization, comprising a summary generator and a video reconstructor. The summary generator employs graph contrastive learning to distill essential video information and generate the summary. Meanwhile, the video reconstructor uses reinforcement learning within an unsupervised training framework to optimize the summary generator, addressing the shortage of annotated video data in summarization tasks. By leveraging a reconstruction loss, our approach ensures that the predicted summary encapsulates the main video content and inter-shot dependencies. Notably, we devise a novel mutual-information-maximization reconstruction reward function to preserve the information shared between the summary and the original video, helping users comprehend the original video content. We conduct extensive experiments on the TVSum and SumMe datasets, where our network achieves F1 scores of 58.8% and 48.0%, respectively. Experimental results validate the superiority of our method over state-of-the-art unsupervised and many supervised video summarization techniques.
Highlights:
• Introduces a reconstructive network under contrastive graph rewards.
• Utilizes graph contrastive learning on node and edge attributes.
• Designs a reconstruction reward function for maximizing mutual information.
ISSN: 0957-4174
DOI: 10.1016/j.eswa.2024.123860
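The abstract describes rewarding a summary by how much information it shares with the original video. The sketch below illustrates that general idea under stated assumptions: it scores a binary frame selection by the cosine similarity between mean summary features and mean video features, a crude stand-in for the paper's mutual-information-maximization reward. All names, the feature representation, and the cosine proxy are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only; NOT the paper's implementation.
# Assumption: each frame is represented by a precomputed feature vector.
import numpy as np

def reconstruction_reward(video_feats: np.ndarray, selected: np.ndarray) -> float:
    """Score a binary frame selection by feature-level similarity.

    video_feats: (T, D) per-frame features of the original video.
    selected:    (T,) 0/1 mask of frames kept in the summary.
    """
    if selected.sum() == 0:
        return 0.0  # an empty summary shares nothing with the video
    video_mean = video_feats.mean(axis=0)
    summary_mean = video_feats[selected.astype(bool)].mean(axis=0)
    # Cosine similarity as a simple proxy for shared information between
    # summary and video (the paper instead maximizes mutual information).
    num = float(video_mean @ summary_mean)
    den = float(np.linalg.norm(video_mean) * np.linalg.norm(summary_mean) + 1e-8)
    return num / den

# Toy usage with random features and a sparse selection mask.
rng = np.random.default_rng(0)
feats = rng.normal(size=(10, 4))
mask = np.array([1, 0, 0, 1, 0, 0, 1, 0, 0, 1])
r = reconstruction_reward(feats, mask)
```

In a reinforcement-learning loop such as the one the abstract outlines, a reward like this would be fed back to the summary generator so that selections preserving more of the video's content are reinforced, without requiring annotated summaries.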