Video Summarization using Denoising Diffusion Probabilistic Model
Video summarization aims to eliminate visual redundancy while retaining key parts of video to construct concise and comprehensive synopses. Most existing methods use discriminative models to predict the importance scores of video frames. However, these methods are susceptible to annotation inconsistency caused by the inherent subjectivity of different annotators when annotating the same video. In this paper, we introduce a generative framework for video summarization that learns how to generate summaries from a probability distribution perspective, effectively reducing the interference of subjective annotation noise. Specifically, we propose a novel diffusion summarization method based on the Denoising Diffusion Probabilistic Model (DDPM), which learns the probability distribution of training data through noise prediction, and generates summaries by iterative denoising. Our method is more resistant to subjective annotation noise, and is less prone to overfitting the training data than discriminative methods, with strong generalization ability. Moreover, to facilitate training DDPM with limited data, we employ an unsupervised video summarization model to implement the earlier denoising process. Extensive experiments on various datasets (TVSum, SumMe, and FPVSum) demonstrate the effectiveness of our method.
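The abstract's core mechanism (a DDPM that noises clean per-frame importance scores in a forward process and recovers them by iterative noise-predicting denoising) can be illustrated with a toy sketch. This is not the authors' implementation: it treats importance scores as a 1-D NumPy signal, uses a standard linear beta schedule, and the noise predictor that a trained network would supply is left as a hypothetical input `eps_hat`.

```python
import numpy as np

# Toy DDPM sketch on per-frame importance scores. A real model would
# replace eps_hat with the output of a trained noise-prediction network.
T = 1000
betas = np.linspace(1e-4, 0.02, T)       # linear variance schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)          # cumulative signal retention

rng = np.random.default_rng(0)

def q_sample(x0, t, eps):
    """Forward process: noise clean scores x0 to timestep t in one shot."""
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

def p_sample(xt, t, eps_hat):
    """One reverse (denoising) step given a predicted noise eps_hat."""
    coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
    mean = (xt - coef * eps_hat) / np.sqrt(alphas[t])
    if t > 0:  # no noise is injected at the final step
        mean = mean + np.sqrt(betas[t]) * rng.standard_normal(xt.shape)
    return mean

x0 = rng.random(16)                      # toy importance scores for 16 frames
eps = rng.standard_normal(16)
xt = q_sample(x0, T - 1, eps)            # near-pure noise at the last timestep
```

Because the forward process is closed-form, training reduces to sampling a random `t`, noising `x0`, and regressing the network's output against `eps`; summaries are then generated by running `p_sample` from pure noise down to `t = 0`.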
Published in: | arXiv.org 2024-12 |
---|---|
Main Authors: | Shang, Zirui; Zhu, Yubo; Li, Hongxi; Yang, Shuo; Wu, Xinxiao |
Format: | Article |
Language: | English |
Subjects: | Annotations; Noise generation; Noise prediction; Noise reduction; Probabilistic models; Probability distribution; Redundancy; Statistical analysis; Summaries; Video data; Visual discrimination |
Online Access: | Get full text |
container_title | arXiv.org |
---|---|
creator | Shang, Zirui; Zhu, Yubo; Li, Hongxi; Yang, Shuo; Wu, Xinxiao |
description | Video summarization aims to eliminate visual redundancy while retaining key parts of video to construct concise and comprehensive synopses. Most existing methods use discriminative models to predict the importance scores of video frames. However, these methods are susceptible to annotation inconsistency caused by the inherent subjectivity of different annotators when annotating the same video. In this paper, we introduce a generative framework for video summarization that learns how to generate summaries from a probability distribution perspective, effectively reducing the interference of subjective annotation noise. Specifically, we propose a novel diffusion summarization method based on the Denoising Diffusion Probabilistic Model (DDPM), which learns the probability distribution of training data through noise prediction, and generates summaries by iterative denoising. Our method is more resistant to subjective annotation noise, and is less prone to overfitting the training data than discriminative methods, with strong generalization ability. Moreover, to facilitate training DDPM with limited data, we employ an unsupervised video summarization model to implement the earlier denoising process. Extensive experiments on various datasets (TVSum, SumMe, and FPVSum) demonstrate the effectiveness of our method. |
format | article |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2024-12 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_3144197687 |
source | Publicly Available Content (ProQuest) |
subjects | Annotations; Noise generation; Noise prediction; Noise reduction; Probabilistic models; Probability distribution; Redundancy; Statistical analysis; Summaries; Video data; Visual discrimination |
title | Video Summarization using Denoising Diffusion Probabilistic Model |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-08T13%3A19%3A46IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Video%20Summarization%20using%20Denoising%20Diffusion%20Probabilistic%20Model&rft.jtitle=arXiv.org&rft.au=Shang,%20Zirui&rft.date=2024-12-12&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E3144197687%3C/proquest%3E%3Cgrp_id%3Ecdi_FETCH-proquest_journals_31441976873%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=3144197687&rft_id=info:pmid/&rfr_iscdi=true |