
Dreamix: Video Diffusion Models are General Video Editors

Text-driven image and video diffusion models have recently achieved unprecedented generation realism. While diffusion models have been successfully applied for image editing, very few works have done so for video editing. We present the first diffusion-based method that is able to perform text-based motion and appearance editing of general videos. Our approach uses a video diffusion model to combine, at inference time, the low-resolution spatio-temporal information from the original video with new, high-resolution information that it synthesizes to align with the guiding text prompt. As obtaining high fidelity to the original video requires retaining some of its high-resolution information, we add a preliminary stage of finetuning the model on the original video, significantly boosting fidelity. We propose to improve motion editability by a new, mixed objective that jointly finetunes with full temporal attention and with temporal attention masking. We further introduce a new framework for image animation: we first transform the image into a coarse video by simple image processing operations such as replication and perspective geometric projections, and then use our general video editor to animate it. As a further application, we can use our method for subject-driven video generation. Extensive qualitative and numerical experiments showcase the remarkable editing ability of our method and establish its superior performance compared to baseline methods.
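The image animation step described above (replicating a single image into frames and applying simple perspective warps to form a coarse video) can be illustrated with a short sketch. This is not the authors' code: the function name image_to_coarse_video, its parameters, and the warp schedule are illustrative assumptions, using OpenCV only for the geometric projection.

# Minimal sketch, assuming OpenCV and NumPy; names and warp magnitudes are hypothetical.
import numpy as np
import cv2

def image_to_coarse_video(image: np.ndarray, num_frames: int = 16,
                          max_shift: float = 0.03) -> np.ndarray:
    """Replicate `image` into `num_frames` frames, each warped by a small
    perspective transform whose strength grows linearly over time."""
    h, w = image.shape[:2]
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    frames = []
    for t in range(num_frames):
        # Warp strength for this frame: 0 for the first frame, max_shift for the last.
        s = (t / max(num_frames - 1, 1)) * max_shift
        # Nudge the image corners to fake a slight camera motion.
        dst = np.float32([[w * s, h * s],
                          [w * (1 - s), h * s * 0.5],
                          [w * (1 - s * 0.5), h * (1 - s)],
                          [w * s * 0.5, h * (1 - s * 0.5)]])
        M = cv2.getPerspectiveTransform(src, dst)
        frames.append(cv2.warpPerspective(image, M, (w, h),
                                          borderMode=cv2.BORDER_REPLICATE))
    return np.stack(frames)  # shape (num_frames, H, W, C)

In the method described by the abstract, such a coarse clip would then be handed to the finetuned video diffusion editor; the sketch covers only this preprocessing step.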


Bibliographic Details
Published in: arXiv.org, 2023-02
Main Authors: Molad, Eyal; Horwitz, Eliahu; Valevski, Dani; Rav-Acha, Alex; Matias, Yossi; Pritch, Yael; Leviathan, Yaniv; Hoshen, Yedid
Format: Article
Language: English
Subjects: Accuracy; Animation; Diffusion; Editing; High resolution; Image processing
EISSN: 2331-8422
Publisher: Ithaca: Cornell University Library, arXiv.org
Source: Publicly Available Content Database