ViMo: Generating Motions from Casual Videos

Although humans have the innate ability to imagine multiple possible actions from videos, it remains an extraordinary challenge for computers due to intricate camera movements and montages. Most existing motion generation methods rely predominantly on manually collected motion datasets, usually tediously sourced from motion capture (Mocap) systems or multi-view cameras, unavoidably resulting in a limited size that severely undermines their generalizability. Inspired by recent advances in diffusion models, we probe a simple and effective way to capture motions from videos and propose a novel Video-to-Motion-Generation framework (ViMo), which leverages the immense trove of untapped video content to produce abundant and diverse 3D human motions. Distinct from prior work, our videos can be more casual, including complicated camera movements and occlusions. Striking experimental results demonstrate that the proposed model can generate natural motions even for videos with rapid movements, varying perspectives, or frequent occlusions. We also show that this work enables three important downstream applications, such as generating dance motions that follow arbitrary music and the style of a source video. Extensive experimental results show that our model offers an effective and scalable way to generate diverse and realistic motions. Code and demos will be made public soon.
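
The record gives no implementation details, so the following minimal Python sketch is a purely illustrative aside, not the authors' method: it shows the general shape of a conditional denoising-diffusion sampler over a 3D motion sequence, the family of models the abstract refers to. Every name, shape, and hyperparameter here (denoise_step, SEQ_LEN, the video_feat conditioning tensor, the beta schedule) is a hypothetical assumption.

    # Illustrative sketch only: a generic DDPM-style reverse-diffusion loop
    # over a motion sequence, NOT the actual ViMo architecture (which this
    # record does not describe). All names, shapes, and hyperparameters are
    # hypothetical assumptions.
    import torch

    T_STEPS = 50                 # number of denoising steps (hypothetical)
    SEQ_LEN, N_JOINTS = 120, 22  # frames and joint count (hypothetical)

    # Linear beta schedule and derived quantities, as in standard DDPM.
    betas = torch.linspace(1e-4, 2e-2, T_STEPS)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    def denoise_step(model, x_t, t, video_feat):
        """One reverse step: predict the noise, then sample x_{t-1}."""
        eps_hat = model(x_t, t, video_feat)  # learned noise prediction
        a_t, ab_t = alphas[t], alpha_bars[t]
        mean = (x_t - (1 - a_t) / torch.sqrt(1 - ab_t) * eps_hat) / torch.sqrt(a_t)
        if t == 0:
            return mean
        return mean + torch.sqrt(betas[t]) * torch.randn_like(x_t)

    @torch.no_grad()
    def sample_motion(model, video_feat):
        """Start from Gaussian noise and denoise into a motion clip."""
        x = torch.randn(1, SEQ_LEN, N_JOINTS * 3)  # e.g. per-joint positions
        for t in reversed(range(T_STEPS)):
            x = denoise_step(model, x, t, video_feat)
        return x

In this generic setup, the learned network predicts the noise added at step t and the loop inverts the forward noising process one step at a time; conditioning on video features is what would distinguish a video-to-motion variant from an unconditional motion prior.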

Bibliographic Details
Published in: arXiv.org, 2024-08
Main Authors: Qiu, Liangdong; Yu, Chengxing; Li, Yanran; Wang, Zhao; Huang, Haibin; Ma, Chongyang; Zhang, Di; Wan, Pengfei; Han, Xiaoguang
Format: Article
Language: English
Subjects: Cameras; Motion capture; Video
EISSN: 2331-8422
Publisher: Ithaca: Cornell University Library, arXiv.org
Source: Publicly Available Content (ProQuest)
Online Access: Get full text