M3D-VTON: A Monocular-to-3D Virtual Try-On Network
Main Authors: | Zhao, Fuwei; Xie, Zhenyu; Kampffmeyer, Michael; Dong, Haoye; Han, Songfang; Zheng, Tianxiang; Zhang, Tao; Liang, Xiaodan |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | 3D from a single image and shape-from-x; Computer vision; Datasets and evaluation; Electronic commerce; Image and video synthesis |
container_end_page | 13229 |
container_start_page | 13219 |
creator | Zhao, Fuwei; Xie, Zhenyu; Kampffmeyer, Michael; Dong, Haoye; Han, Songfang; Zheng, Tianxiang; Zhang, Tao; Liang, Xiaodan |
description | Virtual 3D try-on can provide an intuitive and realistic view for online shopping and has huge potential commercial value. However, existing 3D virtual try-on methods mainly rely on annotated 3D human shapes and garment templates, which hinders their application in practical scenarios. 2D virtual try-on approaches provide a faster alternative for manipulating clothed humans, but lack a rich and realistic 3D representation. In this paper, we propose a novel Monocular-to-3D Virtual Try-On Network (M3D-VTON) that builds on the merits of both 2D and 3D approaches. By integrating 2D information efficiently and learning a mapping that lifts the 2D representation to 3D, we make the first attempt to reconstruct a 3D try-on mesh taking only the target clothing and a person image as inputs. The proposed M3D-VTON includes three modules: 1) the Monocular Prediction Module (MPM), which estimates an initial full-body depth map and accomplishes 2D clothes-person alignment through a novel two-stage warping procedure; 2) the Depth Refinement Module (DRM), which refines the initial body depth to produce more detailed pleat and face characteristics; 3) the Texture Fusion Module (TFM), which fuses the warped clothing with the non-target body part to refine the results. We also construct a high-quality synthesized Monocular-to-3D virtual try-on dataset, in which each person image is associated with a front and a back depth map. Extensive experiments demonstrate that the proposed M3D-VTON can manipulate and reconstruct the 3D human body wearing the given clothing with compelling details and is more efficient than other 3D approaches. |
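The abstract describes a three-stage pipeline (MPM → DRM → TFM). As a loose illustration of that data flow only, the following sketch uses stub functions in place of the paper's learned networks; all function names, signatures, and stub behaviors here are assumptions for exposition, not the authors' implementation.

```python
# Hypothetical data-flow sketch of the M3D-VTON pipeline from the abstract.
# Images and depth maps are modeled as nested lists; each stage is a stub.

def monocular_prediction(person_img, clothing_img):
    """MPM: estimate an initial full-body depth map and align the clothing
    to the person (the paper's two-stage warp is reduced to a pass-through)."""
    initial_depth = [[0.5 for _ in row] for row in person_img]  # coarse depth
    warped_clothing = clothing_img  # stand-in for the two-stage warping
    return initial_depth, warped_clothing

def depth_refinement(initial_depth):
    """DRM: refine the coarse depth to recover pleat/face detail
    (modeled here as adding a small residual everywhere)."""
    return [[d + 0.01 for d in row] for row in initial_depth]

def texture_fusion(warped_clothing, person_img):
    """TFM: fuse the warped clothing with the non-target body regions.
    Trivial rule: keep a clothing pixel where nonzero, else the person pixel."""
    return [[c if c else p for c, p in zip(crow, prow)]
            for crow, prow in zip(warped_clothing, person_img)]

def m3d_vton(person_img, clothing_img):
    """Compose the three stages; depth + texture would back a textured mesh."""
    depth, warped = monocular_prediction(person_img, clothing_img)
    refined_depth = depth_refinement(depth)
    try_on_texture = texture_fusion(warped, person_img)
    return refined_depth, try_on_texture
```

In the actual method each stage is a trained network and the output pair (front/back depth plus fused texture) is converted into a 3D mesh; this stub only shows which artifacts flow between the modules.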
doi_str_mv | 10.1109/ICCV48922.2021.01299 |
format | conference_proceeding |
identifier | ISSN: 1550-5499 |
ispartof | 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 2021, p.13219-13229 |
issn | 1550-5499; 2380-7504 (electronic) |
language | eng |
recordid | cdi_ieee_primary_9710658 |
source | IEEE Xplore All Conference Series |
subjects | 3D from a single image and shape-from-x; Computer vision; Datasets and evaluation; Electronic commerce; Estimation; Fuses; Image and video manipulation detection and integrity methods; Image and video synthesis; Shape; Three-dimensional displays |
title | M3D-VTON: A Monocular-to-3D Virtual Try-On Network |