Loading…
Spatial-aware Texture Transformer for High-fidelity Garment Transfer
Garment transfer aims to transfer the desired garment from a model image with the desired clothing to a target person, which has attracted a great deal of attention due to its wider potential applications. However, considering the model and target persons are often given at different views, body sha...
Saved in:
Published in: | IEEE transactions on image processing 2021-01, Vol.30, p.1-1 |
---|---|
Main Authors: | , , , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Citations: | Items that this one cites Items that cite this one |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | Garment transfer aims to transfer the desired garment from a model image with the desired clothing to a target person, which has attracted a great deal of attention due to its wider potential applications. However, considering the model and target persons are often given at different views, body shapes and poses, realistic garment transfer is facing the following challenges that have not been well addressed: 1) deforming the garment; 2) inferring unobserved appearance; 3) preserving fine texture details. To tackle these challenges, we propose a novel SPatial-Aware Texture Transformer (SPATT) model. Different from existing models, SPATT establishes correspondence and infers unobserved clothing appearance by leveraging the spatial prior information of a UV-space. Specifically, the source image is transformed into a partial UV texture map guided by the extracted dense pose. To better infer the unseen appearance utilizing seen region, we first propose a novel coordinate-prior map that defines the spatial relationship between the coordinates in the UV texture map, and design an algorithm to compute it. Based on the proposed coordinate-prior map, we present a novel spatial-aware texture generation network to complete the partial UV texture. In the second stage, we first transform the completed UV texture to fit the target person. To polish the details and improve realism, we introduce a refinement generative network conditioned on the warped image and source input. Compared with existing frameworks as shown experimentally, the proposed framework can generate more realistic images with better-preserved texture details. Furthermore, difficult cases where two persons have large pose and view differences can also be well handled by SPATT. |
---|---|
ISSN: | 1057-7149 1941-0042 |
DOI: | 10.1109/TIP.2021.3107235 |