VICTOR: Visual incompatibility detection with transformers and fashion-specific contrastive pre-training
Published in: Journal of Visual Communication and Image Representation, 2023-02, Vol. 90, p. 103741, Article 103741
Main Authors:
Format: Article
Language: English
Summary: For fashion outfits to be considered aesthetically pleasing, the garments that constitute them need to be compatible in terms of visual aspects, such as style, category and color. Previous works have defined visual compatibility as a binary classification task, with items in an outfit considered either fully compatible or fully incompatible. However, this is not applicable to Outfit Maker applications, where users create their own outfits and need to know which specific items may be incompatible with the rest of the outfit. To address this, we propose the Visual InCompatibility TransfORmer (VICTOR), which is optimized for two tasks: 1) overall compatibility as regression and 2) the detection of mismatching items. We also utilize fashion-specific contrastive language-image pre-training to fine-tune computer vision neural networks on fashion imagery. We build upon the Polyvore outfit benchmark to generate partially mismatching outfits, creating a new dataset termed Polyvore-MISFITs, which is used to train VICTOR. A series of ablation and comparative analyses shows that the proposed architecture can compete with and even surpass the current state of the art on Polyvore datasets while reducing the instance-wise floating-point operations by 88%, striking a balance between high performance and efficiency. We release our code at https://github.com/stevejpapad/Visual-InCompatibility-Transformer
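The two-task setup described in the summary can be pictured as a transformer encoder over per-item image embeddings, with one regression head for overall outfit compatibility and one per-item head for mismatch detection. The following is a minimal illustrative sketch in PyTorch; the layer sizes, the learnable compatibility token, and the class name OutfitCompatibilityTransformer are assumptions made for illustration, not the exact architecture released in the linked repository.

```python
# Minimal sketch of a VICTOR-style multi-task transformer.
# Assumptions: PyTorch; per-item image embeddings have already been extracted by a
# fashion-tuned image encoder; dimensions and the "compat" token are illustrative.
import torch
import torch.nn as nn

class OutfitCompatibilityTransformer(nn.Module):
    def __init__(self, embed_dim=512, num_layers=4, num_heads=8):
        super().__init__()
        # Learnable token prepended to the outfit sequence, analogous to [CLS].
        self.compat_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        # Task 1: overall outfit compatibility as regression (0 = incompatible, 1 = compatible).
        self.compat_head = nn.Linear(embed_dim, 1)
        # Task 2: per-item mismatch detection (one logit per garment).
        self.mismatch_head = nn.Linear(embed_dim, 1)

    def forward(self, item_embeddings, padding_mask=None):
        # item_embeddings: (batch, num_items, embed_dim)
        batch = item_embeddings.size(0)
        tokens = torch.cat(
            [self.compat_token.expand(batch, -1, -1), item_embeddings], dim=1)
        if padding_mask is not None:
            # Never mask the compatibility token itself.
            pad = torch.zeros(batch, 1, dtype=torch.bool, device=padding_mask.device)
            padding_mask = torch.cat([pad, padding_mask], dim=1)
        hidden = self.encoder(tokens, src_key_padding_mask=padding_mask)
        compat_score = torch.sigmoid(self.compat_head(hidden[:, 0])).squeeze(-1)
        mismatch_logits = self.mismatch_head(hidden[:, 1:]).squeeze(-1)
        return compat_score, mismatch_logits
```

In such a setup, training would combine a regression loss (e.g. mean squared error) on the compatibility score with a binary cross-entropy loss on the per-item mismatch logits.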
•Addressing outfit compatibility as regression and mismatching item detection.
•Creation of the Polyvore-MISFITs dataset (a minimal generation sketch follows this list).
•Proposal of fashion-specific contrastive image-language pre-training.
•Reduction of the required computational resources by 88%.
•Surpassing the state-of-the-art by 2.38% on the Polyvore-Disjoint dataset.
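As a companion to the Polyvore-MISFITs highlight above, the sketch below illustrates one plausible way to derive partially mismatching outfits from fully compatible ones: swap a chosen number of items for items drawn from the rest of the corpus, record which positions were swapped, and assign a graded compatibility label. The function make_misfit and the fraction-of-items-kept labeling are assumptions for illustration; the released code may construct and label MISFITs differently.

```python
# Hypothetical sketch of building partially mismatching "MISFIT" outfits from a
# Polyvore-style corpus. Assumption: each outfit is a list of item IDs, and the
# graded compatibility label is the fraction of original items that were kept.
import random

def make_misfit(outfit, item_pool, num_swaps):
    """Replace `num_swaps` items with items from `item_pool` and return the new
    outfit, a per-item mismatch mask, and a graded compatibility score."""
    assert 0 <= num_swaps <= len(outfit)
    swapped_positions = random.sample(range(len(outfit)), num_swaps)
    misfit = list(outfit)
    mismatch_mask = [0] * len(outfit)
    candidates = [item for item in item_pool if item not in outfit]
    for pos in swapped_positions:
        misfit[pos] = random.choice(candidates)
        mismatch_mask[pos] = 1
    compatibility = 1.0 - num_swaps / len(outfit)
    return misfit, mismatch_mask, compatibility

# Example: swapping one item out of a four-item outfit yields a compatibility label of 0.75.
outfit = ["shirt_01", "jeans_07", "sneakers_03", "bag_12"]
pool = ["dress_05", "boots_02", "hat_09", "skirt_04"]
print(make_misfit(outfit, pool, num_swaps=1))
```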
ISSN: 1047-3203, 1095-9076
DOI: 10.1016/j.jvcir.2022.103741