CCFNet: Cross-Complementary fusion network for RGB-D scene parsing of clothing images

Bibliographic Details
Published in: Journal of visual communication and image representation 2023-02, Vol.90, p.103727, Article 103727
Main Authors: Xu, Gao; Zhou, Wujie; Qian, Xiaohong; Ye, Lv; Lei, Jingsheng; Yu, Lu
Format: Article
Language: English
Description
Summary: Schemes that complement context relationships through cross-scale feature fusion appear in many RGB-D scene parsing algorithms; however, most of these works perform multi-scale information interaction after multi-modal feature fusion, which ignores the information that each of the two modalities loses during the original encoding. Therefore, this paper designs a cross-complementary fusion network (CCFNet) that calibrates the multi-modal information before feature fusion, improving the feature quality of each modality and the complementarity between the RGB image and the depth map. First, the features are divided into low, middle, and high levels: low-level features contain the global details of the image, chiefly texture and edges; middle-level features contain both some global detail features and some local semantic features; high-level features contain rich local semantic features. Then, the feature information lost during the encoding of low- and middle-level features is supplemented and extracted by the designed cross feature enhancement module, and the high-level features are extracted by the feature enhancement module. In addition, a cross-modal fusion module is designed to integrate multi-modal features of different levels. The experimental results show that the proposed CCFNet achieves excellent performance on an RGB-D scene parsing dataset containing clothing images, and the generalization ability of the model is verified on the NYU Depth V2 dataset.
ISSN: 1047-3203, 1095-9076
DOI: 10.1016/j.jvcir.2022.103727