MS2GAH: Multi-label semantic supervised graph attention hashing for robust cross-modal retrieval

Bibliographic Details
Published in:	Pattern recognition 2022-08, Vol.128, Article 108676
Main Authors:	Duan, Youxiang; Chen, Ning; Zhang, Peiying; Kumar, Neeraj; Chang, Lunjie; Wen, Wu
Format:	Article
Language:	English
Subjects:	Cross-modal retrieval; Deep hashing; Graph attention network

Description
Summary:	•One of the first attempts to use GATs to solve cross-modal hashing problems.
•Pays more attention to the most important information to enhance robustness.
•Designs a label encoder to guide feature extraction and narrow the modality gap.
•Uses multi-label annotations to bridge semantic relevance at a fine-grained level.
•MS2GAH achieves satisfactory performance.

Owing to the strong nonlinear representation capability of deep neural networks and the low storage cost and high efficiency of hash learning, deep cross-modal hashing has moved to the forefront of academic research. How to better bridge semantic relevance, and thereby narrow the modality gap, is the key bottleneck in improving model performance. For samples with rich semantics, the primary issue is how to comprehensively explore hidden correlations and establish more precise relationships between modalities. In this work, we propose a novel deep hashing method called Multi-Label Semantic Supervised Graph Attention Hashing (MS2GAH), an end-to-end framework that integrates graph attention networks (GATs). It constructs graph features from the adjacency of nodes and assigns different weights to adjacent edges to enhance the robustness of the model. At the same time, multi-label annotations are used to bridge the semantic relevance between modalities in a more fine-grained manner. To make better use of rich semantic information, an end-to-end label encoder is designed to mine high-level semantics from multi-label annotations and guide the feature extraction of the modality-specific networks, thereby further narrowing the modality gap. Finally, extensive experiments on four datasets show that MS2GAH outperforms the other baselines and represents a step forward.
ISSN:	0031-3203, 1873-5142
DOI:	10.1016/j.patcog.2022.108676
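
Illustration:	The summary describes building graph features from node adjacency, weighting adjacent edges differently, and retrieving with hash codes. The sketch below shows one way this could look, assuming the commonly used single-head graph attention formulation with softmax-normalized edge weights, sign binarization, and Hamming-distance ranking. It is not taken from the article and may differ from the MS2GAH architecture; the function names (`gat_layer`, `to_hash`, `hamming_rank`), the toy graph, and all dimensions are hypothetical.

```python
import numpy as np


def gat_layer(h, adj, W, a, slope=0.2):
    """Single-head graph attention (assumed standard formulation).

    h: (N, F) node features, adj: (N, N) 0/1 adjacency,
    W: (F, D) projection, a: (2*D,) attention vector.
    Returns the aggregated features (N, D) and the edge weights (N, N).
    """
    z = h @ W
    d = z.shape[1]
    # e[i, j] = LeakyReLU(a^T [z_i || z_j]), split into source and target terms
    e = (z @ a[:d])[:, None] + (z @ a[d:])[None, :]
    e = np.where(e > 0, e, slope * e)
    # Attend only to neighbours (plus a self-loop so every row is non-empty)
    mask = (adj + np.eye(len(h))) > 0
    e = np.where(mask, e, -np.inf)
    # Row-wise softmax: each node's edge weights sum to 1
    e = e - e.max(axis=1, keepdims=True)
    alpha = np.exp(e)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)
    return alpha @ z, alpha


def to_hash(x):
    """Binarize real-valued features into {-1, +1} hash codes."""
    return np.where(x >= 0, 1, -1).astype(np.int8)


def hamming_rank(query, database):
    """Rank database codes by Hamming distance to the query code."""
    k = database.shape[1]
    # For +/-1 codes: hamming = (k - inner product) / 2
    dist = (k - database.astype(int) @ query.astype(int)) // 2
    return np.argsort(dist, kind="stable"), dist


# Toy example: 4 nodes on a chain graph, random features (assumed data)
rng = np.random.default_rng(0)
h = rng.normal(size=(4, 8))
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]])
W = rng.normal(size=(8, 16))
a = rng.normal(size=32)

out, alpha = gat_layer(h, adj, W, a)
codes = to_hash(out)
order, dist = hamming_rank(codes[0], codes)
```

In this sketch, `alpha[i, j]` is the weight node `i` places on neighbour `j`; non-adjacent pairs receive zero weight, which is how unequal edge weighting can down-weight less informative neighbours. A trained model would learn `W` and `a` rather than sample them at random.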