Transformer-Based Discriminative and Strong Representation Deep Hashing for Cross-Modal Retrieval

Bibliographic Details
Published in:IEEE access 2023, Vol.11, p.140041-140055
Main Authors: Zhou, Suqing, Han, Yu, Chen, Ning, Huang, Siyu, Igorevich, Kostromitin Konstantin, Luo, Jia, Zhang, Peiying
Format: Article
Language:English
Description
Summary: Cross-modal hashing retrieval has attracted extensive attention due to its low storage requirements and high retrieval efficiency. In particular, the key to improving its performance is to exploit the correlations among data of different modalities more fully and to generate more discriminative representations. Moreover, Transformer-based models, with their powerful capacity for modeling contextual information, have been widely used in fields such as natural language processing. Motivated by these observations, we propose a Transformer-based Distinguishing Strong Representation Deep Hashing (TDSRDH). For the text modality, since the sequential relations between words imply semantic relations rather than independence, we encode the text with a Transformer-based encoder to obtain a strong representation. In addition, we propose a triple-supervised loss built on top of the commonly used pairwise loss and quantization loss. The latter two ensure that the learned features and hash codes preserve the similarity of the original data during learning, while the former pulls similar instances closer together and pushes dissimilar instances farther apart, so that TDSRDH generates more discriminative representations while preserving similarity between modalities. Finally, experiments on three datasets, MIRFLICKR-25K, IAPR TC-12, and NUS-WIDE, demonstrate the superiority of TDSRDH over the other baselines, and ablation experiments confirm the effectiveness of the proposed ideas.
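The three loss terms described in the summary can be sketched with standard formulations from the deep-hashing literature. This is a minimal illustrative sketch only: the function names, the margin-based triplet term, the logistic pairwise likelihood, and the weighting coefficients below are assumptions, not the paper's exact equations.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Margin-based triplet term: pull similar instances closer,
    push dissimilar instances farther apart (hedged sketch)."""
    d_pos = np.sum((anchor - positive) ** 2, axis=1)
    d_neg = np.sum((anchor - negative) ** 2, axis=1)
    return float(np.mean(np.maximum(d_pos - d_neg + margin, 0.0)))

def pairwise_loss(f_img, f_txt, S):
    """Negative log-likelihood of pairwise similarities, a common
    cross-modal hashing objective; S is the binary similarity matrix."""
    theta = 0.5 * f_img @ f_txt.T  # inner-product similarity
    return float(np.mean(np.log1p(np.exp(theta)) - S * theta))

def quantization_loss(features, codes):
    """Penalize the gap between continuous features and binary codes."""
    return float(np.mean((features - codes) ** 2))

def combined_loss(f_img, f_txt, codes, S, anchor, pos, neg,
                  alpha=1.0, beta=1.0, gamma=1.0):
    """Hypothetical equal-weight combination of the three terms."""
    return (alpha * triplet_loss(anchor, pos, neg)
            + beta * pairwise_loss(f_img, f_txt, S)
            + gamma * quantization_loss(f_img, codes))
```

In this sketch the quantization term vanishes exactly when the continuous features already equal their binarized codes, and the triplet term vanishes once every positive pair is closer than its negative pair by at least the margin.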
ISSN:2169-3536
DOI:10.1109/ACCESS.2023.3339581