
A Cross-modal Alignment for Zero-shot Image Classification

Bibliographic Details
Published in: IEEE Access, 2023-01, Vol. 11, p. 1-1
Main Authors: Wu, Lu; Wu, Chenyu; Guo, Han; Zhao, Zhihao
Format: Article
Language: English
Description
Summary: Unlike most classification methods, which rely on large amounts of annotated data, we introduce a cross-modal alignment for zero-shot image classification. The key is to use a text attribute query learned from the seen classes to guide local feature responses in the unseen classes. First, an encoder module aligns visual features with their corresponding text attribute parts. Then, an attention module integrates the text attribute query into the feature maps to obtain response maps. Finally, the cosine distance metric measures the matching degree between the text attribute query and the corresponding feature response. Experimental results show that our method outperforms existing embedding-based ZSL methods as well as generative methods.
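The pipeline the abstract describes (a text attribute query attending over local visual features, then cosine matching of the query against the attended response) might be sketched roughly as follows. This is an illustrative assumption, not the paper's actual implementation: the array shapes, the softmax-attention form, and the function names `attribute_response` and `l2_normalize` are all hypothetical.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Normalize vectors to unit length for cosine comparison."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def attribute_response(feature_map, attribute_queries):
    """Attend each text attribute query over local visual features.

    feature_map:       (H, W, D) local visual features from an encoder
    attribute_queries: (A, D) learned text attribute embeddings
    Returns (A, H, W) spatial response maps and (A,) cosine
    matching scores between each query and its attended response.
    """
    H, W, D = feature_map.shape
    feats = feature_map.reshape(-1, D)              # (H*W, D)
    # Attention: softmax over spatial locations, per attribute query.
    logits = attribute_queries @ feats.T            # (A, H*W)
    weights = np.exp(logits - logits.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    responses = weights @ feats                     # (A, D) attended features
    # Cosine similarity between each query and its feature response.
    scores = np.sum(l2_normalize(attribute_queries) *
                    l2_normalize(responses), axis=1)
    return weights.reshape(-1, H, W), scores
```

At test time, unseen-class scores could then be obtained by comparing a class's attribute queries against the image's feature responses and aggregating the per-attribute cosine scores.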
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2023.3237966