
A Cross-modal Alignment for Zero-shot Image Classification

Bibliographic Details
Published in: IEEE Access, 2023-01, Vol. 11, p. 1-1
Main Authors: Wu, Lu; Wu, Chenyu; Guo, Han; Zhao, Zhihao
Format: Article
Language: English
Description
Summary: Unlike most classification methods, which rely on large amounts of annotated data, we introduce a cross-modal alignment for zero-shot image classification. The key is to use a text attribute query learned from the seen classes to guide local feature responses in the unseen classes. First, an encoder module aligns visual features with their corresponding text attribute parts. Then, an attention module integrates the text attribute query into the feature maps to obtain response maps. Finally, the cosine distance metric measures the matching degree between the text attribute query and the corresponding feature response. Experimental results show that our method outperforms existing embedding-based ZSL methods as well as generative methods.
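The pipeline the abstract describes (a text attribute query attending over local visual features, then cosine matching of the query against the attended response) might be sketched roughly as follows. This is an illustrative assumption, not the paper's actual implementation: the array shapes, the softmax-attention form, and the function names `attribute_response` and `l2_normalize` are all hypothetical.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Normalize vectors to unit length for cosine comparison."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def attribute_response(feature_map, attribute_queries):
    """Attend each text attribute query over local visual features.

    feature_map:       (H, W, D) local visual features from an encoder
    attribute_queries: (A, D) learned text attribute embeddings
    Returns (A, H, W) spatial response maps and (A,) cosine
    matching scores between each query and its attended response.
    """
    H, W, D = feature_map.shape
    feats = feature_map.reshape(-1, D)              # (H*W, D)
    # Attention: softmax over spatial locations, per attribute query.
    logits = attribute_queries @ feats.T            # (A, H*W)
    weights = np.exp(logits - logits.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    responses = weights @ feats                     # (A, D) attended features
    # Cosine similarity between each query and its feature response.
    scores = np.sum(l2_normalize(attribute_queries) *
                    l2_normalize(responses), axis=1)
    return weights.reshape(-1, H, W), scores
```

At test time, unseen-class scores could then be obtained by comparing a class's attribute queries against the image's feature responses and aggregating the per-attribute cosine scores.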
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2023.3237966