A Cross-modal Alignment for Zero-shot Image Classification
Published in: IEEE Access, 2023-01, Vol. 11, p. 1-1
Main Authors: , , ,
Format: Article
Language: English
Summary: Unlike mainstream classification methods that rely on large amounts of annotated data, we introduce a cross-modal alignment for zero-shot image classification. The key idea is to use text attribute queries learned from seen classes to guide local feature responses on unseen classes. First, an encoder module aligns visual features with their corresponding text attribute parts. Then, an attention module integrates the text attribute queries into the feature maps to obtain response maps. Finally, a cosine distance metric measures how well each text attribute query matches its corresponding feature response. Experimental results show that our method outperforms both existing embedding-based ZSL methods and generative methods.
ISSN: 2169-3536
DOI: | 10.1109/ACCESS.2023.3237966 |
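The summary outlines three steps: attribute queries attend over spatial visual features to produce responses, and a cosine metric scores how well each response matches its query. A minimal numpy sketch of that pipeline follows; all shapes, the scaled-softmax attention form, and variable names are assumptions for illustration, since the record does not give the paper's actual architecture details:

```python
import numpy as np

# Hypothetical dimensions (not from the paper): a CNN feature map with
# 7x7 = 49 spatial locations of dimension 64, and 10 text attribute queries.
rng = np.random.default_rng(0)
d, hw, k = 64, 49, 10

features = rng.normal(size=(hw, d))   # flattened visual feature map
queries = rng.normal(size=(k, d))     # text attribute queries (learned on seen classes)

# Attention module: each attribute query attends over spatial locations,
# giving a per-attribute response map and an attribute-specific visual response.
logits = queries @ features.T / np.sqrt(d)            # (k, hw)
attn = np.exp(logits - logits.max(axis=1, keepdims=True))
attn /= attn.sum(axis=1, keepdims=True)               # softmax over locations
responses = attn @ features                           # (k, d) feature responses

# Cosine metric: matching degree between each query and its feature response.
def cosine(a, b):
    return (a * b).sum(-1) / (np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1))

scores = cosine(responses, queries)                   # (k,) per-attribute scores
print(scores.shape)
```

In a zero-shot setting, per-attribute scores like these would be aggregated against the attribute signatures of unseen classes to produce a class prediction; that aggregation step is not specified in this record.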