
Cross-modal transfer with neural word vectors for image feature learning

Bibliographic Details
Main Authors: Irie, Go, Asami, Taichi, Tarashima, Shuhei, Kurozumi, Takayuki, Kinebuchi, Tetsuya
Format: Conference Proceeding
Language: English
Subjects:
Description
Summary: Neural word vectors (NWVs), such as those produced by word2vec, are a powerful text representation that can encode extensive semantic information in compact vectors. This ability poses an interesting question for image processing research: can we learn better semantic image features from NWVs? We explore this question empirically in the context of semantic content-based image retrieval (CBIR). In this paper, we use cross-modal transfer learning (CMT) to improve initial convolutional neural network (CNN) image features with the help of NWVs. We first show that NWVs improve semantic CBIR performance over classical word vectors, even with a simple CMT model, namely canonical correlation analysis (CCA). Then, inspired by a characteristic property of NWVs, we propose a new CMT model and demonstrate that it improves CBIR performance even further.
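The abstract's baseline, CCA, projects CNN image features and neural word vectors into a shared space where paired samples are maximally correlated. The sketch below is an illustrative reconstruction of classical CCA in NumPy, not the authors' implementation; the feature dimensions, sample count, and synthetic data are all hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: 200 paired samples of 64-d "CNN image features" (X)
# and 32-d "neural word vectors" (Y), linked by a shared 8-d latent signal.
z = rng.normal(size=(200, 8))
X = z @ rng.normal(size=(8, 64)) + 0.1 * rng.normal(size=(200, 64))
Y = z @ rng.normal(size=(8, 32)) + 0.1 * rng.normal(size=(200, 32))

def cca(X, Y, n_components, reg=1e-6):
    """Classical CCA via SVD of the whitened cross-covariance."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])  # regularized covariances
    Cyy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / n

    def inv_sqrt(C):
        # Inverse matrix square root via eigendecomposition (C is SPD).
        w, V = np.linalg.eigh(C)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    Wx, Wy = inv_sqrt(Cxx), inv_sqrt(Cyy)
    U, s, Vt = np.linalg.svd(Wx @ Cxy @ Wy)
    A = Wx @ U[:, :n_components]      # image-side projection
    B = Wy @ Vt.T[:, :n_components]   # text-side projection
    return A, B, s[:n_components]     # s holds the canonical correlations

A, B, corrs = cca(X, Y, n_components=8)
# Map image features into the shared space; a semantic CBIR system would
# then rank database images by distance to a query in this space.
img_embed = (X - X.mean(axis=0)) @ A
```

Because the two views share a strong latent signal here, the leading canonical correlation comes out close to 1; with real CNN features and word vectors the correlations would be weaker, which is part of what motivates the stronger CMT model the paper proposes.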
ISSN: 2379-190X
DOI: 10.1109/ICASSP.2017.7952690