Image caption model of double LSTM with scene factors

Bibliographic Details
Published in: Image and Vision Computing, June 2019, Vol. 86, pp. 38-44
Main Authors: Peng, Yuqing; Liu, Xuan; Wang, Weihua; Zhao, Xiaosong; Wei, Ming
Format: Article
Language: English
Description
Summary: This paper proposes an image semantic understanding model that incorporates scene factors. It targets the low accuracy of the description sentences produced by current image semantic understanding models, which either recognize the scene incorrectly or ignore scene recognition altogether. The model first applies LDA (latent Dirichlet allocation) to the text of the corpus to identify the corresponding topic (scene information) and obtain the vocabulary used in that scene. It then uses ResNet to extract global image features and Places365-CNNs to extract deep scene features. Finally, the model combines the image scene information with the corpus scene information, so that when generating a description sentence for an image it assigns high probability to words related to the image's scene. During generation, a double LSTM is used to adjust the parameters and improve the accuracy of the generated sentences. The model is trained and tested on the Flickr8K, Flickr30K, and MSCOCO image datasets and evaluated with several evaluation methods. Experimental results show that, compared with other models, the proposed model effectively improves the accuracy of image semantic understanding and addresses these problems.
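The summary does not say exactly how scene words are favored during decoding. As a rough illustration only, the Python sketch below assumes the LDA step yields a per-topic word-weight table and that the recognized scene has already been matched to one topic. The function names, the toy weights, and the additive logit bias are hypothetical choices, not the authors' method. The sketch takes the top-weighted words of the selected topic as the scene vocabulary, then adds a constant bias to those words' scores before the softmax over the next word.

```python
import math
from typing import Dict, Set


def scene_vocabulary(topic_word_weights: Dict[int, Dict[str, float]],
                     topic_id: int, top_k: int) -> Set[str]:
    """Return the top_k highest-weight words of one topic (hypothetical LDA output)."""
    weights = topic_word_weights[topic_id]
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    return {word for word, _ in ranked[:top_k]}


def softmax(logits: Dict[str, float]) -> Dict[str, float]:
    """Numerically stable softmax over a word -> score mapping."""
    peak = max(logits.values())
    exps = {w: math.exp(s - peak) for w, s in logits.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}


def scene_biased_distribution(logits: Dict[str, float], scene_words: Set[str],
                              bias: float) -> Dict[str, float]:
    """Add a constant (assumed) bias to scene-word scores, then renormalize."""
    adjusted = {w: s + (bias if w in scene_words else 0.0) for w, s in logits.items()}
    return softmax(adjusted)


# Toy topic table; topic 0 is assumed to be the topic matched to the recognized scene.
topics = {
    0: {"beach": 0.30, "wave": 0.25, "sand": 0.20, "dog": 0.05},
    1: {"kitchen": 0.35, "table": 0.20, "cook": 0.15, "dog": 0.02},
}
words = scene_vocabulary(topics, topic_id=0, top_k=3)

# Toy decoder scores for the next word at a single generation step.
logits = {"beach": 1.0, "kitchen": 1.0, "dog": 1.5}
unbiased = softmax(logits)
biased = scene_biased_distribution(logits, words, bias=1.0)
print(max(unbiased, key=unbiased.get), max(biased, key=biased.get))  # prints: dog beach
```

Without the bias, "dog" is the most likely next word. With the bias, the scene word "beach" becomes the most likely. The double-LSTM decoder and the ResNet and Places365 feature extractors are not modeled here.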
ISSN: 0262-8856, 1872-8138
DOI: 10.1016/j.imavis.2019.03.003