Image captioning by incorporating affective concepts learned from both visual and textual components

Bibliographic Details
Published in: Neurocomputing (Amsterdam) 2019-02, Vol.328, p.56-68
Main Authors: Yang, Jufeng; Sun, Yan; Liang, Jie; Ren, Bo; Lai, Shang-Hong
Format: Article
Language: English
Description
Summary: Automatically generating a natural sentence that describes the content of an image has been extensively researched in artificial intelligence recently, and it bridges the gap between the computer vision and natural language processing communities. Most existing captioning frameworks rely heavily on visual content and are rarely aware of sentimental information. In this paper, we introduce affective concepts to enhance the emotional expressiveness of text descriptions. We achieve this goal by composing appropriate emotional concepts into sentences; these concepts are calculated from large-scale visual and textual repositories by learning both content and linguistic modules. We extract visual and textual representations separately, then combine the latent codes of the two components into a low-dimensional subspace. After that, we decode the combined latent representations and finally generate the affective image captions. We evaluate our method on the SentiCap dataset, which was established with sentimental adjective-noun pairs, and evaluate the emotional descriptions with several qualitative and human inception metrics. The experimental results demonstrate the capability of our method to analyze the latent emotion of an image and provide an affective description that caters to human cognition.
ISSN: 0925-2312, 1872-8286
DOI: 10.1016/j.neucom.2018.03.078