
Dual-level adaptive incongruity-enhanced model for multimodal sarcasm detection


Bibliographic Details
Published in: Neurocomputing (Amsterdam), 2025-01, Vol. 612, Article 128689
Main Authors: Wu, Qiaofeng, Fang, Wenlong, Zhong, Weiyu, Li, Fenghuan, Xue, Yun, Chen, Bo
Format: Article
Language: English
Description
Summary: Multimodal sarcasm detection leverages multimodal information, such as images and text, to identify instances whose superficial emotional expression is contrary to the actual emotion. Existing methods have primarily focused on the incongruity between text and image information for sarcasm detection. However, they face two limitations: image encoders tend to encode similar images into similar vectors, and graph-level feature extraction introduces noise, owing to negative correlations caused by the accumulation of GAT layers and the lack of representations for non-neighboring nodes. To address these limitations, we propose a Dual-Level Adaptive Incongruity-Enhanced model (DAIE) to extract the incongruity between text and image at both the token and graph levels. At the token level, we bolster token-level contrastive learning with patch-based reconstructed images to capture the common and specific features of images, thereby amplifying incongruities between text and images. At the graph level, we introduce adaptive graph contrastive learning, coupled with negative pair similarity weights, to refine the feature representations of the model's textual and visual graph nodes while also enhancing the information exchange among neighboring nodes. We conduct experiments on a publicly available sarcasm detection dataset. The results demonstrate the effectiveness of our method, which outperforms several state-of-the-art approaches by 3.33% in accuracy and 4.34% in F1 score.
•A Dual-Level Adaptive Incongruity-Enhanced Model (DAIE) is proposed.
•By leveraging Patch-based Reconstructed Images (PRI), token-level contrastive learning (TLCL) effectively diminishes the common features shared among visually similar images.
•The graph-level contrastive learning (GLCL) module with Negative pair Similarity Weights (NSW) dynamically adjusts the inter-node weights across the Graph Attention Network (GAT).
•Experimental results on a publicly available multimodal sarcasm detection dataset demonstrate the superiority of our proposed method.
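The graph-level idea summarized above can be pictured with a minimal sketch: a contrastive loss over paired text/image node embeddings in which each negative pair is re-weighted by its similarity to the anchor, so that hard negatives (e.g. visually similar but non-matching images) contribute more to the loss. The function name, tensor shapes, temperature, and the softmax-based weighting below are illustrative assumptions, not the paper's exact NSW formulation; the actual GAT encoders, fusion, and loss details are given in the full text.

import torch
import torch.nn.functional as F

def similarity_weighted_contrastive_loss(text_nodes: torch.Tensor,
                                          image_nodes: torch.Tensor,
                                          temperature: float = 0.1) -> torch.Tensor:
    """Contrastive loss over paired text/image node embeddings.

    text_nodes, image_nodes: (N, D) pooled graph-node features for N samples;
    row i of each tensor is assumed to come from the same text-image post,
    so the diagonal entries of the similarity matrix are the positive pairs.
    """
    t = F.normalize(text_nodes, dim=-1)
    v = F.normalize(image_nodes, dim=-1)

    sim = t @ v.T / temperature                      # (N, N) scaled cosine similarities
    n = sim.size(0)
    pos_mask = torch.eye(n, dtype=torch.bool, device=sim.device)

    # Negative-pair similarity weights (assumed form): a softmax over each row's
    # off-diagonal similarities, so more similar negatives receive larger weights.
    neg_weights = F.softmax(sim.masked_fill(pos_mask, float("-inf")), dim=-1)

    pos = torch.exp(sim[pos_mask])                   # (N,) positive-pair terms
    neg = (neg_weights * torch.exp(sim)).sum(dim=-1) # similarity-weighted negative terms
    return (-torch.log(pos / (pos + neg + 1e-8))).mean()

Up-weighting similar negatives in this way is one plausible reading of how NSW could counteract the tendency of image encoders to map visually similar images to nearly identical vectors; a call such as similarity_weighted_contrastive_loss(text_feats, image_feats) would then be added to the training objective alongside the detection loss.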
ISSN: 0925-2312
DOI: 10.1016/j.neucom.2024.128689