
Sarcasm Detection in Indonesian-English Code-Mixed Text Using Multihead Attention-Based Convolutional and Bi-Directional GRU

Bibliographic Details
Published in: IEEE Access, 2024, Vol. 12, p. 137063-137079
Main Authors: Alfan Rosid, Mochamad; Oranova Siahaan, Daniel; Saikhu, Ahmad
Format: Article
Language: English
Description
Summary: Detecting sarcasm in text is a very challenging task. Sarcasm often depends on context, tone, and cultural references, which can be difficult for machines to understand. In addition, the increasing occurrence of code-mixing in social media posts poses new challenges in sarcasm detection. Research on sarcasm detection in mixed-code text written in languages other than English is still limited owing to the unavailability of public datasets. To overcome this issue, a dataset was built for sarcasm detection in Indonesian-English mixed-code texts. Furthermore, a hybrid model based on a convolutional neural network (CNN) with multi-head attention and a bi-directional gated recurrent unit (BiGRU), named MHA-CovBi, is proposed for sarcasm detection. In the proposed MHA-CovBi model, a combination of FastText and GloVe word embeddings is utilized to assist the model in understanding and processing texts in different languages. GloVe pretrained word embedding is used for vector representation of English words, while FastText pretrained word embedding is used for vector representation of Indonesian words. Moreover, an auxiliary pragmatic feature illustrating the number of pragmatic markers in tweets was incorporated to enhance detection performance. In addition, this study presents a language detection scheme and transliteration process that can be used to handle languages other than Indonesian and English using Google Translate API. The performance of the proposed model was evaluated through comparative analysis against existing approaches. The proposed model successfully outperformed current state-of-the-art models, achieving an accuracy of 94.60% and F1 score of 94.38%.
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2024.3436107
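
The summary describes two input-side ideas: each word is embedded with a language-specific table (GloVe for English, FastText for Indonesian), and a count of pragmatic markers is added as an auxiliary feature. The following is a minimal sketch of that idea only, not the authors' implementation. The tiny vector tables, the `en`/`id` language tags, the regular expression used as a stand-in for "pragmatic markers", and all function names are assumptions made for illustration; the record does not specify any of them.

```python
import re

# Toy stand-ins for pretrained vectors (assumption: real GloVe and FastText
# tables would be loaded from disk and have hundreds of dimensions).
GLOVE_EN = {"great": [0.9, 0.1], "job": [0.4, 0.6]}
FASTTEXT_ID = {"banget": [0.2, 0.8], "bagus": [0.7, 0.3]}
UNK = [0.0, 0.0]  # fallback vector for out-of-vocabulary tokens

# Hypothetical definition of a pragmatic marker: repeated ! or ?,
# a hashtag, or an emoticon-range emoji. The paper may define it differently.
PRAGMATIC_MARKER = re.compile(r"[!?]{2,}|#\w+|[\U0001F600-\U0001F64F]")


def embed_token(token, lang):
    """Look the token up in the table that matches its language tag."""
    table = GLOVE_EN if lang == "en" else FASTTEXT_ID
    return table.get(token.lower(), UNK)


def count_pragmatic_markers(text):
    """Count non-overlapping marker matches in the raw text."""
    return len(PRAGMATIC_MARKER.findall(text))


def encode(tagged_tokens, raw_text):
    """Return (per-token vectors, auxiliary marker count) for one tweet.

    tagged_tokens is a list of (token, lang) pairs; the language tags are
    assumed to come from an upstream language-detection step.
    """
    vectors = [embed_token(tok, lang) for tok, lang in tagged_tokens]
    return vectors, count_pragmatic_markers(raw_text)


if __name__ == "__main__":
    tokens = [("Great", "en"), ("banget", "id")]
    print(encode(tokens, "Great banget!!! #sarcasm"))
```

In a full model, the per-token vectors would feed the CNN, attention, and BiGRU layers, while the marker count would be concatenated with the learned representation before classification; how the paper wires these together is not stated in this record.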