Data Augmentation for Enhancing Detection of Misogynistic Content in Social Media by Transferring Knowledge from Song Phrases

Bibliographic Details
Published in: IEEE Access 2023-02, p.1-1
Main Authors: Calderon-Suarez, Ricardo; Ortega-Mendoza, Rosa M.; Montes-y-Gomez, Manuel; Toxqui-Quitl, Carina; Marquez-Vera, Marco A.
Format: Article
Language: English
Description
Summary: Misogyny is a severe social problem that affects women's mental and physical health or even leads to femicide. This cultural problem is visible and prevalent in different communication channels, such as music and social media, confirming or inciting this behavior. Hence, the automatic detection of misogynistic content in social media using computational methods that analyze the language is of increasing interest. Most approaches follow a supervised machine learning strategy, with the main challenge of capturing the diversity and complexity of the offensive language directed at women. Therefore, the size and quality of the training data play essential roles. In this context, we designed a novel data augmentation approach that leverages song phrases to increase the models' ability to generalize and improve their performance. In addition, this paper introduces a methodology to compile a labeled dataset with song segments conveying misogyny, which can be used to enrich different techniques in this field. The proposed approach was evaluated using English and Spanish benchmark datasets. It successfully overcomes conventional transfer learning techniques and achieves high competitiveness compared with state-of-the-art methods, outperforming them on the Spanish dataset. These encouraging results demonstrate the usefulness of the proposed approach.
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2023.3242965