Detecting Arabic sexual harassment using bidirectional long-short-term memory and a temporal convolutional network

Bibliographic Details
Published in: Egyptian Informatics Journal, 2023-07, Vol. 24 (2), p. 365-373
Main Authors: Amer Hamzah, Noor; Dhannoon, Ban N.
Format: Article
Language: English
Description
Summary: Due to advances in technology, social media has become the most popular medium for spreading news, and many messages are published on platforms such as Facebook, Twitter, and Instagram. These platforms also give users room to express opinions, and with that has come a sharp increase in hate speech, offensive language, racism, sexual content, and other forms of verbal violence. These behaviors are not confined to particular countries, groups, or societies; they reach into people's daily lives. This study examines sexual content and harassment discourse in Arabic social media in order to build an accurate system for detecting sexual harassment expressions. The dataset used for classification was collected from Twitter posts. A deep learning classifier for sexual speech was developed that combines Bidirectional Long Short-Term Memory (BiLSTM) and a Temporal Convolutional Network (TCN) with word embeddings from a FastText model pre-trained on Arabic. The proposed TCN-BiLSTM model was compared with Extreme Gradient Boosting (XGBoost). On the CASH dataset, the TCN-BiLSTM model achieved an accuracy of 96.65% and an F0.5 score of 0.969, while XGBoost with word embeddings achieved an accuracy of 92.56% and an F0.5 score of 0.925. The findings, together with manual interpretation of the results, indicate that combining different text representation methods with deep learning algorithms yields higher classification performance on complex sentences. This strategy is useful for languages that are morphologically difficult to analyze, such as Arabic, Turkish, and Lithuanian.
ISSN: 1110-8665, 2090-4754
DOI: 10.1016/j.eij.2023.05.007
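
Note on the F0.5 metric: The summary reports F0.5 values alongside accuracy. As a rough illustration only, the standard F-beta score can be computed from precision and recall as in the sketch below; with beta = 0.5, precision is weighted more heavily than recall. The function name `f_beta` and the precision/recall inputs are hypothetical and chosen for illustration; they are not taken from the article, which may compute or average the metric differently.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Standard F-beta score: a weighted harmonic mean of precision and recall.

    beta < 1 favors precision, beta > 1 favors recall, beta = 1 gives F1.
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    beta_sq = beta * beta
    denominator = beta_sq * precision + recall
    if denominator == 0:
        # Both precision and recall are zero; define the score as 0.
        return 0.0
    return (1 + beta_sq) * precision * recall / denominator


# Hypothetical inputs, not results from the article.
example_precision = 0.9
example_recall = 0.6
print(f_beta(example_precision, example_recall, beta=0.5))
print(f_beta(example_precision, example_recall, beta=1.0))
```

With these illustrative inputs, the F0.5 score comes out higher than F1 because precision exceeds recall, which is the behavior the beta = 0.5 weighting is meant to produce.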