Novel semantic and statistic features-based author profiling approach

Bibliographic Details
Published in: Journal of Ambient Intelligence and Humanized Computing, 2023-09, Vol. 14 (9), pp. 12807-12823
Main Authors: Ouni, Sarra, Fkih, Fethi, Omri, Mohamed Nazih
Format: Article
Language: English
Description
Summary: The Author Profiling (AP) task aims to predict certain demographic attributes (e.g., age, gender) of authors from their documents. AP on social media networks has received increasing research attention over the past decade and is of growing importance in applications related to security, marketing, psychology, etc. This article describes our solution to the author profiling problem as part of PAN 2019, an annual series of digital text forensics events. The goal of AP at PAN 2019 is to distinguish between bots and humans on Twitter and then to identify the gender of the human users. To achieve these goals, we propose two new models: (i) a first model, applied only to an English dataset, that uses semantic and stylistic features; it is topic-based for semantic feature extraction from tweets, and the extracted stylistic and semantic features are fed into a convolutional neural network (CNN); and (ii) a second, classification model applied to a Spanish corpus, which uses various statistical characteristics to feed a random forest classifier. The experimental study, conducted on various standard datasets, shows the effectiveness of the proposed models in terms of accuracy, precision, recall, F1-score and G-mean. In addition, a comparative study between our models and other existing models shows the limitations of the latter and confirms the performance of the proposed solutions.
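The summary reports G-mean alongside the usual classification metrics but does not define it. The following is a minimal sketch of how these five metrics can be computed for a binary task such as bot vs. human. It assumes the common definition of G-mean as the geometric mean of sensitivity (recall on the positive class) and specificity (recall on the negative class). The function name `binary_metrics` and the label encoding (1 = bot, 0 = human) are hypothetical and not taken from the article, which may use a different or multi-class variant.

```python
import math
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Dict[str, float]:
    """Accuracy, precision, recall, specificity, F1 and G-mean for a binary task.

    Hypothetical helper: `positive` marks the class of interest
    (assumed here to be 1 = bot, with 0 = human as the negative class).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Build the confusion-matrix counts.
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1

    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0          # sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0     # recall on the negative class
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Assumed definition: geometric mean of sensitivity and specificity.
    g_mean = math.sqrt(recall * specificity)

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "g_mean": g_mean,
    }


if __name__ == "__main__":
    # Toy labels for illustration only (not data from the article).
    truth = [1, 1, 1, 1, 0, 0, 0, 0]
    preds = [1, 1, 1, 0, 0, 0, 1, 1]
    for name, value in binary_metrics(truth, preds).items():
        print(f"{name}: {value:.3f}")
```

Unlike accuracy, G-mean drops sharply when either class is poorly recognized, which is why it is often reported for tasks where one class may dominate.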
ISSN: 1868-5137, 1868-5145
DOI: 10.1007/s12652-022-04198-w