Novel semantic and statistic features-based author profiling approach
Published in: | Journal of ambient intelligence and humanized computing 2023-09, Vol.14 (9), p.12807-12823 |
---|---|
Format: | Article |
Language: | English |
Summary: | The Author Profiling (AP) task aims to predict certain demographic attributes (e.g., age, gender) of authors from their documents. AP on social media networks has gained increasing research attention over the past decade and is of growing importance in applications related to security, marketing, psychology, etc. This article describes our solution to the author profiling problem posed at PAN 2019, one of an annual series of digital text forensics computing events. The goal of AP at PAN 2019 is to distinguish between bots and humans on Twitter and then to identify the gender of the human users. To achieve these goals, we propose two new models: (i) a model applied only to an English dataset, which uses semantic and stylistic features. It takes a topic-based approach to extracting semantic features from tweets, and the extracted stylistic and semantic features are fed into a convolutional neural network (CNN); (ii) a classification model applied to a Spanish corpus, which uses various statistical features as input to a random forest classifier. The experimental study, which we conducted on various standard databases, shows the effectiveness of our proposed models in terms of accuracy, precision, recall, F1-score and G-mean. In addition, a comparative analysis of our models against other existing models shows the limits of the latter and confirms the performance of the solutions we have proposed. |
ISSN: | 1868-5137 1868-5145 |
DOI: | 10.1007/s12652-022-04198-w |
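
The summary reports results in terms of accuracy, precision, recall, F1-score and G-mean. As a reading aid, the sketch below shows how these five scores are commonly computed for a binary task such as bot versus human. It is an illustration under stated assumptions, not the authors' evaluation code: the function name `binary_metrics`, the label strings and the toy data are hypothetical, and G-mean is taken here to be the square root of recall times specificity, which is one common definition that the article itself may or may not use.

```python
import math


def binary_metrics(y_true, y_pred, positive="bot"):
    """Compute common binary classification scores from two label lists.

    Hypothetical helper for illustration; `positive` names the class
    treated as the positive one (assumed to be "bot" here).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def ratio(numerator, denominator):
        # Return 0.0 instead of raising when a class is never seen or predicted.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)          # also called sensitivity
    specificity = ratio(tn, tn + fp)     # recall of the negative class
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        # Assumed definition: geometric mean of recall and specificity.
        "g_mean": math.sqrt(recall * specificity),
    }


if __name__ == "__main__":
    # Toy labels, invented purely to exercise the function.
    truth = ["bot", "bot", "bot", "human", "human", "human", "human", "bot"]
    guess = ["bot", "bot", "human", "human", "human", "bot", "human", "bot"]
    for name, value in binary_metrics(truth, guess).items():
        print(f"{name}: {value:.3f}")
```

Unlike accuracy, G-mean drops to zero when either class is entirely misclassified, which is why it is often reported alongside the other scores when classes are imbalanced.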