Detecting Deception Using Natural Language Processing and Machine Learning in Datasets on COVID-19 and Climate Change

Bibliographic Details
Published in: Algorithms 2023-04, Vol.16 (5), p.221
Main Authors: Brzic, Barbara, Boticki, Ivica, Bagic Babac, Marina
Format: Article
Language: English
Description
Summary: Deception in computer-mediated communication represents a threat, and there is a growing need for efficient methods of detecting it. Machine learning models have, through natural language processing, proven highly successful at detecting lexical patterns related to deception. In this study, four selected machine learning models are trained and tested on data collected through a crowdsourcing platform on the topics of COVID-19 and climate change. The performance of the models was tested by analyzing n-grams (from unigrams to trigrams) and by using psycho-linguistic analysis. Important features were selected, and the models were then tested further on different subsets of those features. The study concludes that the subjectivity of the collected data greatly affects the detection of hidden linguistic features of deception. Psycho-linguistic analysis, alone and in combination with n-grams, achieves better classification results than n-gram analysis alone, both when the models are tested on their own data and when their ability to generalize is examined; the gain is largest on trigrams, where the combined approach achieves notably higher accuracy, by up to 16%. The n-gram analysis proved to be the more robust method when testing the mutual applicability of the models, while psycho-linguistic analysis remained the most inflexible.
ISSN:1999-4893
DOI:10.3390/a16050221
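
Note: The summary refers to n-gram analysis from unigrams to trigrams. As a rough illustration only, the sketch below counts such n-grams in a piece of text. It assumes lowercase whitespace tokenization, and the function names `ngrams` and `ngram_counts` are hypothetical; the record does not state how the authors tokenized the data or built their features.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(text, max_n=3):
    """Count all n-grams for n = 1..max_n.

    Assumption: tokens are lowercase, whitespace-separated words.
    """
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    return counts


counts = ngram_counts("the climate is changing and the climate is warming")
print(counts[("the", "climate", "is")])
```

Counts of this kind are one common way to turn text into features for a classifier; whether they match the features used in the article is not something this record confirms.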