Detecting Deception Using Natural Language Processing and Machine Learning in Datasets on COVID-19 and Climate Change

Bibliographic Details
Published in:	Algorithms 2023-04, Vol.16 (5), article 221
Main Authors:	Brzic, Barbara; Boticki, Ivica; Bagic Babac, Marina
Format:	Article
Language:	English
Subjects:	Accuracy; Climate change; Communication; Computational linguistics; Computer mediated communication; Crowdsourcing; Data collection; Datasets; Deception; deception detection; Global temperature changes; Language processing; Linguistics; Lying; Machine learning; N-Gram language models; Natural language interfaces; Natural language processing; Psycholinguistics; Semantics; Subjectivity; Telematics

Description
Summary:	Deception in computer-mediated communication represents a threat, and there is a growing need to develop efficient methods of detecting it. Machine learning models combined with natural language processing have proven highly successful at detecting lexical patterns associated with deception. In this study, four machine learning models were trained and tested on data collected through a crowdsourcing platform on two topics, COVID-19 and climate change. Model performance was evaluated using n-gram analysis (from unigrams to trigrams) and psycho-linguistic analysis. Important features were then selected, and the models were further tested on different subsets of the selected features. The study concludes that the subjectivity of the collected data strongly affects the detection of hidden linguistic features of deception. Psycho-linguistic analysis, both alone and combined with n-grams, achieves better classification results than n-gram analysis alone, both when the models are tested on their own data and when their ability to generalize is examined, notably for trigrams, where the combined approach achieves accuracy up to 16% higher. In testing the mutual applicability of the models, n-gram analysis proved to be the more robust method, while psycho-linguistic analysis was the least flexible.
ISSN:	1999-4893
DOI:	10.3390/a16050221
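
The summary above compares two feature settings: word n-grams from unigrams to trigrams, and the same n-grams combined with psycho-linguistic features. It does not name the four models, the psycho-linguistic tool, or the feature-selection method, so the following is only a minimal illustrative sketch under stated assumptions. The category word lists (`CATEGORIES`), the multinomial Naive Bayes classifier, and all function and class names are hypothetical stand-ins chosen for a standard-library-only example. They are not the authors' implementation.

```python
import math
from collections import Counter

# Hypothetical, toy "psycho-linguistic" word categories (LIWC-style).
# These word lists are illustrative assumptions, NOT the lexicon used in the study.
CATEGORIES = {
    "first_person": {"i", "me", "my", "mine"},
    "negation": {"no", "not", "never"},
    "certainty": {"always", "definitely", "certainly"},
}


def ngram_counts(text, n_min=1, n_max=3):
    """Count word n-grams for n = n_min..n_max (unigrams to trigrams by default)."""
    tokens = text.lower().split()
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def features(text, combined=True):
    """N-gram counts, optionally extended with category counts (the 'combined' setting)."""
    feats = ngram_counts(text)
    if combined:
        tokens = text.lower().split()
        for name, words in CATEGORIES.items():
            # The "cat:" prefix keeps category features apart from n-gram features.
            feats["cat:" + name] = sum(tok in words for tok in tokens)
    return feats


class MultinomialNB:
    """Minimal multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, feature_dicts, labels):
        self.classes = sorted(set(labels))
        self.counts = {c: Counter() for c in self.classes}
        for feats, label in zip(feature_dicts, labels):
            self.counts[label].update(feats)  # sum feature counts per class
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, feats):
        def log_posterior(c):
            denom = self.totals[c] + len(self.vocab)
            score = self.log_prior[c]
            for f, n in feats.items():
                if f in self.vocab:  # features never seen in training are ignored
                    score += n * math.log((self.counts[c][f] + 1) / denom)
            return score

        return max(self.classes, key=log_posterior)


# Made-up toy sentences for illustration only; not the study's crowdsourced data.
train = [
    ("the report shows rising temperatures", "truthful"),
    ("i read the report on rising sea levels", "truthful"),
    ("it is definitely not real", "deceptive"),
    ("it is never real and certainly not warming", "deceptive"),
]
model = MultinomialNB().fit([features(t) for t, _ in train], [y for _, y in train])
print(model.predict(features("it is definitely not warming")))  # deceptive
```

Passing `combined=False` to `features` for both training and prediction gives the n-gram-only setting. Naive Bayes is used here only because it fits in a few lines without third-party libraries; the toy sentences are invented and say nothing about the accuracies reported in the study.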