Classification of tourism destination review texts based on sentiment using k-nearest neighbor with information gain feature selection

Bibliographic Details
Main Authors:	Husni; Kifli, Abd; Syakur, Muhammad Ali; Muntasa, Arif; Rachman, Eka Mala Sari; Fauzan, Hermawan Bin; Rachmad, Aeri
Format:	Conference Proceeding
Language:	English
Subjects:	Accuracy; Classification; Computing time; Data mining; Feature selection; Information management; K-nearest neighbors algorithm; Oversampling; Sentiment analysis; Texts; Tourism

Description
Summary:	Sentiment analysis of tourism destination reviews can serve as feedback that helps managers improve the quality of tourism services. Many methods have been used to classify review texts by sentiment. k-Nearest Neighbor (KNN) is a classification method widely used in sentiment analysis, and this simple approach can achieve very high accuracy. Its main drawback is long computing time, so by default it is not recommended for big data computing. This article explains how KNN is combined with Information Gain (IG) feature selection so that only the best terms (words) in the dataset are involved in the computation. The research analyzes the review text of a tourism destination on Madura Island, downloaded from Google Maps. The reviews were preprocessed using case folding, cleansing, normalization, tokenization, stop-word removal, and stemming. Each term is weighted using TF-IDF, and the feature set is then restricted using IG. Testing shows that KNN without IG gives the best accuracy, 98.4% (oversampling only), at k = 1, whereas KNN combined with IG reaches a best accuracy of 97.6% (oversampling plus feature selection) at k = 3 with a threshold of 0.0008. The combination of KNN and IG is recommended for classifying large-scale review texts by sentiment, because reducing the number of features can shorten computing time while maintaining classifier accuracy.
ISSN:	0094-243X 1551-7616
DOI:	10.1063/5.0222729
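
The pipeline described in the summary (TF-IDF weighting, IG-based term selection, then KNN classification) can be illustrated with a minimal sketch. This is not the authors' implementation. The toy reviews, the function names, the smoothed IDF formula, the use of cosine similarity as the neighbor measure, and the threshold of 0.5 are all assumptions made for illustration; the record only reports a threshold of 0.0008 on the real dataset. IG is computed here on term presence or absence, the oversampling step is omitted, and the input is assumed to be already preprocessed into tokens.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(docs, labels, term):
    """Class entropy minus the entropy conditioned on term presence."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (len(with_term) / n) * entropy(with_term) \
        + (len(without_term) / n) * entropy(without_term)
    return entropy(labels) - conditional


def select_terms(docs, labels, threshold):
    """Keep only terms whose information gain exceeds the threshold."""
    vocab = sorted({t for d in docs for t in d})
    return [t for t in vocab if information_gain(docs, labels, t) > threshold]


def fit_idf(docs, terms):
    """Smoothed IDF per term (an assumed variant; others exist)."""
    n = len(docs)
    return {t: math.log((1 + n) / (1 + sum(1 for d in docs if t in d))) + 1
            for t in terms}


def vectorize(doc, terms, idf):
    """TF-IDF vector of one tokenized document over the selected terms."""
    counts = Counter(doc)
    return [counts[t] * idf[t] for t in terms]


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(train_vecs, train_labels, query_vec, k):
    """Majority vote among the k most similar training vectors."""
    ranked = sorted(zip(train_vecs, train_labels),
                    key=lambda pair: cosine(pair[0], query_vec),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical, already-preprocessed (tokenized, stemmed) toy reviews.
train_docs = [
    ["beach", "clean", "beautiful"],
    ["beautiful", "view", "beach"],
    ["dirty", "crowded", "beach"],
    ["dirty", "expensive", "parking"],
]
train_labels = ["pos", "pos", "neg", "neg"]

selected = select_terms(train_docs, train_labels, threshold=0.5)
idf = fit_idf(train_docs, selected)
train_vecs = [vectorize(d, selected, idf) for d in train_docs]

query = vectorize(["beautiful", "sunset"], selected, idf)
print(selected, knn_predict(train_vecs, train_labels, query, k=3))
```

Because every distance is computed over the selected terms only, the cost of each neighbor search shrinks with the vocabulary, which is the trade-off the summary describes: fewer features and shorter computing time, with a small loss of accuracy.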