Loading…
A classified feature representation three-way decision model for sentiment analysis
Binary sentiment analysis uses sentiment dictionaries, TF-IDF, word2vec, and BERT to convert text documents such as product and movie reviews into vectors. Dimensionality reduction by feature selection can effectively reduce the complexity of sentiment analysis. Existing feature selection methods pu...
Saved in:
Published in: | Applied intelligence (Dordrecht, Netherlands) Netherlands), 2022-05, Vol.52 (7), p.7995-8007 |
---|---|
Main Authors: | , , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Citations: | Items that this one cites Items that cite this one |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | Binary sentiment analysis uses sentiment dictionaries, TF-IDF, word2vec, and BERT to convert text documents such as product and movie reviews into vectors. Dimensionality reduction by feature selection can effectively reduce the complexity of sentiment analysis. Existing feature selection methods put all samples together and ignore the difference in the feature representation between different categories. For binary sentiment analysis, there are some reviews with uncertain sentiment polarity, three-way decision divides samples into positive (POS) region, negative (NEG) region, and uncertain region (UNC). The model based on the three-way decision is beneficial to process the UNC and improve the effect of binary sentiment analysis. However, how to obtain the optimal feature representation in certain regions respectively to process the uncertain samples is a challenge. In this paper, a classified feature representation three-way decision model is proposed to obtain the optimal feature representation of the positive and negative domains for sentiment analysis. In the positive domain and the negative domain, m- and n-layer feature representations are obtained. The optimal layer with the best performance is selected as the optimal feature representation. The POS region and the NEG region in the testing set are processed by the optimal feature representation, the UNC region is processed by the original feature representation. Experiments on IMDB and Amazon show that the performance of our proposed method in terms of classification accuracy in sentiment analysis is significantly higher than that of the chi-square, principal component analysis, and mutual information methods. |
---|---|
ISSN: | 0924-669X 1573-7497 |
DOI: | 10.1007/s10489-021-02809-1 |