Arabic Lexicon Learning to Analyze Sentiment in Microblogs
Published in: | International journal of advanced computer science & applications 2019, Vol.10 (8) |
---|---|
Main Authors: | , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | The study and classification of opinions distilled from social media is called sentiment analysis. The goal of this study is to build an adaptive sentiment lexicon for the Arabic language. Based on such lexicons, sentiment polarity classification can be improved. The classification problem is stated as a mathematical programming problem in which we search for a lexicon that optimizes classification accuracy. A genetic algorithm is presented to solve the optimization problem. A meta-level feature is generated based on the adaptive lexicons provided by the genetic algorithm. The algorithm's performance is supported by using it alongside n-gram features and Bing Liu's lexicon. In this work, lexicon-based and corpus-based approaches are integrated, and the lexicons are produced from the corpus. Five data sets are tested through experiments. The sentiments in all data sets are classified based on five polarity levels. The lexicons generated by the proposed algorithm give a better understanding of word sentiment orientation, social media users' culture, and the Arabic language. Since stop words can contribute to sentiment polarity, they are retained rather than deleted. The results show that the F-measure is greater than 80% in three data sets and the accuracy is greater than 80% for all data sets. The proposed method outperforms the current methods in the literature in two of the data sets. Finally, in terms of F-measure, the proposed method achieved better results for three data sets. |
---|---|
ISSN: | 2158-107X 2156-5570 |
DOI: | 10.14569/IJACSA.2019.0100878 |
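
The summary describes searching for a lexicon that optimizes classification accuracy with a genetic algorithm. The record gives no implementation details, so the following is only a minimal sketch of that general idea, not the authors' method. Everything in it is an assumption made for illustration: the toy English corpus, the two polarity classes (the paper uses five), the additive scoring rule, and the names `CORPUS`, `predict`, `accuracy`, and `evolve_lexicon`. Stop words such as "very" and "not" are kept in the vocabulary, in line with the summary.

```python
import random

# Hypothetical toy corpus: (tokens, label), label +1 = positive, -1 = negative.
CORPUS = [
    (["good", "service"], 1),
    (["very", "good"], 1),
    (["bad", "service"], -1),
    (["not", "good"], -1),
    (["very", "bad"], -1),
    (["great", "food"], 1),
]
VOCAB = sorted({word for tokens, _ in CORPUS for word in tokens})


def predict(lexicon, tokens):
    """Classify by the sign of the summed word weights (assumed scoring rule)."""
    score = sum(lexicon.get(word, 0.0) for word in tokens)
    return 1 if score >= 0 else -1


def accuracy(lexicon, corpus):
    """Fraction of documents whose predicted label matches the gold label."""
    hits = sum(predict(lexicon, tokens) == label for tokens, label in corpus)
    return hits / len(corpus)


def evolve_lexicon(corpus, vocab, pop_size=30, generations=40,
                   mutation_rate=0.2, seed=0):
    """Search for word weights in [-1, 1] that maximize training accuracy."""
    rng = random.Random(seed)  # local generator keeps runs reproducible

    def fitness(individual):
        return accuracy(dict(zip(vocab, individual)), corpus)

    # Each individual is one candidate lexicon: a weight per vocabulary word.
    population = [[rng.uniform(-1, 1) for _ in vocab] for _ in range(pop_size)]
    for _ in range(generations):
        # Elitist selection: the fitter half survives unchanged.
        population.sort(key=fitness, reverse=True)
        elite = population[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            parent_a, parent_b = rng.sample(elite, 2)
            cut = rng.randrange(1, len(vocab))  # one-point crossover
            child = parent_a[:cut] + parent_b[cut:]
            # Gaussian mutation, clipped back into [-1, 1].
            child = [
                min(1.0, max(-1.0, gene + rng.gauss(0, 0.3)))
                if rng.random() < mutation_rate else gene
                for gene in child
            ]
            children.append(child)
        population = elite + children
    best = max(population, key=fitness)
    return dict(zip(vocab, best))


best_lexicon = evolve_lexicon(CORPUS, VOCAB)
best_accuracy = accuracy(best_lexicon, CORPUS)
```

In a setup like the one the summary outlines, the learned weights would feed a meta-level feature used alongside n-gram features. Here the fitness is measured on the training corpus itself, which a real experiment would replace with held-out data.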