A novel sentiment aware dictionary for multi-domain sentiment classification

Bibliographic Details
Published in:	Computers & electrical engineering 2018-07, Vol.69, p.585-597
Main Authors:	Jha, Vandana; R, Savitha; Shenoy, P Deepa; K R, Venugopal; Sangaiah, Arun Kumar
Format:	Article
Language:	English
Subjects:	Accuracy; Classification; Data mining; Domains; Hindi language; Hindi Sentiwordnet; Labels; Language; Multi-domain; Natural language processing; Sentiment analysis; Sentiment aware dictionary

Description
Summary:
• The proposed sentiment aware dictionary, created using data from multiple domains, is a solution to multi-domain sentiment classification in the e-commerce domain.
• The dictionary is used to classify unlabeled reviews of the target domain.
• The classifier is implemented on Hindi-language product reviews. It can be extended to other e-commerce reviews by using a language-specific parser and tagger.
• In several experiments, the proposed method labels 23–24% more words of the unlabeled target domain than Hindi Sentiwordnet does.
Sentiment analysis is a subarea of Natural Language Processing (NLP) that extracts a user's opinion and classifies it according to its polarity. The task has many applications, but it is domain dependent, and annotating corpora for every domain of interest before training a classifier is costly. We attempt to solve this problem by creating a sentiment aware dictionary from multiple-domain data. The dictionary is built using labeled data from the source domain and unlabeled data from both the source and target domains. It is then used to classify the unlabeled reviews of the target domain. The work is carried out in Hindi, the official language of India. The number of web pages in Hindi has grown rapidly since the introduction of UTF-8 encoding. Compared with labeling by Hindi Sentiwordnet (HSWN), a general lexicon for word polarity, the proposed method labels 23–24% more words of the target domain. For the words available in both, the labels assigned by our method match the labels given by HSWN with 76% accuracy.
ISSN:	0045-7906 1879-0755
DOI:	10.1016/j.compeleceng.2017.10.015
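
The summary reports two comparisons against HSWN: how many more target-domain words receive a label, and how often the two label sets agree on shared words. The record does not give the exact formulas, so the following Python sketch is only one plausible reading. The function name `compare_lexicons`, the definition of extra coverage as a difference in the share of labeled target vocabulary, and the toy English data are all assumptions made for illustration; they are not taken from the article.

```python
def compare_lexicons(target_words, proposed, general):
    """Compare two word -> polarity mappings on a target-domain vocabulary.

    Returns (extra_coverage, agreement). Both definitions are assumptions:
    - extra_coverage: share of the vocabulary labeled by `proposed` minus
      the share labeled by `general`.
    - agreement: fraction of words labeled by both whose labels match.
    """
    vocab = set(target_words)
    if not vocab:
        return 0.0, 0.0

    proposed_hits = {w for w in vocab if w in proposed}
    general_hits = {w for w in vocab if w in general}

    extra_coverage = (len(proposed_hits) - len(general_hits)) / len(vocab)

    # Agreement is only defined on words that both lexicons label.
    shared = proposed_hits & general_hits
    matches = sum(1 for w in shared if proposed[w] == general[w])
    agreement = matches / len(shared) if shared else 0.0
    return extra_coverage, agreement


# Toy, made-up data (real input would be Hindi tokens from product reviews).
target_words = ["good", "bad", "fast", "slow", "cheap"]
proposed = {"good": "pos", "bad": "neg", "fast": "pos", "slow": "neg"}
general = {"good": "pos", "bad": "neg", "slow": "pos"}

extra, agree = compare_lexicons(target_words, proposed, general)
print(f"extra coverage: {extra:.2f}, agreement: {agree:.2f}")
```

Under this reading, the article's figures would correspond to an extra coverage of roughly 0.23 to 0.24 and an agreement of 0.76, although the authors may have computed them differently.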