Active learning for sentiment analysis on data streams: Methodology and workflow implementation in the ClowdFlows platform


Bibliographic Details
Published in: Information Processing & Management, 2015-03, Vol. 51 (2), p. 187-203
Main Authors: Kranjc, Janez, Smailović, Jasmina, Podpečan, Vid, Grčar, Miha, Žnidaršič, Martin, Lavrač, Nada
Format: Article
Language: English
Description
Summary:
• We present a cloud-based platform for data stream processing with workflows.
• The ClowdFlows platform enables processing of multiple concurrent data streams.
• We implement an active learning scenario for sentiment analysis on data streams.
• Machine learning methods are shown to be suitable for sentiment analysis.
• Active learning improves the accuracy of sentiment classification.

Sentiment analysis from data streams aims to detect authors' attitudes, emotions and opinions from texts in real time. To reduce the labeling effort needed in the data collection phase, active learning is often applied in streaming scenarios, where a learning algorithm is allowed to select new examples to be manually labeled in order to improve the learner's performance. Even though many on-line platforms perform sentiment analysis, there is no publicly available interactive on-line platform for dynamic adaptive sentiment analysis that can handle changes in data streams and adapt its behavior over time. This paper describes ClowdFlows, a cloud-based scientific workflow platform, and its extensions enabling the analysis of data streams and active learning. Moreover, by utilizing the data and workflow sharing in ClowdFlows, the labeling of examples can be distributed through crowdsourcing. The advanced features of ClowdFlows are demonstrated on a sentiment analysis use case, using active learning with a linear Support Vector Machine to learn sentiment classification models that are applied to microblogging data streams.
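
The active learning idea in the summary can be illustrated with a minimal sketch. It is not the ClowdFlows implementation. It assumes a margin-based query rule (ask for a label when the score is close to zero), which the summary does not specify, and it uses a small pure-Python hinge-loss SGD classifier as a stand-in for a linear Support Vector Machine. All names (`featurize`, `HingeSGDClassifier`, `label_uncertain_stream`, `gold`) and the toy texts are hypothetical.

```python
def featurize(text):
    """Turn a text into bag-of-words counts (lowercased whitespace tokens)."""
    counts = {}
    for token in text.lower().split():
        counts[token] = counts.get(token, 0.0) + 1.0
    return counts


class HingeSGDClassifier:
    """Linear model trained online with hinge loss and L2 shrinkage.

    Hypothetical stand-in for a linear SVM; labels are +1 or -1.
    """

    def __init__(self, learning_rate=1.0, reg=0.01):
        self.weights = {}
        self.learning_rate = learning_rate
        self.reg = reg

    def score(self, features):
        """Signed score; its sign is the predicted class."""
        return sum(self.weights.get(t, 0.0) * v for t, v in features.items())

    def update(self, features, label):
        """One SGD step on the regularized hinge loss for a labeled example."""
        margin = label * self.score(features)
        shrink = 1.0 - self.learning_rate * self.reg
        for token in self.weights:
            self.weights[token] *= shrink
        if margin < 1.0:  # inside the margin or misclassified
            for token, value in features.items():
                self.weights[token] = (
                    self.weights.get(token, 0.0)
                    + self.learning_rate * label * value
                )


def label_uncertain_stream(stream, oracle, model, threshold=1.0):
    """Classify each text; request a label only when the model is uncertain."""
    predictions, queried = [], []
    for text in stream:
        features = featurize(text)
        score = model.score(features)
        predictions.append(1 if score >= 0 else -1)
        if abs(score) < threshold:  # assumed uncertainty rule
            model.update(features, oracle(text))  # oracle = human labeler
            queried.append(text)
    return predictions, queried


# Toy stream; the dict plays the role of the human (or crowd) labeler.
gold = {
    "love great phone": 1,
    "hate awful battery": -1,
    "love great screen": 1,
    "hate awful screen": -1,
    "phone battery": -1,
}
stream = list(gold)
predictions, queried = label_uncertain_stream(
    stream, gold.get, HingeSGDClassifier()
)
print(queried)
```

In this toy run the model asks for labels on the first two texts and on the ambiguous fifth one, and classifies the third and fourth without a query. In a crowdsourced setting, the `oracle` callable would be replaced by whatever mechanism collects labels from annotators.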
ISSN: 0306-4573, 1873-5371
DOI: 10.1016/j.ipm.2014.04.001