CDS: Collaborative distant supervision for Twitter account classification

Bibliographic Details
Published in: Expert systems with applications 2017-10, Vol.83, p.94-103
Main Authors: Cui, Lishan; Zhang, Xiuzhen; Qin, A.K.; Sellis, Timos; Wu, Lifang
Format: Article
Language: English
Description
Summary:
•Novel distant supervision-based approach to Twitter account classification.
•Collaborative learning of distant supervision, active and semi-supervised learning.
•Heuristics for automatically labelling Twitter accounts.
•Generic strategy identifying false positives and false negatives from automatic labelling.

Individuals use Twitter for personal communication, whereas businesses, politicians and celebrities use Twitter for branding purposes. Distinguishing Personal from Branding Twitter accounts is important for Twitter analytics. Existing studies of Twitter account classification apply classical supervised learning, which requires intensive manual annotation for training. In this paper, we propose CDS (Collaborative Distant Supervision), a novel learning scheme for Twitter account classification that does not require intensive manual labelling. Twitter accounts are automatically labelled using heuristics for distant supervision learning. To achieve effective learning from heuristic labels, active learning is applied to identify and correct false positive labels, and semi-supervised learning is applied to further use false negatives missed by labelling heuristics for learning. Extensive experiments on Twitter data showed that CDS achieved high classification accuracy.
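
The three stages named in the summary (heuristic labelling, active correction of suspect labels, semi-supervised use of accounts the heuristics missed) can be illustrated with a minimal sketch. It is not the authors' implementation: the keyword heuristics, the tiny naive Bayes classifier, the `oracle` callback standing in for a human annotator, and the `budget` and `threshold` parameters are all assumptions made for illustration, and self-training is used here as one possible semi-supervised step.

```python
import math
from collections import Counter

BRAND, PERSONAL = "branding", "personal"

def heuristic_label(bio):
    """Hypothetical labelling heuristic: keyword cues in the profile bio."""
    words = set(bio.lower().split())
    if words & {"official", "store", "news"}:
        return BRAND
    if words & {"i", "my", "dad", "mum"}:
        return PERSONAL
    return None  # the heuristic abstains

def train(examples):
    """Tiny multinomial naive Bayes over bio words (stand-in classifier)."""
    counts = {BRAND: Counter(), PERSONAL: Counter()}
    priors = Counter()
    for bio, label in examples:
        counts[label].update(bio.lower().split())
        priors[label] += 1
    vocab = set(counts[BRAND]) | set(counts[PERSONAL])
    return counts, priors, vocab

def predict(model, bio):
    """Return (label, posterior probability of that label) for one bio."""
    counts, priors, vocab = model
    scores = {}
    for label in (BRAND, PERSONAL):
        total = sum(counts[label].values())
        s = math.log((priors[label] + 1) / (sum(priors.values()) + 2))
        for w in bio.lower().split():
            s += math.log((counts[label][w] + 1) / (total + len(vocab) + 1))
        scores[label] = s
    best = max(scores, key=scores.get)
    z = sum(math.exp(v - scores[best]) for v in scores.values())
    return best, 1.0 / z

def cds_sketch(bios, oracle, budget=1, threshold=0.8):
    # Step 1: distant supervision, heuristics label what they can.
    pairs = [(b, heuristic_label(b)) for b in bios]
    unlabelled = [b for b, y in pairs if y is None]
    train_set = [(b, y) for b, y in pairs if y is not None]
    model = train(train_set)

    # Step 2: active learning, query the oracle on the heuristic labels
    # the model trusts least (candidate false positives).
    def trust(item):
        bio, y = item
        label, p = predict(model, bio)
        return p if label == y else 1.0 - p

    order = sorted(range(len(train_set)), key=lambda i: trust(train_set[i]))
    for i in order[:budget]:
        bio, _ = train_set[i]
        train_set[i] = (bio, oracle(bio))
    model = train(train_set)

    # Step 3: self-training, adopt confident predictions for accounts
    # that the heuristics left unlabelled.
    for bio in unlabelled:
        label, p = predict(model, bio)
        if p >= threshold:
            train_set.append((bio, label))
    return train(train_set), train_set
```

In this sketch `oracle` is any function that maps a bio to its corrected label; in practice it would be a human annotator, and `budget` caps how many labels are checked by hand.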
ISSN: 0957-4174, 1873-6793
DOI: 10.1016/j.eswa.2017.03.075