Emerging topic detection in Twitter stream based on high utility pattern mining

Bibliographic Details
Published in: Expert Systems with Applications, 2019-01, Vol. 115, p. 27-36
Main Authors: Choi, Hyeok-Jun; Park, Cheong Hee
Format: Article
Language: English
Description
Summary:
• We propose a topic detection method for Twitter using High Utility Pattern Mining.
• For every chunk of tweets, the minimum utility threshold is determined dynamically.
• A method using a Topic-tree is presented for postprocessing after applying HUPM.
• The proposed method shows higher topic recall than other methods.
• The proposed method has a short running time suitable for real-time applications.

Among internet and smart device applications, Twitter has become a leading social media platform, disseminating news of events around the world in real time. Many studies have been conducted to identify valuable information on Twitter. Recently, Frequent Pattern Mining has been applied to topic detection on Twitter. In Frequent Pattern Mining, a topic is considered to be a group of words that appear together. However, the method considers only the frequency of words; their utility for topic detection is not taken into account when patterns are generated. In this paper, we propose a method to detect emerging topics on Twitter based on High Utility Pattern Mining (HUPM), which takes frequency and utility into account at the same time. For each chunk of tweets obtained by time-based windowing on the Twitter stream, we define the utility of words based on the growth rate in frequency and use HUPM to find groups of words with high frequency and high utility. For post-processing, an efficient data structure called Topic-tree (TP-Tree) is also proposed to extract actual topic patterns from the candidate topic patterns generated by HUPM. Experimental results demonstrated the effectiveness of the proposed method, which showed superior performance and shorter running time than the other tested topic detection methods. In particular, the proposed method showed a 5% higher topic recall than the other compared methods on the three datasets used.
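The idea of scoring words by growth in frequency between time windows and then keeping word groups with high total utility can be illustrated with a small Python sketch. This is not the authors' algorithm: the utility formula (growth in the fraction of tweets containing a word, clipped at zero), the brute-force enumeration that stands in for a real HUPM algorithm, the fixed `min_utility` argument, and the names `word_utilities` and `high_utility_patterns` are all assumptions made for illustration. The dynamic threshold and the TP-Tree post-processing are not modeled.

```python
from collections import Counter
from itertools import combinations


def word_utilities(prev_chunk, curr_chunk):
    """Assumed utility: growth in the fraction of tweets containing each word."""
    # Count each word at most once per tweet (document frequency).
    prev_df = Counter(w for tweet in prev_chunk for w in set(tweet))
    curr_df = Counter(w for tweet in curr_chunk for w in set(tweet))
    n_prev = max(len(prev_chunk), 1)
    n_curr = max(len(curr_chunk), 1)
    utilities = {}
    for word, count in curr_df.items():
        growth = count / n_curr - prev_df[word] / n_prev
        utilities[word] = max(growth, 0.0)  # words that did not grow get zero
    return utilities


def high_utility_patterns(curr_chunk, utilities, min_utility, max_size=2):
    """Brute-force stand-in for HUPM: keep word sets whose summed utility
    over all tweets containing them reaches min_utility."""
    totals = Counter()
    for tweet in curr_chunk:
        words = sorted(set(tweet))
        for size in range(1, max_size + 1):
            for pattern in combinations(words, size):
                # A pattern's utility in one tweet is the sum of its words' utilities.
                totals[pattern] += sum(utilities.get(w, 0.0) for w in pattern)
    return {p: u for p, u in totals.items() if u >= min_utility}
```

Each chunk is assumed to be a list of tokenized tweets (lists of words). Because a pattern accumulates utility once per tweet that contains it, a word group scores highly only when it is both frequent in the current window and made up of fast-growing words, which is the combination of frequency and utility the summary describes.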
ISSN: 0957-4174, 1873-6793
DOI: 10.1016/j.eswa.2018.07.051