Loading…

Modeling Chinese Microblogs with Five Ws for Topic Hashtags Extraction

Hashtags are important metadata in microblogs and are used to mark topics or index messages. However,statistics show that hashtags are absent from most microblogs. This poses great challenges for the retrieval and analysis of these tagless microblogs. In this paper, we summarize the similarity betwe...

Full description

Saved in:
Bibliographic Details
Published in:Tsinghua science and technology 2017-04, Vol.22 (2), p.135-148
Main Authors: Zhao, Zhibin, Sun, Jiahong, Yao, Lan, Wang, Xun, Chu, Jiahong, Liu, Huan, Yu, Ge
Format: Article
Language:English
Subjects:
Citations: Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Hashtags are important metadata in microblogs and are used to mark topics or index messages. However,statistics show that hashtags are absent from most microblogs. This poses great challenges for the retrieval and analysis of these tagless microblogs. In this paper, we summarize the similarity between microblogs and shortmessage-style news, and then propose an algorithm, named 5WTAG, for detecting microblog topics based on a model of five Ws(When, Where, Who, What, ho W). As five-W attributes are the core components in event description, it is guaranteed theoretically that 5WTAG can properly extract semantic topics from microblogs. We introduce the detailed procedure of the algorithm in this paper including spam microblog identification, microblog segmentation, and candidate hashtag construction. In addition, we propose a novel recommendation computing method for ranking candidate hashtags, which combines syntax and semantic analysis and observes the distribution of artificial topic hashtags. Finally, we conduct comprehensive experiments to verify the semantic correctness and completeness of the candidate hashtags, as well as the accuracy of the recommendation method using real data from Sina Weibo.
ISSN:1007-0214
1878-7606
1007-0214
DOI:10.23919/TST.2017.7889636