
Computation Time Optimization on Hashtag Segmentation for Social Media Data


Bibliographic Details
Main Authors: Halgamuge, Malka N., Caliskan, Huseyin, Mohammad, Azeem
Format: Conference Proceeding
Language: English
Description
Summary: Sentiment analysis, the contextual mining of text to recognize and extract subjective information from a source, is considered necessary for estimating human behavior. A hashtag is a metadata tag used to classify data into a category. However, there has been little discussion so far on segmenting hashtags. We propose an algorithm that segments hashtags while optimizing computation time. Candidates are created from a given corpus containing 1-gram (unigram) and 2-gram (bigram) data. The proposed algorithm reduces the computation time of generating segments by limiting the candidates to those found in the corpus: the fewer candidates there are, the shorter the calculation. In this study, we gather food-related unstructured tweets (N = 951,255) from Twitter. Our results demonstrate that the proposed algorithm reduces computation time by 29.7%. If a segmentation cannot be found with the proposed algorithm, the original hashtag segmentation method, which identifies all possible candidates, is used as a fallback. The proposed approach improves the hashtag segmentation technique by minimizing computation time, and could be used in real-time tweet analysis. Our results show that the sentiment trends for raw data and segmented data are similar, which also supports the method's accuracy. Our findings indicate that, although computers are getting faster, computational resources should still be used efficiently. Our work also provides a data collection model for future surveys, which could shorten the data retrieval process by using multi-threaded programming.
ISSN: 1558-2612
DOI: 10.1109/WCNC49053.2021.9417569
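
The candidate-limiting idea in the summary can be illustrated with a minimal sketch. This is not the authors' implementation: the toy unigram and bigram counts, the scoring function, the unknown-word penalty, and the function names (`restricted_candidates`, `all_candidates`, `segment`) are all assumptions made for illustration. The sketch first builds only those segmentations whose pieces all occur in the corpus, and falls back to enumerating every possible split when none exists.

```python
import math

# Toy counts, purely illustrative (assumed; not the corpus used in the paper).
UNIGRAMS = {"healthy": 40, "food": 90, "i": 120, "love": 80,
            "pizza": 30, "he": 60, "al": 2, "thy": 3}
BIGRAMS = {("healthy", "food"): 12, ("i", "love"): 25, ("love", "pizza"): 6}
TOTAL = sum(UNIGRAMS.values())


def score(words):
    """Log-probability of a segmentation: bigram estimate where a bigram
    count exists, otherwise unigram estimate with a length-based penalty
    for unknown words (the penalty is an assumption of this sketch)."""
    total, prev = 0.0, None
    for w in words:
        if prev in UNIGRAMS and (prev, w) in BIGRAMS:
            p = BIGRAMS[(prev, w)] / UNIGRAMS[prev]
        else:
            p = UNIGRAMS.get(w, 0.01 / 10 ** len(w)) / TOTAL
        total += math.log(p)
        prev = w
    return total


def restricted_candidates(text):
    """Segmentations in which every piece is a known unigram."""
    if not text:
        return [[]]
    out = []
    for i in range(1, len(text) + 1):
        head = text[:i]
        if head in UNIGRAMS:  # prune: skip pieces absent from the corpus
            out.extend([head] + rest for rest in restricted_candidates(text[i:]))
    return out


def all_candidates(text):
    """Every possible split of the text: 2**(len(text) - 1) candidates."""
    if not text:
        return [[]]
    out = []
    for i in range(1, len(text) + 1):
        out.extend([text[:i]] + rest for rest in all_candidates(text[i:]))
    return out


def segment(hashtag):
    """Try the limited candidate set first; fall back to full enumeration."""
    text = hashtag.lstrip("#").lower()
    if not text:
        return []
    candidates = restricted_candidates(text) or all_candidates(text)
    return max(candidates, key=score)


print(segment("#healthyfood"))
```

With these toy counts, `#healthyfood` yields only two restricted candidates instead of the 1,024 splits that full enumeration would score, which is the source of the time saving; a hashtag with no corpus-only segmentation, such as `#xyz`, takes the fallback path.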