
Stream Clustering and Visualization of Geotagged Text Data for Crisis Management

Bibliographic Details
Main Authors: Crossman, Nathaniel C.; Chung, Soon M.; Schmidt, Vincent A.
Format: Conference Proceeding
Language: English
Description
Summary: Social media and microblogging services produce vast amounts of streaming data, and clustering is one of the most important ways to analyze such data and discover interesting trends in it. When clustering streaming data, it is desirable to make only a single pass over the incoming data, so that old data never has to be processed again, and the clustering model should evolve over time without losing important feature statistics of the data. In this research, we have developed a new clustering system that clusters social media data based on their textual content and displays the clusters and their locations on a map, so that at-a-glance information can be shown throughout the evolution of a crisis. Our system takes advantage of a text stream clustering algorithm called TextClust [4], which uses two-phase clustering composed of micro-clustering and macro-clustering. The online micro-clustering phase incrementally creates micro-clusters that capture sufficient information about the topics occurring in the text stream. The offline macro-clustering phase then clusters the micro-clusters over a user-specified time interval. Our system allows users to change the macro-clustering algorithm interactively, so they can evaluate the micro-clustering results seamlessly and improve the overall clustering result. Our experiments demonstrated that our system is highly scalable. First responders and crisis management personnel can easily use the system to quickly determine whether a crisis is happening, where it is concentrated, and which resources are best to deploy.
ISSN: 2640-0227
DOI: 10.1109/ICoDSE48700.2019.9092760
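The two-phase scheme described in the summary (an online, single-pass micro-clustering phase followed by offline macro-clustering over a chosen time window) can be illustrated with a small sketch. This is a simplified, hypothetical illustration, not the TextClust algorithm of [4] and not the authors' system. It assumes whitespace tokenization, raw term frequencies, an exponential fading factor 2^(-lam * dt), cosine similarity for matching, and a simple single-link grouping as a stand-in macro-clustering method. All names and parameters (`TextStreamClusterer`, `MicroCluster`, `sim_threshold`, `lam`, `min_weight`, `merge_threshold`) are illustrative assumptions.

```python
# Simplified sketch of two-phase text stream clustering; NOT TextClust itself.
# All class, function, and parameter names here are hypothetical.
import math
from collections import Counter
from dataclasses import dataclass


def decay(dt: float, lam: float) -> float:
    """Assumed exponential fading factor 2^(-lam * dt) for data dt time units old."""
    return 2 ** (-lam * dt)


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class MicroCluster:
    tf: dict            # faded term frequencies (summary of one topic)
    weight: float       # faded number of absorbed documents
    lat_sum: float      # faded coordinate sums, used to place the cluster on a map
    lon_sum: float
    last_update: float

    def fade_to(self, t: float, lam: float) -> None:
        """Decay all statistics from last_update to time t."""
        f = decay(t - self.last_update, lam)
        self.tf = {k: v * f for k, v in self.tf.items()}
        self.weight *= f
        self.lat_sum *= f
        self.lon_sum *= f
        self.last_update = t

    def location(self) -> tuple:
        """Weighted mean (lat, lon) of the documents in this micro-cluster."""
        return self.lat_sum / self.weight, self.lon_sum / self.weight


class TextStreamClusterer:
    """Hypothetical two-phase clusterer: online micro-clusters, offline macro-clusters."""

    def __init__(self, sim_threshold=0.3, lam=0.01, min_weight=0.1):
        self.sim_threshold = sim_threshold  # min similarity to join a micro-cluster
        self.lam = lam                      # fading rate
        self.min_weight = min_weight        # faded weight below which a cluster is dropped
        self.micro = []

    def insert(self, text: str, lat: float, lon: float, t: float) -> None:
        """Online phase: each document is seen exactly once (single pass)."""
        tf = Counter(text.lower().split())
        best, best_sim = None, 0.0
        for mc in self.micro:
            s = cosine(tf, mc.tf)  # uniform fading of mc.tf does not change cosine
            if s > best_sim:
                best, best_sim = mc, s
        if best is not None and best_sim >= self.sim_threshold:
            best.fade_to(t, self.lam)
            for k, v in tf.items():
                best.tf[k] = best.tf.get(k, 0.0) + v
            best.weight += 1.0
            best.lat_sum += lat
            best.lon_sum += lon
        else:
            self.micro.append(MicroCluster(dict(tf), 1.0, lat, lon, t))
        # Forget micro-clusters whose faded weight has become negligible.
        self.micro = [mc for mc in self.micro
                      if mc.weight * decay(t - mc.last_update, self.lam) >= self.min_weight]

    def macro_cluster(self, t_start: float, t_end: float, merge_threshold=0.5) -> list:
        """Offline phase: single-link grouping of micro-clusters updated in [t_start, t_end]."""
        selected = [mc for mc in self.micro if t_start <= mc.last_update <= t_end]
        parent = list(range(len(selected)))

        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]
                i = parent[i]
            return i

        for i in range(len(selected)):
            for j in range(i + 1, len(selected)):
                if cosine(selected[i].tf, selected[j].tf) >= merge_threshold:
                    parent[find(i)] = find(j)
        groups = {}
        for i, mc in enumerate(selected):
            groups.setdefault(find(i), []).append(mc)
        return list(groups.values())
```

In use, each incoming post would be passed once to `insert`, and `macro_cluster` would be called with whatever time window the user selects. Because the offline phase reads only the micro-cluster summaries, swapping in a different macro-clustering method (or simply a different `merge_threshold`) never requires revisiting the raw stream, and a macro-cluster's map position could be derived from the `location()` values of its member micro-clusters.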