Stream Clustering and Visualization of Geotagged Text Data for Crisis Management
Main Authors: | , , |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | |
Online Access: | Request full text |
Summary: | Social media and microblogging services produce vast amounts of streaming data, and clustering is one of the most important ways to analyze such data and discover interesting trends in it. When clustering streaming data, it is desirable to make only a single pass over the incoming data, so that old data never needs to be reprocessed, and the clustering model should evolve over time so that no important feature statistics of the data are lost. In this research, we have developed a new clustering system that clusters social media data by textual content and displays the clusters and their locations on a map, giving an at-a-glance view throughout the evolution of a crisis. Our system builds on a text stream clustering algorithm called TextClust [4], which uses two-phase clustering composed of micro-clustering and macro-clustering. The online micro-clustering phase incrementally creates micro-clusters that capture sufficient information about the topics occurring in the text stream. The offline macro-clustering phase clusters the micro-clusters for a user-specified time interval. Users can change the macro-clustering algorithm interactively, in order to evaluate the micro-clustering results seamlessly and improve the overall clustering result. Our experiments demonstrated that the system scales well. First responders and crisis management personnel can readily use the system to quickly determine whether a crisis is happening, where it is concentrated, and which resources are best deployed in response. |
---|---|
ISSN: | 2640-0227 |
DOI: | 10.1109/ICoDSE48700.2019.9092760 |
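
The two-phase scheme described in the Summary can be illustrated with a rough sketch. This is not the TextClust algorithm and not the authors' system: it is a simplified stand-in that assumes whitespace tokenization, raw term counts, cosine similarity with fixed thresholds, and single-linkage merging for the offline phase. All names here (`MicroCluster`, `StreamClusterer`, `insert`, `macro_clusters`) and the threshold values are hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class MicroCluster:
    """Running summary of similar messages: term counts plus a location sum."""
    terms: Counter = field(default_factory=Counter)
    count: int = 0
    lat_sum: float = 0.0
    lon_sum: float = 0.0

    def add(self, tokens, lat, lon):
        self.terms.update(tokens)
        self.count += 1
        self.lat_sum += lat
        self.lon_sum += lon

    def centroid(self):
        """Mean (lat, lon) of absorbed messages, e.g. for a map marker."""
        return (self.lat_sum / self.count, self.lon_sum / self.count)


class StreamClusterer:
    def __init__(self, threshold=0.3):
        self.threshold = threshold  # assumed similarity cut-off
        self.micro = []

    def insert(self, text, lat, lon):
        """Online phase: each message is processed once and never revisited."""
        tokens = Counter(text.lower().split())
        best, best_sim = None, 0.0
        for mc in self.micro:
            sim = cosine(tokens, mc.terms)
            if sim > best_sim:
                best, best_sim = mc, sim
        if best is None or best_sim < self.threshold:
            best = MicroCluster()  # nothing similar enough: open a new cluster
            self.micro.append(best)
        best.add(tokens, lat, lon)

    def macro_clusters(self, threshold=0.2):
        """Offline phase: single-linkage grouping of micro-clusters."""
        parent = list(range(len(self.micro)))

        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]  # path compression
                i = parent[i]
            return i

        for i in range(len(self.micro)):
            for j in range(i + 1, len(self.micro)):
                if cosine(self.micro[i].terms, self.micro[j].terms) >= threshold:
                    parent[find(i)] = find(j)

        groups = {}
        for i, mc in enumerate(self.micro):
            groups.setdefault(find(i), []).append(mc)
        return list(groups.values())


sc = StreamClusterer(threshold=0.3)
sc.insert("flood water rising downtown", 1.0, 2.0)
sc.insert("flood water downtown streets", 3.0, 4.0)
sc.insert("concert tickets sale tonight", 10.0, 10.0)
```

Because `macro_clusters` reads only the micro-cluster summaries, the offline step can be rerun with a different threshold, or replaced by a different grouping method, without touching the stream again. That is the property the Summary relies on when it describes switching macro-clustering algorithms interactively.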