Meta ensemble learning in geospatial sentiment analysis and community survey mapping: a water supply case study

Bibliographic Details
Published in: Earth science informatics 2024-08, Vol.17 (4), p.3233-3252
Main Author: Vahidnia, Mohammad H.
Format: Article
Language: English
Description
Summary: Amidst the proliferation of social media and online platforms, sentiment analysis stands out as a pivotal tool in Natural Language Processing (NLP), facilitating the categorization of public opinions. The overarching goal of this study is to apply sentiment analysis techniques to assess public perceptions of water supply quality and provide decision-support maps for infrastructure planning. The primary research gap addressed in this study concerns the effective integration of spatial statistics methods with sentiment analysis for the purpose of generating zoning maps. This integration offers a novel approach for understanding public perceptions and sentiments within specific geographical contexts. Sub-objectives of the study include the development of a robust meta ensemble learning framework, the utilization of crowdsourced geographic information for sentiment analysis, and the evaluation of text mining techniques specific to water supply concerns. Our approach utilizes comments from subscribers of the Water Organization portal. The meta ensemble learning framework comprises six different combinations, including boosting, bagging, and voting solutions, drawing from various base estimators such as K-Nearest Neighbors (KNN), Decision Tree (DT), Support Vector Machine (SVM), Random Forest (RF), and Artificial Neural Network (ANN), alongside boosting techniques like AdaBoost and XGBoost. Results indicate that aggregating vectors from text feature extraction techniques such as Bag of Words (BoW), N-gram, and TF-IDF yielded optimal pattern recognition. AdaBoost emerged as the most effective model, as determined by metrics like Accuracy, F1-score, and AUC. Unreviewed subscriber comments were fed into the final model to predict unfavorable remarks, subsequently visualized on georeferenced maps.
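The feature aggregation step mentioned in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: the whitespace tokenizer, the choice of bigrams for the N-gram view, the smoothed IDF formula, the sample comments, and all function names are assumptions made for illustration. It only shows one plausible reading of "aggregating vectors", namely concatenating the BoW, N-gram, and TF-IDF vectors of each comment into a single feature vector that a classifier such as AdaBoost could then consume.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; real text needs more cleaning.
    return text.lower().split()


def bigrams(tokens):
    # Word-level 2-grams, used here as the "N-gram" view (N=2 is an assumption).
    return [" ".join(pair) for pair in zip(tokens, tokens[1:])]


def build_vocab(docs_terms):
    # Sorted list of every distinct term seen across all documents.
    return sorted({term for terms in docs_terms for term in terms})


def count_vector(terms, vocab):
    # Raw term counts in vocabulary order.
    counts = Counter(terms)
    return [counts[term] for term in vocab]


def tfidf_vectors(docs_tokens, vocab):
    # Term frequency times a smoothed inverse document frequency.
    n_docs = len(docs_tokens)
    doc_freq = Counter(t for tokens in docs_tokens for t in set(tokens))
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocab}
    vectors = []
    for tokens in docs_tokens:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append([counts[t] / total * idf[t] for t in vocab])
    return vectors


def aggregate_features(comments):
    # Concatenate the BoW, bigram, and TF-IDF vectors of each comment.
    docs_tokens = [tokenize(c) for c in comments]
    docs_bigrams = [bigrams(tokens) for tokens in docs_tokens]
    uni_vocab = build_vocab(docs_tokens)
    bi_vocab = build_vocab(docs_bigrams)
    bow = [count_vector(tokens, uni_vocab) for tokens in docs_tokens]
    ngram = [count_vector(grams, bi_vocab) for grams in docs_bigrams]
    tfidf = tfidf_vectors(docs_tokens, uni_vocab)
    return [b + g + t for b, g, t in zip(bow, ngram, tfidf)]


# Hypothetical subscriber comments, invented for this example.
comments = ["water pressure is low", "water is clean", "low pressure again"]
features = aggregate_features(comments)
```

Each row of `features` has one slot per unigram (counts), one per bigram (counts), and one per unigram again (TF-IDF weights), so all rows share the same length and can be passed to any of the estimators named in the summary.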
Geostatistical methods within Geographic Information Systems (GIS) were employed, including spatial kernel density, spatial join, natural breaks classification, and hotspot analysis using Getis-Ord Gi* statistics. The approach produced maps illustrating areas with a high density of negative remarks, identifying problematic urban blocks and continuous hotspot areas. Overall, our method demonstrates promising efficiency in assessing water supply situations and informing development planning.
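For readers unfamiliar with the hotspot statistic named above, the following is a minimal sketch of the standard Getis-Ord Gi* z-score, not the GIS workflow used in the study. The binary adjacency weights (each block counted as its own neighbor), the five-block layout, and the per-block counts of negative remarks are all hypothetical values chosen for illustration.

```python
import math


def getis_ord_gi_star(values, weights):
    """Return the Gi* z-score for each location.

    values  : list of observed counts, one per location
    weights : square matrix (list of lists); weights[i][j] is the spatial
              weight between i and j, with weights[i][i] included (Gi*).
    """
    n = len(values)
    mean = sum(values) / n
    # Population standard deviation of the observed values.
    std = math.sqrt(sum(v * v for v in values) / n - mean * mean)
    scores = []
    for i in range(n):
        row = weights[i]
        sum_w = sum(row)
        sum_w2 = sum(w * w for w in row)
        numerator = sum(w * x for w, x in zip(row, values)) - mean * sum_w
        denominator = std * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        # A zero denominator (no variation, or every location a neighbor)
        # leaves the statistic undefined; 0.0 is returned as a placeholder.
        scores.append(numerator / denominator if denominator > 0 else 0.0)
    return scores


# Hypothetical counts of negative remarks in five blocks along a street.
negative_counts = [9, 8, 1, 0, 1]

# Assumed binary weights: each block neighbors itself and adjacent blocks.
adjacency = [
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
]

scores = getis_ord_gi_star(negative_counts, adjacency)
```

Large positive scores mark blocks whose neighborhood has more negative remarks than expected under spatial randomness (candidate hotspots), while large negative scores mark cold spots. In practice the weights would come from a distance band or contiguity rule defined in the GIS.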
ISSN: 1865-0473, 1865-0481
DOI: 10.1007/s12145-024-01345-z