Kernel-based features for predicting population health indices from geocoded social media data

Bibliographic Details
Published in: Decision Support Systems, 2017-10, Vol. 102, p. 22-31
Main Authors: Nguyen, Thin; Larsen, Mark E.; O’Dea, Bridianne; Nguyen, Duc Thanh; Yearwood, John; Phung, Dinh; Venkatesh, Svetha; Christensen, Helen
Format: Article
Language: English
Description
Summary: When tweets are used to predict a population health index, the scale of the data has made it common practice to aggregate tweets by population before learning features that characterize that population. Aggregation avoids the computational cost of extracting features from each individual tweet. However, it can discard much information about the population, because the distribution of textual features within a population may be important for identifying its health index. In addition, relationships between features may themselves carry predictive information about the health index. In this paper, we propose mid-level features, namely kernel-based features, for predicting the health indices of populations from social media data. The kernel-based features are extracted from the distributions of textual features over a population's tweets and encode the relationships between individual textual features in a kernel function. We implemented the features using three different kernel functions and applied them to two case studies of population health prediction: across-year prediction and across-county prediction. The kernel-based features were evaluated and compared with existing features on a dataset collected from the Behavioral Risk Factor Surveillance System. Experimental results show that the kernel-based features achieved significantly higher prediction performance than existing techniques, by up to 16.3%, suggesting the potential and applicability of the proposed features in a wide spectrum of population-level data analytics applications.
• Kernel-based textual features for social media data analysis
• Population health prediction
• Spatial decision support systems
• Advanced cluster computing (Apache Spark)
• Big geo-tagged data from Twitter
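The summary describes features that are computed from the distribution of textual features over a population's tweets and that encode pairwise relationships between those features through a kernel function. The record does not give the exact formulation or name the three kernels, so the following is only a minimal sketch of one possible reading. The function name `kernel_features`, the choice of linear, polynomial, and RBF kernels, the normalization by tweet count, and the parameters `gamma` and `degree` are all assumptions made for illustration, not details taken from the article.

```python
import numpy as np


def kernel_features(tweet_features, kernel="rbf", gamma=1.0, degree=2):
    """Sketch: one kernel-based feature vector for a single population.

    tweet_features: array of shape (n_tweets, n_features); each row holds
    the textual features of one tweet (hypothetical input layout).
    Returns the upper triangle of an (n_features x n_features) kernel
    matrix computed between feature columns.
    """
    X = np.asarray(tweet_features, dtype=float)
    if X.ndim != 2 or X.shape[0] == 0:
        raise ValueError("expected a non-empty (n_tweets, n_features) array")

    n_tweets = X.shape[0]
    # Inner products between feature columns, averaged over tweets so that
    # populations with different tweet counts stay comparable (assumption).
    gram = (X.T @ X) / n_tweets

    if kernel == "linear":
        K = gram
    elif kernel == "polynomial":
        K = (gram + 1.0) ** degree
    elif kernel == "rbf":
        # Mean squared distance between feature columns, derived from gram.
        sq = np.diag(gram)
        dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * gram, 0.0)
        K = np.exp(-gamma * dist2)
    else:
        raise ValueError(f"unknown kernel: {kernel!r}")

    # The kernel matrix is symmetric, so keep only the upper triangle.
    rows, cols = np.triu_indices(K.shape[0])
    return K[rows, cols]


# Toy population: three tweets described by two textual features.
toy_population = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
toy_vector = kernel_features(toy_population, kernel="linear")
```

Under these assumptions, each population (for example, a county in a given year) yields one fixed-length vector regardless of how many tweets it contains, which could then be passed to any standard regression model.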
ISSN: 0167-9236; 1873-5797
DOI: 10.1016/j.dss.2017.06.010