Imputing missing values in unevenly spaced clinical time series data to build an effective temporal classification framework
Published in: | Computational statistics & data analysis 2017-08, Vol.112, p.63-79 |
---|---|
Main Authors: | , , |
Format: | Article |
Language: | English |
Summary: | BACKGROUND: In the healthcare domain, clinical trials generate time-stamped data that record a set of observations on patient health status. These data are prone to missing values because patient observations are not always taken regularly or updated correctly.
OBJECTIVE: This paper aims to impute missing values in unevenly spaced clinical time series data by proposing a tolerance rough set induced bio-statistical (TRiBS) framework. The proposed framework adopts the inverse distance weight (IDW) interpolation technique and improves it using the concepts of tolerance rough set (TR) and particle swarm optimization (PSO).
METHOD: When interpolating an unknown data point, classical IDW interpolation suffers from two major drawbacks: first, selecting the known data points, and second, choosing an optimal influence factor. The TRiBS framework overcomes the first limitation using TR and the second using PSO. TR derives the dependent attributes for each attribute using the non-missing records. The nearest significant set is then generated for each missing value based on its attribute dependencies. The PSO technique fixes the weights for the data in a nearest significant set by finding an optimized influence factor. The resulting significant set and its influence factor are used in the IDW computation to impute the missing value.
RESULT: The proposed framework is evaluated on clinical time series datasets of hepatitis and thrombosis patients. The framework can also support other clinical time series datasets with minor domain-specific changes.
CONCLUSION: The quality of the imputed results demonstrates the effectiveness of TRiBS. Experimental evaluation with classifiers such as neural networks, support vector machines (SVM), and decision trees shows an improvement in classification accuracy when missing data are pre-processed with the proposed framework.
•Imputing missing values in unevenly spaced clinical time series data.
•Tolerance rough set induced bio-statistical (TRiBS) framework is used in imputation.
•TRiBS adopts and improves the inverse distance weight (IDW) interpolation technique.
•TRiBS uses the tolerance rough set and particle swarm optimization to improve IDW.
•Performance of the imputed results demonstrates the effectiveness of TRiBS. |
---|---|
ISSN: | 0167-9473 1872-7352 |
DOI: | 10.1016/j.csda.2017.02.012 |
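
The METHOD paragraph of the summary builds on classical IDW interpolation, so a minimal sketch of that baseline may help make the role of the influence factor concrete. This is not the TRiBS implementation: the function name `idw_impute`, the `power` parameter (standing in for the influence factor), and the sample measurements are all hypothetical, and every known point is used as a neighbour. In TRiBS, the neighbours would instead be the nearest significant set selected via TR, and the influence factor would be tuned by PSO.

```python
import numpy as np

def idw_impute(t_known, v_known, t_missing, power=2.0):
    """Estimate the value at t_missing as an inverse-distance-weighted mean.

    `power` plays the role of the influence factor: larger values give
    nearby observations more weight relative to distant ones.
    """
    t_known = np.asarray(t_known, dtype=float)
    v_known = np.asarray(v_known, dtype=float)
    dist = np.abs(t_known - t_missing)
    # An exact timestamp match returns the observed value directly,
    # which also avoids a division by zero below.
    if np.any(dist == 0):
        return float(v_known[dist == 0][0])
    weights = 1.0 / dist ** power
    return float(np.sum(weights * v_known) / np.sum(weights))

# Hypothetical, unevenly spaced lab measurements (visit day, value).
days = [0, 3, 4, 10, 17]
values = [40.0, 46.0, 47.0, 60.0, 52.0]

# Impute the value for day 7, on which no observation was recorded.
estimate = idw_impute(days, values, t_missing=7, power=2.0)
print(round(estimate, 2))
```

Because the weights depend only on the time gap, the uneven spacing of the visits needs no special handling. The estimate always lies between the smallest and largest neighbour values, and raising `power` pulls it toward the observations closest in time, which is why the choice of influence factor matters.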