Novel Data Imputation for Multiple Types of Missing Data in Intensive Care Units
Published in: | IEEE journal of biomedical and health informatics 2019-05, Vol.23 (3), p.1243-1250 |
---|---|
Main Authors: | , , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | The diversity and number of parameters monitored in an intensive care unit (ICU) make the resulting databases highly susceptible to quality issues, such as missing information and erroneous data entry, which adversely affect downstream processing and predictive modeling. Missing data interpolation and imputation techniques, such as multiple imputation, expectation maximization, and hot-deck imputation, do not account for the type of missing data, which can lead to bias. In our study, we first model the missing data as three types: "neglectable," a.k.a. "missing completely at random"; "recoverable," a.k.a. "missing at random"; and "not easily recoverable," a.k.a. "missing not at random." We then design imputation techniques for each type of missing data. We use a publicly available database (MIMIC II) to demonstrate how these imputations perform with random forests for prediction. Our results indicate that these novel imputation techniques outperformed standard mean filling techniques and expectation maximization with a statistical significance p |
---|---|
ISSN: | 2168-2194 2168-2208 |
DOI: | 10.1109/JBHI.2018.2883606 |