
Partial Multiple Imputation With Variational Autoencoders: Tackling Not at Randomness in Healthcare Data


Bibliographic Details
Published in:	IEEE journal of biomedical and health informatics 2022-08, Vol.26 (8), p.4218-4227
Main Authors:	Pereira, Ricardo Cardoso; Abreu, Pedro Henriques; Rodrigues, Pedro Pereira
Format:	Article
Language:	English
Subjects:	Congestive heart failure; Data models; Datasets; Health care; Healthcare data; Mathematical models; Medical services; Mice; Missing data; Missing not at random; Neural networks; Partial multiple imputation; Principal component analysis; Task analysis; Variational autoencoder

Description
Summary:	Missing data can pose severe consequences in critical contexts, such as clinical research based on routinely collected healthcare data. This issue is usually handled with imputation strategies, but these tend to produce poor and biased results under the Missing Not At Random (MNAR) mechanism. A recent trend that has been showing promising results for MNAR is the use of generative models, particularly Variational Autoencoders. However, they have a limitation: the imputed values are the result of a single sample, which can be biased. To tackle it, an extension to the Variational Autoencoder that uses a partial multiple imputation procedure is introduced in this work. The proposed method was compared to 8 state-of-the-art imputation strategies, in an experimental setup with 34 datasets from the medical context, injected with the MNAR mechanism (10% to 80% rates). The results were evaluated through the Mean Absolute Error, with the new method being the overall best in 71% of the datasets, significantly outperforming the remaining ones, particularly for high missing rates. Finally, a case study of a classification task with heart failure data was also conducted, where this method induced improvements in 50% of the classifiers.
ISSN:	2168-2194 2168-2208
DOI:	10.1109/JBHI.2022.3172656
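
The evaluation described in the summary (injecting MNAR missingness at a chosen rate, imputing, and scoring with the Mean Absolute Error) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the article: the function names are hypothetical, the MNAR rule (removing the largest values of a column, so that missingness depends on the unobserved value itself) is only one common way to simulate MNAR, and mean imputation stands in for a simple baseline rather than for the proposed Variational Autoencoder method.

```python
import numpy as np


def inject_mnar(column, rate):
    """Return a float copy of `column` with the largest `rate` fraction set to NaN.

    Assumed MNAR rule for illustration: a value is more likely to be missing
    because of its own (unobserved) magnitude, here the highest values.
    """
    column = np.asarray(column, dtype=float)
    n_missing = int(round(rate * column.size))
    out = column.copy()
    if n_missing > 0:
        largest = np.argsort(column)[-n_missing:]
        out[largest] = np.nan
    return out


def mean_impute(column):
    """Baseline imputer: replace each NaN with the mean of the observed values."""
    column = np.asarray(column, dtype=float)
    out = column.copy()
    out[np.isnan(out)] = np.nanmean(column)
    return out


def mae_on_missing(truth, with_missing, imputed):
    """Mean Absolute Error computed only over the entries that were removed."""
    truth = np.asarray(truth, dtype=float)
    mask = np.isnan(np.asarray(with_missing, dtype=float))
    return float(np.mean(np.abs(truth[mask] - np.asarray(imputed)[mask])))


# Small deterministic example: the two largest values (4 and 5) are removed.
truth_small = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
observed_small = inject_mnar(truth_small, rate=0.4)
imputed_small = mean_impute(observed_small)
mae_small = mae_on_missing(truth_small, observed_small, imputed_small)

# Larger synthetic example at a 20% missing rate.
rng = np.random.default_rng(0)
truth = rng.normal(loc=50.0, scale=10.0, size=1000)
observed = inject_mnar(truth, rate=0.2)
imputed = mean_impute(observed)
mae = mae_on_missing(truth, observed, imputed)
```

In the small example the observed mean is 2.0, so the removed values 4 and 5 are both imputed as 2.0 and the MAE is 2.5. Because the removed values are systematically the largest ones, the observed mean underestimates them, which illustrates the bias under MNAR that the summary refers to.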