Loading…

Multi-Modal Stacked Denoising Autoencoder for Handling Missing Data in Healthcare Big Data

Supply and demand increase in response to healthcare trends. Moreover, personal health records (PHRs) are being managed by individuals. Such records are collected using different avenues and vary considerably in terms of their type and scope depending on the particular circumstances. As a result, so...

Full description

Saved in:
Bibliographic Details
Published in:IEEE access 2020, Vol.8, p.104933-104943
Main Authors: Kim, Joo-Chang, Chung, Kyungyong
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Supply and demand increase in response to healthcare trends. Moreover, personal health records (PHRs) are being managed by individuals. Such records are collected using different avenues and vary considerably in terms of their type and scope depending on the particular circumstances. As a result, some data may be missing, which has a negative effect on the data analysis, and such data should, therefore, be replaced with appropriate values. In this study, a method for estimating missing data using a multi-modal autoencoder applied to the field of healthcare big data is proposed. The proposed method uses a stacked denoising autoencoder to estimate the missing data that occur during the data collection and processing stages. Autoencoders are neural networks that output value of x ^ similar to an input value of x. In the present study, data from the Korean National Health Nutrition Examination Survey (KNHNES), conducted by the Korea Centers for Disease Control and Prevention (KCDC), are used. As representative healthcare data from South Korea, they contain a large number of parameters identical to those used in the PHRs. Based on this, models can be generated to estimate missing data occurring in PHRs. Furthermore, PHRs involve a multi-modality that allows the data to be collected from multiple sources for a single object. Therefore, the stacked denoising autoencoder applied is configured under a multi-modal setting. Through pre-processing, a set of data without missing value in KNHNES is designed. In the data set based learning, a label is set as original data, and an autoencoder input is set as noised input that additionally has as many random zero numbers as noise factor. In this way, the autoencoder learns in the way of making the zero-based noise value similar to the original label value. When the amount of missing data in a dataset reaches approximately 25%, the accuracy of the proposed method using a multi-modal stacked denoising autoencoder is 0.9217, which is higher than that achieved by other ordinary methods. For a single-modal denoising autoencoder, the accuracy is 0.932, with a slight difference of approximately 0.01, which falls within the allowable limits in data analysis. In terms of computational performance, a single-modal autoencoder has 10,384 parameters, which is 5,594 more than those used in a multi-modal stacked autoencoder. These parameters affect the speed of the model. Both models exhibit a significant difference in the number of para
ISSN:2169-3536
2169-3536
DOI:10.1109/ACCESS.2020.2997255