A perspective on the Missing at Random problem: Synthetic Generation and Benchmark Analysis

Bibliographic Details
Published in: IEEE Access, 2024-01, Vol. 12, p. 1-1
Main Authors: Cabrera-Sanchez, Juan-Francisco; Pereira, Ricardo Cardoso; Abreu, Pedro Henriques; Silva-Ramirez, Esther-Lydia
Format: Article
Language: English
Description
Summary: Progressively more advanced and complex models are proposed to address problems related to computer vision, forecasting, Internet of Things, Big Data and so on. However, these disciplines require preprocessing steps to obtain meaningful results. One of the most common problems addressed in this stage is the presence of missing values. Understanding the reason why missingness occurs helps to select data imputation methods that are more adequate to complete these missing values. Missing at Random synthetic generation presents challenges such as achieving extreme missingness rates and preserving the consistency of the mechanism. To address these shortcomings, three new methods that generate synthetic missingness under the Missing at Random mechanism are proposed in this work and compared to a baseline model. This comparison considers a benchmark covering 33 data sets and five missingness rates (10%, 20%, 40%, 60%, 80%). Seven data imputation methods are compared to evaluate the proposals, ranging from traditional methods to deep learning methods. The results demonstrate that the proposals are aligned with the baseline method in terms of the performance and ranking of data imputation methods. Thus, three new feasible and consistent alternatives for synthetic missingness generation under Missing at Random are presented.
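Illustration: the summary does not describe how the proposed generators or the baseline work, so the sketch below is not any of them. It is a generic, textbook-style example of what "synthetic missingness under Missing at Random" means: values in one column are removed with a probability that depends only on another column that stays fully observed. The function name `apply_mar`, its parameters, and the rank-based weighting are assumptions made purely for illustration.

```python
import numpy as np

def apply_mar(X, target_col, driver_col, rate, seed=None):
    """Return a copy of X with NaNs injected into target_col.

    Hypothetical helper, not taken from the paper. Rows with larger
    values in driver_col are more likely to lose their target_col value,
    and driver_col itself is left fully observed, so the missingness
    depends only on observed data (Missing at Random).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    n_missing = int(round(rate * n))

    # Rank the driver values (1 = smallest) and use the ranks as
    # sampling weights; every row keeps a non-zero probability.
    ranks = np.argsort(np.argsort(X[:, driver_col])) + 1
    weights = ranks / ranks.sum()

    # Sampling a fixed number of rows without replacement hits the
    # requested missingness rate exactly, even at high rates.
    rows = rng.choice(n, size=n_missing, replace=False, p=weights)

    X_missing = X.astype(float)  # astype returns a copy
    X_missing[rows, target_col] = np.nan
    return X_missing

# Usage: inject 40% missingness into column 1, driven by column 0.
data = np.random.default_rng(0).normal(size=(200, 3))
data_missing = apply_mar(data, target_col=1, driver_col=0, rate=0.4, seed=0)
```

Drawing a fixed number of rows, rather than flipping an independent coin per row, is one simple way to reach an exact rate such as 80%; whether the paper's methods take this route is not stated in the summary.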
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2024.3490396