Comparison of the performance of multiple imputation models in filling gaps in hourly and daily meteorological series from two locations in the state of São Paulo-Brazil

Bibliographic Details
Published in: Modeling Earth Systems and Environment, 2024-04, Vol. 10 (2), pp. 1815-1823
Main Authors: Maziero, Luana Possari, Rodrigues, Sérgio Augusto, Pai, Alexandre Dal, Cremasco, Camila Pires, Gabriel Filho, Luís Roberto Almeida
Format: Article
Language: English
Description
Summary: Missing values in data series are a common problem that must be handled adequately to preserve the validity of certain statistical methods and to minimize biases that could affect study results and conclusions. Which method is appropriate depends on the characteristics of the dataset and on the amount of data lost. This study evaluated the performance of two internal multiple imputation approaches, 'pmm' and 'midastouch', on sets of meteorological variables recorded at daily and hourly frequencies. The first set was collected in the municipality of Botucatu and the second in Tupã, both in the state of São Paulo, Brazil. The datasets comprise global solar radiation, wind speed, air temperature, maximum and minimum air temperature, relative air humidity, and maximum and minimum relative humidity for the period from March 20, 2018, to March 19, 2021, gathered by São Paulo State University (UNESP) in Botucatu–SP and by the Brazilian National Institute of Meteorology (INMET) in Tupã–SP. Analysis of the missing values showed data loss of 1.4% in the Botucatu–SP series and 7% in the Tupã–SP series. Given these amounts of missing data, imputation was performed with the 'pmm' and 'midastouch' methods in the R software. The results indicate that both procedures perform satisfactorily when imputing continuous variables, and perform better on hourly than on daily data. The greater level of detail in hourly data allows the associated nuances and uncertainties to be captured more fully.
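The abstract names the two imputation methods without describing them. 'pmm' (predictive mean matching) and 'midastouch' are method names in the R package mice, although this record does not state which package the authors used. As a rough illustration of the idea behind 'pmm' only (midastouch is not reproduced), the Python sketch below computes the share of missing values in a series and then fills the gaps by predictive mean matching: a linear model is fitted to the complete cases, its coefficients are perturbed with a random draw, and each gap receives the observed value of a randomly chosen donor whose predicted mean is close. This is a simplified sketch, not the mice implementation: it assumes one target variable with fully observed predictors, keeps the residual variance fixed, omits the chained-equations loop over several incomplete variables, and uses a hypothetical donor count of 5 and synthetic data (the names `pmm_impute`, `missing_fraction`, `radiation`, and `temperature` are illustrative only).

```python
import numpy as np


def missing_fraction(x):
    """Share of entries in x that are NaN (0.014 would correspond to 1.4% data loss)."""
    x = np.asarray(x, dtype=float)
    return float(np.isnan(x).mean())


def pmm_impute(y, X, donors=5, rng=None):
    """Fill NaNs in y by (simplified) predictive mean matching on predictors X.

    Assumes X has no missing values. Each call uses a fresh coefficient draw,
    so repeated calls give different completed series, as multiple imputation requires.
    """
    rng = np.random.default_rng() if rng is None else rng
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])  # add intercept column
    miss = np.isnan(y)
    Xo, yo = X[~miss], y[~miss]

    # Ordinary least squares on the complete cases
    beta_hat, *_ = np.linalg.lstsq(Xo, yo, rcond=None)
    resid = yo - Xo @ beta_hat
    n, p = Xo.shape
    sigma2 = resid @ resid / (n - p)          # residual variance, held fixed in this sketch
    cov = sigma2 * np.linalg.inv(Xo.T @ Xo)   # approximate covariance of beta_hat
    beta_draw = rng.multivariate_normal(beta_hat, cov)

    pred_obs = Xo @ beta_hat                  # predicted means of potential donors
    pred_mis = X[miss] @ beta_draw            # predicted means of the missing cases

    imputed = y.copy()
    for i, target in zip(np.flatnonzero(miss), pred_mis):
        # Indices of the `donors` observed cases with the closest predicted means
        nearest = np.argsort(np.abs(pred_obs - target))[:donors]
        imputed[i] = yo[rng.choice(nearest)]  # borrow one donor's observed value
    return imputed


# Synthetic example (hypothetical data, not from the study)
rng = np.random.default_rng(42)
radiation = rng.uniform(0, 30, size=200)                      # fully observed predictor
temperature = 15 + 0.4 * radiation + rng.normal(0, 1, 200)
temperature[rng.choice(200, size=14, replace=False)] = np.nan  # remove 14 of 200 values

print(missing_fraction(temperature))  # fraction of missing values in the series
imputations = [pmm_impute(temperature, radiation, rng=rng) for _ in range(5)]
```

Because each imputed value is copied from an observed record, predictive mean matching never produces values outside the observed range of the variable, which is one reason it is commonly used for continuous meteorological series. The five completed series would then be analysed separately and their results pooled.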
ISSN: 2363-6203, 2363-6211
DOI: 10.1007/s40808-023-01863-7