Comparison of the performance of multiple imputation models in filling gaps in hourly and daily meteorological series from two locations in the state of São Paulo-Brazil
Published in: | Modeling Earth Systems and Environment, 2024-04, Vol. 10 (2), pp. 1815-1823 |
---|---|
Main Authors: | , , , , |
Format: | Article |
Language: | English |
Summary: | Missing values in data series are a common problem that must be handled properly to preserve the validity of certain statistical methods and to minimize biases that could affect study outcomes and conclusions. Which method is appropriate depends on the characteristics of the dataset and on the amount of data lost. This study evaluated the performance of two internal multiple imputation methods, 'pmm' (predictive mean matching) and 'midastouch', on sets of meteorological variables at daily and hourly frequencies. The first set was collected in the municipality of Botucatu and the second in Tupã, both in São Paulo State, Brazil. Both datasets contain global solar radiation, wind speed, air temperature, maximum and minimum air temperature, relative humidity, and maximum and minimum relative humidity for the period from March 20, 2018, to March 19, 2021, recorded by São Paulo State University (UNESP, Botucatu–SP) and the Brazilian National Institute of Meteorology (INMET, Tupã–SP). Analysis of the missing values showed that 1.4% of the Botucatu–SP series was missing, compared with 7% of the Tupã–SP series. Given these amounts of missing data, imputation was performed with the 'pmm' and 'midastouch' methods in R. The results indicate that both procedures impute continuous variables satisfactorily, with better performance on the hourly data. The finer temporal detail of hourly data allows the associated nuances and uncertainties to be better understood. |
ISSN: | 2363-6203, 2363-6211 |
DOI: | 10.1007/s40808-023-01863-7 |
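The summary names the 'pmm' method and reports missing-data percentages but does not describe how either is obtained. As an illustration only, here is a minimal Python/NumPy sketch of the general idea behind predictive mean matching, together with a missing-percentage helper of the kind behind figures such as 1.4% and 7%. This is not the authors' code and not the R implementation: in R, 'pmm' and 'midastouch' are method names in the mice package, which this record does not mention. Several simplifications are assumptions. The sketch uses a single ordinary least squares fit with no random draw of the regression coefficients, it assumes complete predictors, and it uses a hypothetical default of k = 5 donors. The function names `missing_percentage` and `pmm_impute` are hypothetical, and 'midastouch' is not shown.

```python
import numpy as np


def missing_percentage(y):
    """Percentage of NaN entries in a 1-D series."""
    y = np.asarray(y, dtype=float)
    return 100.0 * np.isnan(y).sum() / y.size


def pmm_impute(x, y, k=5, rng=None):
    """Fill NaNs in y by a simplified predictive mean matching.

    Hypothetical illustration. It fits one ordinary least squares model of
    y on x using the observed rows, and it omits the random draw of the
    regression coefficients that full multiple-imputation software normally
    adds. Predictors x are assumed to contain no NaNs. Each missing y
    receives the observed value of a donor chosen at random among the k
    observed cases whose predicted means are closest to its own.
    """
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    design = np.column_stack([np.ones(len(y)), x])  # intercept + predictors
    obs = ~np.isnan(y)
    beta, *_ = np.linalg.lstsq(design[obs], y[obs], rcond=None)
    y_hat = design @ beta  # predicted means for every row
    donor_hat, donor_y = y_hat[obs], y[obs]
    n_donors = min(k, donor_y.size)
    y_imp = y.copy()
    for i in np.flatnonzero(~obs):
        # indices of the n_donors observed cases with the closest predicted means
        nearest = np.argsort(np.abs(donor_hat - y_hat[i]))[:n_donors]
        y_imp[i] = donor_y[rng.choice(nearest)]  # copy a real observed value
    return y_imp


if __name__ == "__main__":
    # Hypothetical toy data: y depends linearly on x, three values missing.
    x = np.arange(10, dtype=float)
    y = 2.0 * x + 1.0
    y[[2, 5, 7]] = np.nan
    print(missing_percentage(y))  # 30.0
    # Multiple imputation: m completed series from different random seeds.
    completed = [pmm_impute(x, y, k=3, rng=seed) for seed in range(5)]
```

Because every imputed value is copied from an observed donor, predictive mean matching never produces values outside the observed range. This is one reason it is often used for bounded variables such as relative humidity. In this simplified version, the variation between the m completed series comes only from the random donor choice.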