Comparison of Missing Data Infilling Mechanisms for Recovering a Real-World Single Station Streamflow Observation
Published in: | International journal of environmental research and public health 2021-08, Vol.18 (16), p.8375 |
---|---|
Main Authors: | , , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | Reconstructing missing streamflow data can be challenging when additional data are not available, and studies that impute real-world datasets to investigate how to ascertain the accuracy of imputation algorithms on such datasets are lacking. This study investigated how complex a missing data reconstruction scheme must be to obtain relevant results for a real-world single-station streamflow observation and so facilitate its further use. The investigation applied different missing data mechanisms, ranging from univariate algorithms to multiple imputation methods designed for multivariate data, with time taken as an explicit variable. The accuracy of these schemes was assessed using the total error measurement (TEM) and a localized error measurement (LEM) recommended in this study. The results show that univariate missing value algorithms, which are specially developed to handle univariate time series, provide satisfactory results, but those that provide the best results are usually time-consuming and computationally intensive. Also, multiple imputation algorithms that consider the surrounding observed values and/or can capture the characteristics of the data provide results similar to those of the univariate missing data algorithms and, in some cases, perform better without the added time and computational costs when time is taken as an explicit variable. Furthermore, the LEM would be especially useful when the missing data are in specific portions of the dataset or where very large gaps of ‘missingness’ occur. Finally, proper handling of missing values of real-world hydroclimatic datasets depends on imputing and extensive study of the particular dataset to be imputed. |
---|---|
ISSN: | 1661-7827 1660-4601 |
DOI: | 10.3390/ijerph18168375 |