
Air quality data pre-processing: A novel algorithm to impute missing values in univariate time series

Bibliographic Details
Main Authors: Wijesekara, Lakmini; Liyanage, Liwan
Format: Conference Proceeding
Language: English
Description
Summary: Missing values are ubiquitous in air pollution datasets because the data are collected through sensors. Preprocessing these data is vital to obtaining accurate results in downstream analyses. The task is made more challenging because time is an implicit variable that cannot be ignored. Existing methods for handling missing data in time series perform reasonably well when the percentage of missing values is relatively low and the gap size is small. However, robust methods are still needed, particularly for large gaps. This paper proposes a novel algorithm (FBReg) to impute univariate air pollution variables by applying a bi-directional method based on regularized regression models. The performance of the method is evaluated against two baseline models, mean imputation and last observation carried forward (LOCF), as well as two well-established methods, Kalman smoothing on structural time series models and Kalman smoothing on ARIMA (Auto-Regressive Integrated Moving Average) models. The proposed algorithm outperforms the considered methods and exhibits consistent performance with exponentially distributed missing values under the MCAR (Missing Completely at Random) mechanism, as well as with large gaps.
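The two baseline models named in the summary, mean imputation and LOCF, can be illustrated with a minimal sketch. This is not the paper's code: the helper names (`mean_impute`, `locf_impute`, `rmse`), the toy series, and the choice of RMSE over the masked positions as the error measure are all assumptions made for illustration, and the sketch does not attempt to reproduce FBReg or the Kalman-based methods.

```python
import numpy as np
import pandas as pd


def mean_impute(series: pd.Series) -> pd.Series:
    """Replace every missing value with the mean of the observed values."""
    return series.fillna(series.mean())


def locf_impute(series: pd.Series) -> pd.Series:
    """Last observation carried forward.

    A trailing back-fill handles gaps at the very start of the series,
    where there is no earlier observation to carry forward (an assumption
    of this sketch, not something the summary specifies).
    """
    return series.ffill().bfill()


def rmse(truth: pd.Series, imputed: pd.Series, mask: pd.Series) -> float:
    """Root mean squared error, computed only over the masked positions."""
    diff = truth[mask] - imputed[mask]
    return float(np.sqrt(np.mean(diff ** 2)))


# Hypothetical toy series with one artificial gap of length two.
truth = pd.Series([10.0, 12.0, 14.0, 16.0, 18.0, 20.0])
mask = pd.Series([False, False, True, True, False, False])
observed = truth.mask(mask)  # masked positions become NaN

mean_filled = mean_impute(observed)
locf_filled = locf_impute(observed)

print("mean imputation RMSE:", rmse(truth, mean_filled, mask))
print("LOCF RMSE:", rmse(truth, locf_filled, mask))
```

On this toy trending series LOCF does worse than the mean because it holds the last value flat across the gap. That says nothing about real sensor data, where relative performance depends on gap length and the structure of the series.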
ISSN: 2375-0197
DOI: 10.1109/ICTAI52525.2021.00159