Air quality data pre-processing: A novel algorithm to impute missing values in univariate time series
Main Authors: | , |
---|---|
Format: | Conference Proceeding |
Language: | English |
Summary: | Missing values are ubiquitous in air pollution datasets as the data is being collected through sensors. Preprocessing these data plays a vital role in obtaining accurate results in the downstream analyses. This task becomes even more challenging as time is an implicit variable that cannot be ignored. Existing methods that deal with missing data in time series perform reasonably well in situations where the percentage of missing values is relatively low and the gap size is small. However, the need for the development of robust methods, particularly for large gaps, is still persistent. This paper proposes a novel algorithm (FBReg) to impute univariate air pollution variables by applying a bi-directional method based on regularized regression models. The performance of the method is evaluated against two baseline models, Mean imputation and Last observation carried forward (LOCF), as well as two well-established methods, Kalman smoothing on structural time series models and Kalman smoothing on ARIMA (Auto-Regressive Integrated Moving Average) models. The proposed algorithm outperforms the considered methods and exhibits consistent performance with exponentially distributed missing values under the MCAR (Missing Completely at Random) mechanism, as well as with large gaps. |
ISSN: | 2375-0197 |
DOI: | 10.1109/ICTAI52525.2021.00159 |
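
The summary names two baseline imputers, Mean imputation and LOCF. As an illustration only, the following is a minimal sketch of those two baselines on a univariate series, assuming missing readings are encoded as `None`. The function names and the sample `pm25` values are hypothetical and are not taken from the paper; the sketch does not implement FBReg or the Kalman-based methods.

```python
from statistics import fmean
from typing import List, Optional

# A univariate series in time order; None marks a missing reading (assumption).
Series = List[Optional[float]]


def mean_impute(series: Series) -> List[float]:
    """Replace every missing value with the mean of the observed values."""
    observed = [x for x in series if x is not None]
    if not observed:
        raise ValueError("series has no observed values")
    mu = fmean(observed)
    return [mu if x is None else x for x in series]


def locf_impute(series: Series) -> List[Optional[float]]:
    """Last observation carried forward.

    Each missing value takes the most recent observed value. Missing values
    before the first observation have nothing to carry and stay None.
    """
    filled: List[Optional[float]] = []
    last: Optional[float] = None
    for x in series:
        if x is not None:
            last = x
        filled.append(last)
    return filled


# Hypothetical hourly readings with one gap of size 2 and one of size 1.
pm25 = [12.0, None, None, 18.0, 21.0, None, 15.0]
print(mean_impute(pm25))
print(locf_impute(pm25))
```

Both baselines ignore the temporal structure around a gap: mean imputation fills every gap with the same constant, and LOCF holds the last reading flat however long the gap is. That limitation is consistent with the summary's remark that simple methods work best when gaps are small.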