A modified machine learning algorithm for multi-collinearity environmental data
Published in: | Environmental and ecological statistics 2024-12, Vol.31 (4), p.1063-1083 |
---|---|
Format: | Article |
Language: | English |
Summary: | Air pollution is defined as an adverse event that negatively affects ecosystems and the standard conditions necessary for human survival and progress, manifested by certain substances in the atmosphere exceeding specific concentration levels. The control of air pollution is a significant strategic task related to national economies and the well-being of the people. In the face of increasingly severe environmental pollution problems, accurately predicting air pollution indicators becomes crucial. Among the popular air pollution prediction methods, the K-nearest neighbors (KNN) method appears to be one of the most promising approaches. In this paper, we develop a novel KNN rule that incorporates ridge estimators, called KNN-ridge regression (KNN-RR). KNN-RR is motivated by the sensitivity of current KNN regression to multi-collinearity, and it aims to improve prediction performance. Our theoretical result shows that, under some mild assumptions, there exists a penalty parameter such that the mean squared prediction error of ridge regression is smaller than that of ordinary least squares regression. We examine the empirical performance of KNN-RR and other methods on real-world datasets, including AQI and PM2.5 prediction, and the results indicate that our method has some advantages in prediction accuracy. To a certain extent, this paper paves a new way to improve some supervised machine learning methods. |
ISSN: | 1352-8505 1573-3009 |
DOI: | 10.1007/s10651-024-00634-6 |
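
The summary describes KNN-RR only at a high level, so the following is a minimal sketch of one plausible reading, not the authors' algorithm: for each query point, find the k nearest training points and fit a ridge regression on just those neighbors. The function name `knn_ridge_predict`, the Euclidean distance, the local centering step, and the defaults `k=10` and `lam=1.0` are all assumptions made for illustration.

```python
import numpy as np


def knn_ridge_predict(X_train, y_train, x_query, k=10, lam=1.0):
    """Predict y at x_query with a ridge fit on the k nearest neighbors.

    Hypothetical illustration of a KNN + ridge combination; the paper's
    actual KNN-RR rule may differ in distance, weighting, and penalty choice.
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_query = np.asarray(x_query, dtype=float)

    # Select the k training points closest to the query (Euclidean distance).
    dists = np.linalg.norm(X_train - x_query, axis=1)
    idx = np.argsort(dists)[:k]
    X_k, y_k = X_train[idx], y_train[idx]

    # Center locally so the intercept is not penalized.
    x_mean = X_k.mean(axis=0)
    y_mean = y_k.mean()
    Xc = X_k - x_mean
    yc = y_k - y_mean

    # Ridge solution: (Xc^T Xc + lam * I)^{-1} Xc^T yc.
    # With lam > 0 the system stays well conditioned even when the
    # neighbors' features are nearly collinear.
    p = Xc.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)

    return y_mean + (x_query - x_mean) @ beta


if __name__ == "__main__":
    # Synthetic data with two nearly collinear features (assumed setup).
    rng = np.random.default_rng(0)
    x1 = rng.normal(size=200)
    x2 = x1 + 1e-3 * rng.normal(size=200)
    X = np.column_stack([x1, x2])
    y = 2.0 * x1 + 0.1 * rng.normal(size=200)
    print(knn_ridge_predict(X, y, np.array([0.5, 0.5]), k=20, lam=1.0))
```

In this sketch, a very large `lam` shrinks the local coefficients toward zero, so the prediction approaches the plain KNN average of the neighbors' targets, while `lam=0` reduces to a local ordinary least squares fit, which can fail or become unstable when the neighbors' features are collinear.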