A modified machine learning algorithm for multi-collinearity environmental data

Bibliographic Details
Published in: Environmental and ecological statistics 2024-12, Vol.31 (4), p.1063-1083
Main Authors: Tian, Haitao; Huang, Lei; Hu, Shouri; Wu, Wangqi
Format: Article
Language: English
Description
Summary: Air pollution is defined as an adverse event that negatively affects ecosystems and the standard conditions necessary for human survival and progress, manifested by certain substances in the atmosphere exceeding specific concentration levels. The control of air pollution is a significant strategic task related to the national economy and the well-being of the people. In the face of increasingly severe environmental pollution problems, accurately predicting air pollution indicators becomes crucial. Among the popular air pollution prediction methods, K-nearest neighbors (KNN) appears to be one of the most promising approaches. In this paper, we develop a novel KNN rule that incorporates ridge estimators, called KNN-ridge regression (KNN-RR). The proposed KNN-RR is motivated by the sensitivity of current KNN regression to multi-collinearity, and it aims to enhance prediction performance. Our theoretical result shows that, under some mild assumptions, there exists a penalty parameter such that the mean squared prediction error of ridge regression is smaller than that of ordinary least squares regression. We examine the empirical performance of KNN-RR and other methods on real-world datasets, such as AQI and PM2.5 prediction, and the results indicate that our method offers some advantage in prediction accuracy. To a certain extent, this paper opens a new way to improve some supervised machine learning methods.
ISSN: 1352-8505, 1573-3009
DOI: 10.1007/s10651-024-00634-6
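
The summary does not spell out how the KNN rule and the ridge estimator are combined. One plausible reading, sketched here in Python with NumPy, is to fit a ridge regression on the k nearest neighbors of each query point and predict from that local fit. The function name `knn_ridge_predict`, the Euclidean distance, the unpenalized intercept, and the default values of `k` and `lam` are all assumptions made for illustration; this is not the authors' implementation of KNN-RR.

```python
import numpy as np


def knn_ridge_predict(X_train, y_train, x_query, k=10, lam=1.0):
    """Predict y at x_query by ridge regression fitted on its k nearest neighbors.

    Illustrative only: the name, defaults, and local-fit formulation are
    assumptions, not the paper's definition of KNN-RR.
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_query = np.asarray(x_query, dtype=float)

    # Indices of the k training points closest to the query (Euclidean distance).
    dists = np.linalg.norm(X_train - x_query, axis=1)
    idx = np.argsort(dists)[:k]
    X_k, y_k = X_train[idx], y_train[idx]

    # Center the neighborhood so the intercept is not penalized.
    x_mean, y_mean = X_k.mean(axis=0), y_k.mean()
    Xc, yc = X_k - x_mean, y_k - y_mean

    # Ridge solution: (Xc'Xc + lam * I)^{-1} Xc'yc. For lam > 0 the added
    # lam * I term keeps the system solvable even when features are collinear.
    p = Xc.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)

    return y_mean + (x_query - x_mean) @ beta
```

With `lam=0` the local fit reduces to ordinary least squares on the neighborhood, which fails or becomes unstable when neighboring features are exactly or nearly collinear; a very large `lam` shrinks the prediction toward the plain KNN average of the neighbors' responses. In practice `k` and `lam` would be chosen by cross-validation.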