Predicting sustainable arsenic mitigation using machine learning techniques

Bibliographic Details
Published in: Ecotoxicology and environmental safety, 2022-03, Vol. 232, p. 113271, Article 113271
Main Authors: Singh, Sushant K., Taylor, Robert W., Pradhan, Biswajeet, Shirzadi, Ataollah, Pham, Binh Thai
Format: Article
Language: English
Description
Summary: This study evaluates state-of-the-art machine learning models in predicting the most sustainable arsenic mitigation preference. A Gaussian distribution-based Naïve Bayes (NB) classifier scored the highest Area Under the Curve (AUC) of the Receiver Operating Characteristic curve (0.82), followed by Nu Support Vector Classification (0.80) and K-Neighbors (0.79). Ensemble classifiers scored an AUC above 0.70, with Random Forest the top performer among them (0.77), and the Decision Tree model ranked fourth with an AUC of 0.77. The multilayer perceptron model also achieved high performance (AUC = 0.75). Most linear classifiers underperformed, with the Ridge classifier at the top (AUC = 0.73) and the perceptron at the bottom (AUC = 0.57). A Bernoulli distribution-based Naïve Bayes classifier was the poorest model (AUC = 0.50). The Gaussian NB was also the most robust ML model, with the smallest variation in Kappa score between training (0.58) and test data (0.64). The results suggest that nonlinear or ensemble classifiers could more accurately capture the complex relationships in socio-environmental data and help develop accurate and robust prediction models of sustainable arsenic mitigation. Furthermore, Gaussian NB is the best option when data is scarce.
• GaussianNB produces robust sustainable arsenic mitigation prediction models.
• Nonlinear models are the best fit for modeling socio-economic and environmental data.
• GaussianNB achieved an AUC score of 0.825.
• SGD, perceptron, and BernoulliNB could not produce robust sustainable arsenic mitigation prediction models.
ISSN: 0147-6513, 1090-2414
DOI: 10.1016/j.ecoenv.2022.113271
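
The summary ranks models by AUC and judges robustness by the gap between training and test Kappa scores. The sketch below shows how these two metrics can be computed for a binary classifier using only the Python standard library. It is an illustration, not the authors' pipeline: the function names `roc_auc` and `cohen_kappa`, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for this example.

```python
from collections import Counter


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive example gets the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for the agreement
    expected by chance from the two label distributions."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy data (not from the study).
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.7, 0.6]
preds = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

auc = roc_auc(labels, scores)        # 0.875
kappa = cohen_kappa(labels, preds)   # 0.25

# A scorer that cannot separate the classes sits at chance level (AUC 0.5),
# which is the value the summary reports for the Bernoulli NB classifier.
chance_auc = roc_auc(labels, [0.5] * len(labels))  # 0.5

# Train/test Kappa gap for Gaussian NB, using the figures quoted in the summary.
kappa_gap = round(abs(0.58 - 0.64), 2)  # 0.06
```

Computing AUC by pairwise comparison is quadratic in the number of examples, which is fine for illustration; a library routine would be the practical choice on real survey data.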