Developing a framework for classifying water lead levels at private drinking water systems: A Bayesian Belief Network approach
Published in: | Water research (Oxford) 2021-02, Vol.189, p.116641, Article 116641 |
---|---|
Main Authors: | , , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | • Framework fuses discretization, feature selection, and Bayes classifiers.
• Bayesian Belief Network classifies lead levels in drinking water above 15 ppb.
• Applied for dataset collected at private drinking water systems in Virginia.
• Features include plumbing, treatment, water quality, and perceptions.
• Naïve Bayes model classifies lead with 81.8% accuracy and 47.6% recall.
The presence of lead in drinking water creates a public health crisis, as lead causes neurological damage at low levels of exposure. The objective of this research is to explore modeling approaches to predict the risk of lead at private drinking water systems. This research uses Bayesian Network approaches to explore interactions among household characteristics, geological parameters, observations of tap water, and laboratory tests of water quality parameters. A knowledge discovery framework is developed by integrating methods for data discretization, feature selection, and Bayes classifiers. Forward selection and backward selection are explored for feature selection. Discretization approaches, including domain-knowledge, statistical, and information-based approaches, are tested to discretize continuous features. Bayes classifiers that are tested include General Bayesian Network, Naive Bayes, and Tree-Augmented Naive Bayes, which are applied to identify Directed Acyclic Graphs (DAGs). Bayesian inference is used to fit conditional probability tables for each DAG. The Bayesian framework is applied to fit models for a dataset collected by the Virginia Household Water Quality Program (VAHWQP), which collected water samples and conducted household surveys at 2,146 households that use private water systems, including wells and springs, in Virginia during 2012 and 2013. Relationships among laboratory-tested water quality parameters, observations of tap water, and household characteristics, including plumbing type, source water, household location, and on-site water treatment are explored to develop features for predicting water lead levels. Results demonstrate that Naive Bayes classifiers perform best based on recall and precision, when compared with other classifiers. Copper is the most significant predictor of lead, and other important predictors include county, pH, and on-site water treatment. 
Feature selection methods have a marginal effect on performance, and discretization methods can greatly affect model performance when paired with classifiers. Owners of private wells remain disadvantaged |
---|---|
ISSN: | 0043-1354 1879-2448 |
DOI: | 10.1016/j.watres.2020.116641 |
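
The summary describes a pipeline that discretizes continuous features, fits a Naive Bayes classifier, and reports accuracy and recall. The sketch below illustrates that general idea only; it is not the authors' implementation. Everything in it is an assumption made for illustration: the data are synthetic (not VAHWQP records), the feature names `copper`, `ph`, and `treated` are hypothetical stand-ins, equal-width binning stands in for the discretization methods the paper compares, and the helper names (`equal_width_bins`, `CategoricalNaiveBayes`, `accuracy_and_recall`) are invented here.

```python
import math
import random
from collections import Counter, defaultdict


def equal_width_bins(values, n_bins=3):
    """Map each continuous value to a bin index using equal-width cut points."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if all values are equal
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


class CategoricalNaiveBayes:
    """Naive Bayes over discrete features with Laplace (add-alpha) smoothing."""

    def fit(self, rows, labels, alpha=1.0):
        self.alpha = alpha
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.n = len(labels)
        n_features = len(rows[0])
        # counts[j][c][v]: training rows of class c with value v in feature j
        self.counts = [defaultdict(Counter) for _ in range(n_features)]
        self.levels = [set() for _ in range(n_features)]
        for row, c in zip(rows, labels):
            for j, v in enumerate(row):
                self.counts[j][c][v] += 1
                self.levels[j].add(v)
        return self

    def predict(self, row):
        best, best_score = None, -math.inf
        for c in self.classes:
            # log prior plus the sum of smoothed log conditional probabilities
            score = math.log(self.class_counts[c] / self.n)
            for j, v in enumerate(row):
                num = self.counts[j][c][v] + self.alpha
                den = self.class_counts[c] + self.alpha * len(self.levels[j])
                score += math.log(num / den)
            if score > best_score:
                best, best_score = c, score
        return best


def accuracy_and_recall(y_true, y_pred, positive=1):
    """Return (accuracy, recall for the positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return correct / len(pairs), recall


# Synthetic data (assumption): label 1 stands for "lead above 15 ppb".
rng = random.Random(0)
copper, ph, treated, labels = [], [], [], []
for _ in range(600):
    cu = rng.uniform(0.0, 2.0)
    p = rng.uniform(5.5, 8.5)
    t = rng.choice(["none", "softener", "filter"])
    # Made-up risk rule: higher copper and lower pH raise the chance of label 1.
    risk = 0.05 + 0.35 * (cu > 1.3) + 0.25 * (p < 6.5)
    copper.append(cu)
    ph.append(p)
    treated.append(t)
    labels.append(1 if rng.random() < risk else 0)

# For brevity the cut points are computed on all rows before the split.
rows = list(zip(equal_width_bins(copper), equal_width_bins(ph), treated))
train_rows, test_rows = rows[:450], rows[450:]
train_y, test_y = labels[:450], labels[450:]

model = CategoricalNaiveBayes().fit(train_rows, train_y)
pred = [model.predict(r) for r in test_rows]
acc, rec = accuracy_and_recall(test_y, pred)
print(f"accuracy={acc:.3f} recall={rec:.3f}")
```

Reporting recall alongside accuracy matters when the positive class is rare: a model can score well on accuracy while missing many of the positive cases. In a more careful version, the discretization cut points would be learned from the training split only.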