Identification of organic chemical indicators for tracking pollution sources in groundwater by machine learning from GC-HRMS-based suspect and non-target screening data

Bibliographic Details
Published in: Water research (Oxford) 2024-03, Vol.252, p.121130-121130, Article 121130
Main Authors: Ekpe, Okon Dominic, Choo, Gyojin, Kang, Jin-Kyu, Yun, Seong-Taek, Oh, Jeong-Eun
Format: Article
Language: English
Description
Summary:
• 252 chemicals were identified by SNTS in groundwater from four regions with diverse contamination histories.
• A novel and robust systematic machine learning-based workflow for predicting chemical indicators was proposed.
• The proposed workflow showed good predictive ability (Q²) of 0.897.
• 51 chemical indicators for tracking groundwater contamination sources were suggested.

In this study, the strong analytical power of gas chromatography coupled to high-resolution mass spectrometry (GC-HRMS) for suspect and non-target screening (SNTS) of organic micropollutants was combined with machine learning tools to propose a novel and robust systematic environmental forensics workflow focused on groundwater contamination. Groundwater samples were collected from four regions with diverse contamination histories (oil [OC], agricultural [AGR], industrial [IND], and landfill [LF]), and a total of 252 organic micropollutants were identified, including pharmaceuticals, personal care products, pesticides, polycyclic aromatic hydrocarbons, plasticizers, phenols, organophosphate flame retardants, transformation products, and others, with detection frequencies ranging from 3% to 100%. Among the compounds identified by SNTS, a total of 51 chemical indicators (OC: 13, LF: 12, AGR: 19, IND: 7), which included chemicals identified at SNTS levels 1 and 2, were pinpointed across all sampling regions by integrating a bootstrapped feature selection method involving the bootfs algorithm with a partial least squares discriminant analysis (PLS-DA) model to determine potential prevalent contamination sources. The proposed workflow showed good predictive ability (Q²) of 0.897, and the suggested contamination sources were gasoline, diesel, and/or other light petroleum products for the OC region, anthropogenic activities for the LF region, agricultural and human activities for the AGR region, and industrial/human activities for the IND region. These results suggest that the proposed workflow can select a subset of the most diagnostic features in the chemical space that best distinguish a specific contamination source class.
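
The two ingredients named in the summary, bootstrapped feature selection and the Q² statistic, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the per-feature score is a simple stand-in rather than the bootfs algorithm, no PLS-DA model is fitted, the data are synthetic, and all function names are hypothetical. Q² is computed with the common definition 1 − PRESS/TSS from cross-validated predictions, which would come from whatever classifier is used (for example, a PLS-DA model on dummy-coded class membership).

```python
import random
from collections import Counter


def q2(y_true, y_pred_cv):
    """Q^2 = 1 - PRESS / TSS, using cross-validated predictions."""
    mean_y = sum(y_true) / len(y_true)
    press = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred_cv))
    tss = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1.0 - press / tss


def class_separation(values, labels, target):
    """Stand-in score (an assumption, not bootfs): absolute difference between
    the target-class mean and the mean of all other samples, scaled by range."""
    inside = [v for v, lab in zip(values, labels) if lab == target]
    outside = [v for v, lab in zip(values, labels) if lab != target]
    if not inside or not outside:
        return 0.0
    spread = (max(values) - min(values)) or 1.0
    return abs(sum(inside) / len(inside) - sum(outside) / len(outside)) / spread


def bootstrap_selection(X, labels, target, top_k=1, n_boot=200, seed=0):
    """Resample samples with replacement, keep the top_k features by score in
    each resample, and return how often each feature was selected."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = Counter()
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        Xb = [X[i] for i in idx]
        lb = [labels[i] for i in idx]
        scores = [
            class_separation([row[j] for row in Xb], lb, target) for j in range(p)
        ]
        top = sorted(range(p), key=lambda j: scores[j], reverse=True)[:top_k]
        counts.update(top)
    return {j: counts[j] / n_boot for j in range(p)}


# Synthetic demo data (hypothetical): feature 0 is elevated only in "OC"
# samples; features 1-3 are uniform noise in every region.
data_rng = random.Random(1)
labels = ["OC"] * 6 + ["AGR"] * 6 + ["IND"] * 6 + ["LF"] * 6
X = []
for lab in labels:
    marker = 10.0 + data_rng.random() if lab == "OC" else data_rng.random()
    X.append([marker, data_rng.random(), data_rng.random(), data_rng.random()])

freq = bootstrap_selection(X, labels, target="OC", top_k=1)
print("selection frequency per feature:", freq)
print("Q2 for toy predictions:", q2([0, 1, 0, 1], [0.1, 0.9, 0.2, 0.8]))
```

Features that are selected in most bootstrap resamples are the stable candidates for class-specific indicators; the frequency threshold used to accept a feature is a design choice and is not specified here.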
ISSN: 0043-1354, 1879-2448
DOI: 10.1016/j.watres.2024.121130