
Multiple Data Imputation Methods Advance Risk Analysis and Treatability of Co-occurring Inorganic Chemicals in Groundwater

Bibliographic Details
Published in: Environmental science & technology, 2024-11, Vol. 58 (46), pp. 20513-20524
Main Authors: Mahmood, Akhlak U., Islam, Minhazul, Gulyuk, Alexey V., Briese, Emily, Velasco, Carmen A., Malu, Mohit, Sharma, Naushita, Spanias, Andreas, Yingling, Yaroslava G., Westerhoff, Paul
Format: Article
Language: English
Subjects: Data Science
cited_by
cites cdi_FETCH-LOGICAL-a314t-8443393010103b3cc38a14e74c0453526d8f0e7f90e1f84e8c98d2ecd09f1dd83
container_end_page 20524
container_issue 46
container_start_page 20513
container_title Environmental science & technology
container_volume 58
creator Mahmood, Akhlak U.
Islam, Minhazul
Gulyuk, Alexey V.
Briese, Emily
Velasco, Carmen A.
Malu, Mohit
Sharma, Naushita
Spanias, Andreas
Yingling, Yaroslava G.
Westerhoff, Paul
description Accurately assessing and managing risks associated with inorganic pollutants in groundwater is imperative. Historic water quality databases are often sparse because sample collection and analysis are constrained by sampling rationale or financial budgets, which makes it difficult to evaluate exposure or water treatment effectiveness. We used and compared two advanced multiple data imputation techniques, the AMELIA and MICE algorithms, to fill gaps in sparse groundwater quality data sets. AMELIA outperformed MICE in handling missing values, as MICE tended to overestimate certain values, resulting in more outliers. Field data sets showed that 75% to 80% of samples had no co-occurring regulated pollutants exceeding maximum contaminant level (MCL) values, whereas imputed values showed that only 15% to 55% of the samples posed no health risk. Imputed data revealed a significant 2- to 5-fold increase in the number of sampling locations predicted to potentially exceed health-based limits and identified samples in which 2 to 6 co-occurring chemicals may exceed health-based levels. Linking imputed data to sampling locations can pinpoint potential hotspots of elevated chemical levels and guide optimal resource allocation for additional field sampling and chemical analysis. With this approach, further analysis of the complete data sets allows state agencies authorized to conduct groundwater monitoring, often with limited financial resources, to prioritize sampling locations and the chemicals to be tested. Given existing data and time constraints, it is crucial to identify the most strategic use of available resources to close data gaps effectively. This work establishes a framework for increasing the benefit of funding for groundwater data collection by reducing uncertainty in prioritizing future sampling locations and chemical analyses.
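The abstract contrasts field data with imputed data. In the field data, 75% to 80% of samples show no exceedance of a maximum contaminant level (MCL); in the imputed data, only 15% to 55% do. One mechanism that can contribute to this gap is how exceedances are counted: a chemical that was never analyzed in a sample cannot be counted as exceeding its limit.

The Python sketch below is a hypothetical illustration of that counting step, not the authors' code or pipeline. Everything in it is invented for illustration:

- the chemicals chosen,
- the sample values,
- the "imputed" fills.

The thresholds are set to US EPA MCLs for these four inorganics in mg/L. Treat them as an assumption to verify against current regulations before reuse.

```python
import math

# Illustrative limits in mg/L, set to US EPA MCLs for these inorganics
# (assumption for this sketch; verify current values before reuse).
MCL_MG_PER_L = {
    "arsenic": 0.010,
    "nitrate_as_n": 10.0,
    "fluoride": 4.0,
    "uranium": 0.030,
}


def is_missing(value):
    """Treat None or NaN as a measurement that was never made."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def count_exceedances(sample):
    """Number of chemicals in one sample whose value exceeds its limit.

    Missing measurements are skipped, so they can never count as exceedances.
    """
    return sum(
        1
        for chem, limit in MCL_MG_PER_L.items()
        if not is_missing(sample.get(chem)) and sample[chem] > limit
    )


def fraction_without_exceedance(samples):
    """Share of samples in which no chemical exceeds its limit."""
    counts = [count_exceedances(s) for s in samples]
    return sum(1 for c in counts if c == 0) / len(counts)


def fill_gaps(sample, imputed):
    """Copy of a sample with only its missing entries replaced by imputed values."""
    filled = dict(sample)
    for chem, value in imputed.items():
        if is_missing(filled.get(chem)):
            filled[chem] = value
    return filled


# Hypothetical field data: None marks a chemical that was not analyzed.
observed = [
    {"arsenic": 0.004, "nitrate_as_n": None, "fluoride": 1.2, "uranium": None},
    {"arsenic": 0.015, "nitrate_as_n": 3.0, "fluoride": None, "uranium": None},
    {"arsenic": None, "nitrate_as_n": 2.0, "fluoride": None, "uranium": 0.012},
    {"arsenic": 0.002, "nitrate_as_n": None, "fluoride": None, "uranium": None},
]

# Hypothetical values an imputation model might supply for the gaps.
imputed_fills = [
    {"nitrate_as_n": 12.0, "uranium": 0.010},
    {"fluoride": 0.8, "uranium": 0.045},
    {"arsenic": 0.003, "fluoride": 5.1},
    {"nitrate_as_n": 4.0, "fluoride": 0.5, "uranium": 0.005},
]

completed = [fill_gaps(s, f) for s, f in zip(observed, imputed_fills)]

print(fraction_without_exceedance(observed))      # 0.75
print(fraction_without_exceedance(completed))     # 0.25
print([count_exceedances(s) for s in completed])  # [1, 2, 1, 0]
```

With multiple imputation (whether via AMELIA or MICE), this counting step would be repeated on each of the m completed data sets and summarized across them. It would not be computed once on a single filled table as in this sketch.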
doi_str_mv 10.1021/acs.est.4c05203
format article
fullrecord <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_11580165</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>3128318968</sourcerecordid><originalsourceid>FETCH-LOGICAL-a314t-8443393010103b3cc38a14e74c0453526d8f0e7f90e1f84e8c98d2ecd09f1dd83</originalsourceid><addsrcrecordid>eNp1kUFrGzEQhUVoSNy059yCjoWwzmi162hPwThtakgIhBR6E7I0ayvZlVxJm-D--srYDe0h6DAgvfdGMx8hpwzGDEp2oXQcY0zjSkNdAj8gI5ZrUYuafSAjAMaLhk9-HpOPMT4BQMlBHJFj3tTQ8ApG5Pfd0CW77pBeq6TovF8PSSXrHb3DtPIm0ql5UU4jfbDxmU6d6jbRRqqcoY8Bs2dhO5s21Ld05guv9RCCdUs6dz4slbOazlbYW626SK2jN8EPzryqhOETOWzzLX7e1xPy49vXx9n34vb-Zj6b3haKsyoVoqo4bziwfPiCa82FYhVe5pGrmtflxIgW8LJtAFkrKhS6EaZEbaBpmTGCn5CrXe56WPRoNLoUVCfXwfYqbKRXVv7_4uxKLv2LZKwWwCZ1TviyTwj-15D3LXsbNXadcuiHKDkrBWeimWybXeykOvgYA7ZvfRjILTKZkcltxB5Zdpz9-703_V9GWXC-E2ydT34ImUF8N-4PIImkMA</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3128318968</pqid></control><display><type>article</type><title>Multiple Data Imputation Methods Advance Risk Analysis and Treatability of Co-occurring Inorganic Chemicals in Groundwater</title><source>American Chemical Society:Jisc Collections:American Chemical Society Read &amp; Publish Agreement 2022-2024 (Reading list)</source><creator>Mahmood, Akhlak U. ; Islam, Minhazul ; Gulyuk, Alexey V. ; Briese, Emily ; Velasco, Carmen A. ; Malu, Mohit ; Sharma, Naushita ; Spanias, Andreas ; Yingling, Yaroslava G. ; Westerhoff, Paul</creator><creatorcontrib>Mahmood, Akhlak U. ; Islam, Minhazul ; Gulyuk, Alexey V. ; Briese, Emily ; Velasco, Carmen A. ; Malu, Mohit ; Sharma, Naushita ; Spanias, Andreas ; Yingling, Yaroslava G. ; Westerhoff, Paul</creatorcontrib><description>Accurately assessing and managing risks associated with inorganic pollutants in groundwater is imperative. 
Historic water quality databases are often sparse due to rationale or financial budgets for sample collection and analysis, posing challenges in evaluating exposure or water treatment effectiveness. We utilized and compared two advanced multiple data imputation techniques, AMELIA and MICE algorithms, to fill gaps in sparse groundwater quality data sets. AMELIA outperformed MICE in handling missing values, as MICE tended to overestimate certain values, resulting in more outliers. Field data sets revealed that 75% to 80% of samples exhibited no co-occurring regulated pollutants surpassing MCL values, whereas imputed values showed only 15% to 55% of the samples posed no health risks. Imputed data unveiled a significant increase, ranging from 2 to 5 times, in the number of sampling locations predicted to potentially exceed health-based limits and identified samples where 2 to 6 co-occurring chemicals may occur and surpass health-based levels. Linking imputed data to sampling locations can pinpoint potential hotspots of elevated chemical levels and guide optimal resource allocation for additional field sampling and chemical analysis. With this approach, further analysis of complete data sets allows state agencies authorized to conduct groundwater monitoring, often with limited financial resources, to prioritize sampling locations and chemicals to be tested. Given existing data and time constraints, it is crucial to identify the most strategic use of the available resources to address data gaps effectively. 
This work establishes a framework to enhance the beneficial impact of funding groundwater data collection by reducing uncertainty in prioritizing future sampling locations and chemical analyses.</description><identifier>ISSN: 0013-936X</identifier><identifier>ISSN: 1520-5851</identifier><identifier>EISSN: 1520-5851</identifier><identifier>DOI: 10.1021/acs.est.4c05203</identifier><identifier>PMID: 39509340</identifier><language>eng</language><publisher>United States: American Chemical Society</publisher><subject>Data Science</subject><ispartof>Environmental science &amp; technology, 2024-11, Vol.58 (46), p.20513-20524</ispartof><rights>2024 The Authors. Published by American Chemical Society</rights><rights>2024 The Authors. Published by American Chemical Society 2024 The Authors</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-a314t-8443393010103b3cc38a14e74c0453526d8f0e7f90e1f84e8c98d2ecd09f1dd83</cites><orcidid>0000-0002-5607-2885 ; 0000-0002-9241-8759 ; 0000-0002-9924-8713 ; 0000-0002-8557-9992</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>230,314,780,784,885,27924,27925</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/39509340$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Mahmood, Akhlak U.</creatorcontrib><creatorcontrib>Islam, Minhazul</creatorcontrib><creatorcontrib>Gulyuk, Alexey V.</creatorcontrib><creatorcontrib>Briese, Emily</creatorcontrib><creatorcontrib>Velasco, Carmen A.</creatorcontrib><creatorcontrib>Malu, Mohit</creatorcontrib><creatorcontrib>Sharma, Naushita</creatorcontrib><creatorcontrib>Spanias, Andreas</creatorcontrib><creatorcontrib>Yingling, Yaroslava G.</creatorcontrib><creatorcontrib>Westerhoff, Paul</creatorcontrib><title>Multiple Data Imputation 
Methods Advance Risk Analysis and Treatability of Co-occurring Inorganic Chemicals in Groundwater</title><title>Environmental science &amp; technology</title><addtitle>Environ. Sci. Technol</addtitle><description>Accurately assessing and managing risks associated with inorganic pollutants in groundwater is imperative. Historic water quality databases are often sparse due to rationale or financial budgets for sample collection and analysis, posing challenges in evaluating exposure or water treatment effectiveness. We utilized and compared two advanced multiple data imputation techniques, AMELIA and MICE algorithms, to fill gaps in sparse groundwater quality data sets. AMELIA outperformed MICE in handling missing values, as MICE tended to overestimate certain values, resulting in more outliers. Field data sets revealed that 75% to 80% of samples exhibited no co-occurring regulated pollutants surpassing MCL values, whereas imputed values showed only 15% to 55% of the samples posed no health risks. Imputed data unveiled a significant increase, ranging from 2 to 5 times, in the number of sampling locations predicted to potentially exceed health-based limits and identified samples where 2 to 6 co-occurring chemicals may occur and surpass health-based levels. Linking imputed data to sampling locations can pinpoint potential hotspots of elevated chemical levels and guide optimal resource allocation for additional field sampling and chemical analysis. With this approach, further analysis of complete data sets allows state agencies authorized to conduct groundwater monitoring, often with limited financial resources, to prioritize sampling locations and chemicals to be tested. Given existing data and time constraints, it is crucial to identify the most strategic use of the available resources to address data gaps effectively. 
This work establishes a framework to enhance the beneficial impact of funding groundwater data collection by reducing uncertainty in prioritizing future sampling locations and chemical analyses.</description><subject>Data Science</subject><issn>0013-936X</issn><issn>1520-5851</issn><issn>1520-5851</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><recordid>eNp1kUFrGzEQhUVoSNy059yCjoWwzmi162hPwThtakgIhBR6E7I0ayvZlVxJm-D--srYDe0h6DAgvfdGMx8hpwzGDEp2oXQcY0zjSkNdAj8gI5ZrUYuafSAjAMaLhk9-HpOPMT4BQMlBHJFj3tTQ8ApG5Pfd0CW77pBeq6TovF8PSSXrHb3DtPIm0ql5UU4jfbDxmU6d6jbRRqqcoY8Bs2dhO5s21Ld05guv9RCCdUs6dz4slbOazlbYW626SK2jN8EPzryqhOETOWzzLX7e1xPy49vXx9n34vb-Zj6b3haKsyoVoqo4bziwfPiCa82FYhVe5pGrmtflxIgW8LJtAFkrKhS6EaZEbaBpmTGCn5CrXe56WPRoNLoUVCfXwfYqbKRXVv7_4uxKLv2LZKwWwCZ1TviyTwj-15D3LXsbNXadcuiHKDkrBWeimWybXeykOvgYA7ZvfRjILTKZkcltxB5Zdpz9-703_V9GWXC-E2ydT34ImUF8N-4PIImkMA</recordid><startdate>20241119</startdate><enddate>20241119</enddate><creator>Mahmood, Akhlak U.</creator><creator>Islam, Minhazul</creator><creator>Gulyuk, Alexey V.</creator><creator>Briese, Emily</creator><creator>Velasco, Carmen A.</creator><creator>Malu, Mohit</creator><creator>Sharma, Naushita</creator><creator>Spanias, Andreas</creator><creator>Yingling, Yaroslava G.</creator><creator>Westerhoff, Paul</creator><general>American Chemical Society</general><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><scope>5PM</scope><orcidid>https://orcid.org/0000-0002-5607-2885</orcidid><orcidid>https://orcid.org/0000-0002-9241-8759</orcidid><orcidid>https://orcid.org/0000-0002-9924-8713</orcidid><orcidid>https://orcid.org/0000-0002-8557-9992</orcidid></search><sort><creationdate>20241119</creationdate><title>Multiple Data Imputation Methods Advance Risk Analysis and Treatability of Co-occurring Inorganic Chemicals in Groundwater</title><author>Mahmood, Akhlak U. ; Islam, Minhazul ; Gulyuk, Alexey V. 
; Briese, Emily ; Velasco, Carmen A. ; Malu, Mohit ; Sharma, Naushita ; Spanias, Andreas ; Yingling, Yaroslava G. ; Westerhoff, Paul</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a314t-8443393010103b3cc38a14e74c0453526d8f0e7f90e1f84e8c98d2ecd09f1dd83</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Data Science</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Mahmood, Akhlak U.</creatorcontrib><creatorcontrib>Islam, Minhazul</creatorcontrib><creatorcontrib>Gulyuk, Alexey V.</creatorcontrib><creatorcontrib>Briese, Emily</creatorcontrib><creatorcontrib>Velasco, Carmen A.</creatorcontrib><creatorcontrib>Malu, Mohit</creatorcontrib><creatorcontrib>Sharma, Naushita</creatorcontrib><creatorcontrib>Spanias, Andreas</creatorcontrib><creatorcontrib>Yingling, Yaroslava G.</creatorcontrib><creatorcontrib>Westerhoff, Paul</creatorcontrib><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Environmental science &amp; technology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Mahmood, Akhlak U.</au><au>Islam, Minhazul</au><au>Gulyuk, Alexey V.</au><au>Briese, Emily</au><au>Velasco, Carmen A.</au><au>Malu, Mohit</au><au>Sharma, Naushita</au><au>Spanias, Andreas</au><au>Yingling, Yaroslava G.</au><au>Westerhoff, Paul</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Multiple Data Imputation Methods Advance Risk Analysis and Treatability of Co-occurring Inorganic Chemicals in Groundwater</atitle><jtitle>Environmental science &amp; technology</jtitle><addtitle>Environ. Sci. 
Technol</addtitle><date>2024-11-19</date><risdate>2024</risdate><volume>58</volume><issue>46</issue><spage>20513</spage><epage>20524</epage><pages>20513-20524</pages><issn>0013-936X</issn><issn>1520-5851</issn><eissn>1520-5851</eissn><abstract>Accurately assessing and managing risks associated with inorganic pollutants in groundwater is imperative. Historic water quality databases are often sparse due to rationale or financial budgets for sample collection and analysis, posing challenges in evaluating exposure or water treatment effectiveness. We utilized and compared two advanced multiple data imputation techniques, AMELIA and MICE algorithms, to fill gaps in sparse groundwater quality data sets. AMELIA outperformed MICE in handling missing values, as MICE tended to overestimate certain values, resulting in more outliers. Field data sets revealed that 75% to 80% of samples exhibited no co-occurring regulated pollutants surpassing MCL values, whereas imputed values showed only 15% to 55% of the samples posed no health risks. Imputed data unveiled a significant increase, ranging from 2 to 5 times, in the number of sampling locations predicted to potentially exceed health-based limits and identified samples where 2 to 6 co-occurring chemicals may occur and surpass health-based levels. Linking imputed data to sampling locations can pinpoint potential hotspots of elevated chemical levels and guide optimal resource allocation for additional field sampling and chemical analysis. With this approach, further analysis of complete data sets allows state agencies authorized to conduct groundwater monitoring, often with limited financial resources, to prioritize sampling locations and chemicals to be tested. Given existing data and time constraints, it is crucial to identify the most strategic use of the available resources to address data gaps effectively. 
This work establishes a framework to enhance the beneficial impact of funding groundwater data collection by reducing uncertainty in prioritizing future sampling locations and chemical analyses.</abstract><cop>United States</cop><pub>American Chemical Society</pub><pmid>39509340</pmid><doi>10.1021/acs.est.4c05203</doi><tpages>12</tpages><orcidid>https://orcid.org/0000-0002-5607-2885</orcidid><orcidid>https://orcid.org/0000-0002-9241-8759</orcidid><orcidid>https://orcid.org/0000-0002-9924-8713</orcidid><orcidid>https://orcid.org/0000-0002-8557-9992</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 0013-936X
ispartof Environmental science & technology, 2024-11, Vol.58 (46), p.20513-20524
issn 0013-936X
1520-5851
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_11580165
source American Chemical Society:Jisc Collections:American Chemical Society Read & Publish Agreement 2022-2024 (Reading list)
subjects Data Science
title Multiple Data Imputation Methods Advance Risk Analysis and Treatability of Co-occurring Inorganic Chemicals in Groundwater
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-27T14%3A36%3A03IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Multiple%20Data%20Imputation%20Methods%20Advance%20Risk%20Analysis%20and%20Treatability%20of%20Co-occurring%20Inorganic%20Chemicals%20in%20Groundwater&rft.jtitle=Environmental%20science%20&%20technology&rft.au=Mahmood,%20Akhlak%20U.&rft.date=2024-11-19&rft.volume=58&rft.issue=46&rft.spage=20513&rft.epage=20524&rft.pages=20513-20524&rft.issn=0013-936X&rft.eissn=1520-5851&rft_id=info:doi/10.1021/acs.est.4c05203&rft_dat=%3Cproquest_pubme%3E3128318968%3C/proquest_pubme%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-a314t-8443393010103b3cc38a14e74c0453526d8f0e7f90e1f84e8c98d2ecd09f1dd83%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=3128318968&rft_id=info:pmid/39509340&rfr_iscdi=true