Considerations in the application of machine learning to aqueous geochemistry: Origin of produced waters in the northern U.S. Gulf Coast Basin
Published in: | Applied computing and geosciences 2019-12, Vol.3-4, p.100012, Article 100012 |
---|---|
Format: | Article |
Language: | English |
Summary: | Since the advent of modern computing, geochemists have increasingly relied on computers to garner efficiencies in calculations, data analysis, and data presentation. Entirely new fields, such as Monte Carlo-based simulation and geochemical modeling, have developed under this paradigm. With continued growth in computing power, machine learning has become an increasingly popular tool in aqueous geochemistry. However, continued reliance on algorithms to perform mathematical calculations can leave practitioners without an understanding of how to properly prepare information for models or of the reasons behind apparent patterns in the output. Machine learning algorithms can be heavily affected by which variables are chosen for the model and by how data are pre-processed, including the handling of missing and censored values (e.g., above or below a detection limit). We propose an approach of parsimonious variable selection, based partially on the signal-to-noise ratio, and suggest and discuss strategies for handling missing and censored data. An example of unsupervised machine learning, using emergent self-organizing map analysis, is applied to water from oil and gas wells in the northern U.S. Gulf Coast Basin, whose composition is controlled by different processes and is derived from various origins. Findings from this investigation suggest that five groups of water samples are present, two of which were not identified using conventional data analysis methods. One notable result is that brines derived from seawater evaporation, presumably waters from which the Jurassic Louann salt precipitated, have migrated upward into shallower reservoirs across the study area. This work demonstrates that a focus on understanding data quality, and practice in interpreting the output of numerical models, remain critical skills for taking full advantage of machine learning in geochemistry.
• Machine learning algorithms impacted by variable selection and preprocessing.
• Unsupervised machine learning useful for finding structure in data.
• Emergent self-organizing map (ESOM) used to cluster produced water data.
• ESOM analysis identifies two previously unidentified groups.
• Findings suggest evidence of regional upward fluid flow of Jurassic seawater. |
---|---|
ISSN: | 2590-1974 |
DOI: | 10.1016/j.acags.2019.100012 |
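
The summary mentions screening variables by signal-to-noise ratio and handling missing and censored values. The following Python sketch illustrates what such a screening step might look like. It is not the authors' procedure: the signal-to-noise definition used here (spread of the quantified values divided by an assumed analytical uncertainty), the two thresholds, all function names, and the sample values are hypothetical and chosen only for illustration.

```python
from statistics import pstdev


def parse_value(raw):
    """Parse a reported value such as '12.5', '<0.1', '>500' or ''.

    Returns (number, flag), where flag is 'ok', 'below' (left-censored),
    'above' (right-censored) or 'missing'.
    """
    text = raw.strip()
    if not text:
        return None, "missing"
    if text[0] == "<":
        return float(text[1:]), "below"
    if text[0] == ">":
        return float(text[1:]), "above"
    return float(text), "ok"


def usable_fraction(column):
    """Fraction of entries that are quantified (neither missing nor censored)."""
    flags = [parse_value(v)[1] for v in column]
    return flags.count("ok") / len(flags)


def signal_to_noise(column, analytical_error):
    """Assumed SNR: spread of quantified values over analytical uncertainty."""
    values = [num for num, flag in map(parse_value, column) if flag == "ok"]
    if len(values) < 2:
        return 0.0  # too few quantified values to estimate any spread
    return pstdev(values) / analytical_error


def select_variables(table, errors, min_usable=0.8, min_snr=2.0):
    """Keep variables that are mostly quantified and vary well above noise."""
    return [
        name
        for name, column in table.items()
        if usable_fraction(column) >= min_usable
        and signal_to_noise(column, errors[name]) >= min_snr
    ]


# Hypothetical produced-water measurements (illustrative values only).
table = {
    "Cl": ["52000", "110000", "87000", "150000", "30000"],
    "Br": ["<1", "<1", "<1", "40", ""],
    "pH": ["6.1", "6.2", "6.0", "6.1", "6.2"],
}
# Hypothetical analytical uncertainties, in the same units as each variable.
errors = {"Cl": 2000.0, "Br": 1.0, "pH": 0.2}

selected = select_variables(table, errors)
print(selected)
```

In this toy data set, `Br` is dropped because most of its entries are censored or missing, and `pH` is dropped because its variation is smaller than the assumed analytical uncertainty. A real workflow would also need a strategy for the censored and missing entries in the variables that are kept, which this sketch leaves out.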