Representation of molecular structures with persistent homology for machine learning applications in chemistry
Published in: | Nature Communications 2020-06, Vol.11 (1), p.3230-9, Article 3230 |
---|---|
Main Authors: | , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | Machine learning and high-throughput computational screening have been valuable tools in accelerated first-principles screening for the discovery of the next generation of functionalized molecules and materials. The application of machine learning for chemical applications requires the conversion of molecular structures to a machine-readable format known as a molecular representation. The choice of such representations impacts the performance and outcomes of chemical machine learning methods. Herein, we present a new concise molecular representation derived from persistent homology, an applied branch of mathematics. We have demonstrated its applicability in a high-throughput computational screening of a large molecular database (GDB-9) with more than 133,000 organic molecules. Our target is to identify novel molecules that selectively interact with CO₂. The methodology and performance of the novel molecular fingerprinting method are presented, and the new chemically-driven persistence image representation is used to screen the GDB-9 database to suggest molecules and/or functional groups with enhanced properties. The choice of molecular representations can severely impact the performance of machine-learning methods. Here the authors demonstrate a persistent-homology-based molecular representation through an active-learning approach for predicting CO₂/N₂ interaction energies at the density functional theory (DFT) level. |
---|---|
ISSN: | 2041-1723 |
DOI: | 10.1038/s41467-020-17035-5 |