Loading…

Representation of molecular structures with persistent homology for machine learning applications in chemistry

Machine learning and high-throughput computational screening have been valuable tools in accelerated first-principles screening for the discovery of the next generation of functionalized molecules and materials. The application of machine learning for chemical applications requires the conversion of...

Full description

Saved in:
Bibliographic Details
Published in:Nature communications 2020-06, Vol.11 (1), p.3230-9, Article 3230
Main Authors: Townsend, Jacob, Micucci, Cassie Putman, Hymel, John H., Maroulas, Vasileios, Vogiatzis, Konstantinos D.
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Machine learning and high-throughput computational screening have been valuable tools in accelerated first-principles screening for the discovery of the next generation of functionalized molecules and materials. The application of machine learning for chemical applications requires the conversion of molecular structures to a machine-readable format known as a molecular representation. The choice of such representations impacts the performance and outcomes of chemical machine learning methods. Herein, we present a new concise molecular representation derived from persistent homology, an applied branch of mathematics. We have demonstrated its applicability in a high-throughput computational screening of a large molecular database (GDB-9) with more than 133,000 organic molecules. Our target is to identify novel molecules that selectively interact with CO 2 . The methodology and performance of the novel molecular fingerprinting method is presented and the new chemically-driven persistence image representation is used to screen the GDB-9 database to suggest molecules and/or functional groups with enhanced properties. The choice of molecular representations can severely impact the performances of machine-learning methods. Here the authors demonstrate a persistence homology based molecular representation through an active-learning approach for predicting CO 2 /N 2 interaction energies at the density functional theory (DFT) level.
ISSN:2041-1723
2041-1723
DOI:10.1038/s41467-020-17035-5