Explainable Molecular Sets: Using Information Theory to Generate Meaningful Descriptions of Groups of Molecules
Published in: | Journal of chemical information and modeling 2021-10, Vol.61 (10), p.4877-4889 |
---|---|
Main Authors: | , |
Format: | Article |
Language: | English |
Summary: | Algorithmically identifying the meaningful similarities between an assortment of molecules is a critical chemical problem, and one which is only gaining in relevance as data-driven chemistry continues to progress. Effectively addressing this challenge can be achieved through a reformulation of the problem into information theory, cluster-based supervised classification, and the implementation of key concepts, particularly information entropy and mutual information. These concepts are combined with unsupervised learning atop learned chemical spaces to generate meaningful labels for arbitrary collections of molecules. An open-source and highly extensible codebase is provided to undertake these experiments, demonstrate the viability of the approach on known clusters, and glean insights into the learned representations of chemical space within message-passing neural networks, an architecture not readily permitting interpretability. This approach facilitates the interoperability between human chemical knowledge and the algorithmically derived insights, which will continue to become more prevalent in the coming years. |
---|---|
ISSN: | 1549-9596 1549-960X |
DOI: | 10.1021/acs.jcim.1c00519 |