Loading…
AI for Chemical Space Gap Filling and Novel Compound Generation
When considering large sets of molecules, it is helpful to place them in the context of a "chemical space" - a multidimensional space defined by a set of descriptors that can be used to visualize and analyze compound grouping as well as identify regions that might be void of valid structur...
Saved in:
Published in: | arXiv.org 2022-01 |
---|---|
Main Authors: | , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | When considering large sets of molecules, it is helpful to place them in the context of a "chemical space" - a multidimensional space defined by a set of descriptors that can be used to visualize and analyze compound grouping as well as identify regions that might be void of valid structures. The chemical space of all possible molecules in a given biological or environmental sample can be vast and largely unexplored, mainly due to current limitations in processing of 'big data' by brute force methods (e.g., enumeration of all possible compounds in a space). Recent advances in artificial intelligence (AI) have led to multiple new cheminformatics tools that incorporate AI techniques to characterize and learn the structure and properties of molecules in order to generate plausible compounds, thereby contributing to more accessible and explorable regions of chemical space without the need for brute force methods. We have used one such tool, a deep-learning software called DarkChem, which learns a representation of the molecular structure of compounds by compressing them into a latent space. With DarkChem's design, distance in this latent space is often associated with compound similarity, making sparse regions interesting targets for compound generation due to the possibility of generating novel compounds. In this study, we used 1 million small molecules (less than 1000 Da) to create a representative chemical space (defined by calculated molecular properties) of all small molecules. We identified regions with few or no compounds and investigated their location in DarkChem's latent space. From these spaces, we generated 694,645 valid molecules, all of which represent molecules not found in any chemical database to date. These molecules filled 50.8% of the probed empty spaces in molecular property space. Generated molecules are provided in the supporting information. |
---|---|
ISSN: | 2331-8422 |