Applications of Autoencoders along with Deep Learning Techniques to generate valid molecules

Bibliographic Details
Published in: Journal of physics. Conference series, 2021-11, Vol. 2070 (1), p.12125
Main Authors: Sesha Sai Aparna, T, Anuradha, T
Format: Article
Language:English
Description
Summary: From identifying the fundamental cause of an illness to a medication's availability on the market, developing a drug takes an average of 10 years and almost $2.6 billion. The search is like hunting for a needle in a haystack, and it costs a great deal of time, effort, and money. In a solution space of between 10^30 and 10^100 synthetically viable compounds, we are looking for the one molecule that can turn off a disease at the molecular level. This chemical space is too large to screen adequately for the desired molecule, and pharmaceutical chemical repositories hold only a small percentage of the synthetically viable compounds for wet-lab research. Computational de novo drug design can be used to explore this vast chemical space and propose compounds that have not been designed before. Computational drug design can cut the time spent in the discovery phase in half, resulting in a shorter time to market and lower drug prices. Deep learning and artificial intelligence (AI) have opened up new perspectives in cheminformatics, especially in generative models for molecules. In particular, recurrent neural networks (RNNs) trained on molecules in the SMILES text format are very good at exploring chemical space. Two baseline models were created for generating molecules. The first includes an encoder that takes SMILES strings as input, followed by a deep generative LSTM model that acts as a hidden layer and whose output is the input to the decoder. The second works the same way but adds a latent space, a compressed representation of the data in which related data points lie closer together. The latent space is used to learn properties of the data and to find simpler representations for analysis, and the second model reuses the weights obtained from the first model to generate molecules more efficiently. A custom function was then created to adjust the temperature of the softmax activation function, which sets a threshold for generating valid molecules. The resulting model makes it possible to produce new molecules through successful exploration of chemical space.
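The summary mentions adjusting the temperature of the softmax when generating molecules. The following is a minimal sketch of standard temperature-scaled softmax sampling, not the authors' custom function. The vocabulary, the logits, and the names `softmax_with_temperature` and `sample_token` are hypothetical and chosen only for illustration.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Return softmax(logits / temperature) as a list of probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, vocab, temperature=1.0, rng=random):
    """Draw one token from vocab according to the temperature-scaled softmax."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]


# Hypothetical SMILES character vocabulary and decoder logits for one step.
vocab = ["C", "N", "O", "(", ")", "="]
logits = [2.0, 1.0, 0.5, 0.2, 0.2, 0.1]

sharp = softmax_with_temperature(logits, temperature=0.5)  # more conservative
flat = softmax_with_temperature(logits, temperature=2.0)   # more exploratory
next_char = sample_token(logits, vocab, temperature=0.5, rng=random.Random(0))
```

A temperature below 1 concentrates probability on the most likely characters, which tends to favor syntactically valid SMILES strings. A temperature above 1 flattens the distribution and encourages exploration at the cost of more invalid strings.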
ISSN:1742-6588
1742-6596
DOI:10.1088/1742-6596/2070/1/012125