Predicting crystallisation propensity of small molecules

Bibliographic Details
Published in: Acta crystallographica. Section A, Foundations and advances, 2014-08, Vol. 70 (a1), p. C1628
Main Authors: Wicker, Jerome; Cooper, Richard; David, William
Format: Article
Language: English
Description
Summary: We show that suitably chosen machine learning algorithms can predict the "crystallisation propensity" of classes of molecules with a promisingly low error rate, using the Cambridge Structural Database (CSD) and the ZINC database to provide training examples of crystalline and non-crystalline molecules. In a supervised learning task, an algorithm infers a function from labelled training data and uses it to classify unseen test data. Such algorithms have been used successfully to predict continuous properties of compounds, such as melting point[1] and solubility[2]. Similar methods have also been applied to predicting protein crystallinity from amino acid sequences[3], but little previous work has attempted to classify small organic molecules as crystalline or non-crystalline, owing to the difficulty of finding descriptors suited to the problem. Our approach uses only atom types and connectivity, setting aside the confounding effects of solvents and crystallisation conditions. The result is supported by a blind microcrystallisation screen of a sample of materials, which confirmed the classification accuracy of the predictive model. We also present an analysis of the most significant descriptors used in the classification, and show that substantial predictive accuracy can be achieved with relatively few descriptors.
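The supervised-learning setup summarised above (descriptors computed from atom types and connectivity alone, then a classifier trained on labelled examples) can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' method: the abstract names neither the descriptors nor the learning algorithm, so the descriptor set (atom-type fractions, mean degree, ring count), the nearest-centroid classifier, and the helper names `descriptors`, `ring`, `chain`, `fit_nearest_centroid` and `predict` are all assumptions. The toy graphs and their 0/1 labels are arbitrary placeholders, not CSD or ZINC data.

```python
import math
from collections import Counter

# Hypothetical descriptor set: the abstract says only that atom types and
# connectivity are used, not which descriptors were computed.
ATOM_TYPES = ("C", "N", "O", "S")


def descriptors(atoms, bonds):
    """Toy descriptor vector computed from atom types and connectivity only.

    atoms: element symbols, e.g. ["C", "C", "O"]
    bonds: (i, j) index pairs into `atoms`; the graph is assumed connected.
    """
    n = len(atoms)
    counts = Counter(atoms)
    fractions = [counts[t] / n for t in ATOM_TYPES]  # composition over ATOM_TYPES
    mean_degree = 2 * len(bonds) / n                  # average connectivity
    rings = len(bonds) - n + 1                        # independent cycles of a connected graph
    return fractions + [mean_degree, rings]


def ring(elems):
    """Hypothetical helper: cyclic graph, atom i bonded to i+1, last to first."""
    n = len(elems)
    return list(elems), [(i, (i + 1) % n) for i in range(n)]


def chain(elems):
    """Hypothetical helper: acyclic graph, atom i bonded to atom i+1."""
    return list(elems), [(i, i + 1) for i in range(len(elems) - 1)]


def fit_nearest_centroid(X, y):
    """'Training' step: store the mean descriptor vector of each class."""
    model = {}
    for label in set(y):
        members = [x for x, lab in zip(X, y) if lab == label]
        model[label] = [sum(col) / len(members) for col in zip(*members)]
    return model


def predict(model, x):
    """Assign x to the class whose centroid is nearest (Euclidean distance)."""
    return min(model, key=lambda label: math.dist(model[label], x))


# Toy training set; labels are arbitrary placeholders (1 = "crystalline",
# 0 = "non-crystalline"), not CSD or ZINC data.
train = [ring("CCCCCC"), ring("NCCCCC"), chain("CCCCCC"), chain("CCCO")]
y = [1, 1, 0, 0]
X = [descriptors(atoms, bonds) for atoms, bonds in train]

model = fit_nearest_centroid(X, y)
print(predict(model, descriptors(*ring("CCCCC"))))   # → 1
print(predict(model, descriptors(*chain("CCCCC"))))  # → 0
```

A nearest-centroid rule is used here only because it fits in a few lines of standard-library Python; the summary says only that "suitably chosen machine learning algorithms" were used, so a real replication would substitute a stronger learner, scale the descriptors, and evaluate on held-out data.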
ISSN: 2053-2733
DOI: 10.1107/S2053273314083715