Prediction of HPLC Retention Index Using Artificial Neural Networks and IGroup E-State Indices

Bibliographic Details
Published in:	Journal of Chemical Information and Modeling 2009-04, Vol.49 (4), p.788-799
Main Authors:	Albaugh, Daniel R; Hall, L. Mark; Hill, Dennis W; Kertesz, Tzipporah M; Parham, Marc; Hall, Lowell H; Grant, David F
Format:	Article
Language:	English
Subjects:	Algorithms; Artificial Intelligence; Chemical compounds; Chemical Information; Chromatography; Chromatography, High Pressure Liquid - statistics & numerical data; Cluster Analysis; Databases, Factual; Forecasting; Linear Models; Models, Chemical; Molecular structure; Neural networks; Neural Networks (Computer); Quantitative Structure-Activity Relationship; Regression analysis; Reproducibility of Results; Subject Headings

Description
Summary:	A back-propagation artificial neural network (ANN) was used to create a 10-fold leave-10%-out cross-validated ensemble model of high performance liquid chromatography retention index (HPLC-RI) for a data set of 498 diverse druglike compounds. A 10-fold multiple linear regression (MLR) ensemble model of the same data was developed for comparison. Molecular structure was described using IGroup E-state indices, a novel set of structure-information representation (SIR) descriptors, along with molecular connectivity chi and kappa indices and other previously reported SIR descriptors. The same input descriptors were used to develop models with both learning algorithms. The MLR model yielded marginally acceptable statistics, with training correlation r² = 0.65 and mean absolute error (MAE) = 83 RI units. External validation on 104 compounds not used for model development yielded validation v² = 0.49 and MAE = 73 RI units. The distribution of residuals for the fit and validation data sets suggests a nonlinear relationship between retention index and molecular structure as described by the SIR indices. Not surprisingly, the ANN model was significantly more accurate for both training and validation, with training set r² = 0.93 and MAE = 30 RI units, and validation v² = 0.84 and MAE = 41 RI units. For the ANN model, a total of 91% of validation predictions were within 100 RI units of the experimental value.
ISSN:	1549-9596 1520-5142 1549-960X
DOI:	10.1021/ci9000162
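
Illustrative sketch:	The summary describes a 10-fold leave-10%-out ensemble and reports MAE and the share of predictions within 100 RI units. The Python below is a minimal sketch of that workflow for the MLR case, not the authors' implementation. It assumes the ensemble prediction is the mean of the member predictions (the summary does not say how members are combined), and every function name in it (`fit_mlr`, `fit_ensemble`, `ensemble_predict`, `mae`, `fraction_within`) is hypothetical.

```python
import random


def fit_mlr(X, y):
    """Ordinary least squares with an intercept, via the normal equations."""
    rows = [[1.0] + list(x) for x in X]
    p = len(rows[0])
    # Augmented system [A^T A | A^T y].
    M = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(M[k][c]))
        M[c], M[piv] = M[piv], M[c]
        for k in range(p):
            if k != c:
                f = M[k][c] / M[c][c]
                M[k] = [a - f * b for a, b in zip(M[k], M[c])]
    return [M[i][p] / M[i][i] for i in range(p)]


def predict(coef, x):
    """Apply one fitted model: intercept plus weighted descriptors."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))


def fit_ensemble(X, y, n_folds=10, seed=0):
    """Fit n_folds models, each leaving out a different ~1/n_folds of the data."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    models = []
    for f in range(n_folds):
        held_out = set(idx[f::n_folds])
        keep = [i for i in idx if i not in held_out]
        models.append(fit_mlr([X[i] for i in keep], [y[i] for i in keep]))
    return models


def ensemble_predict(models, x):
    """Assumption: combine members by averaging their predictions."""
    return sum(predict(m, x) for m in models) / len(models)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def fraction_within(y_true, y_pred, tol=100.0):
    """Share of predictions within tol units of the observed value."""
    return sum(abs(a - b) <= tol for a, b in zip(y_true, y_pred)) / len(y_true)
```

With a descriptor matrix `X` (one row per compound) and observed retention indices `y`, `fit_ensemble(X, y)` returns ten coefficient vectors; applying `ensemble_predict` to held-out compounds and passing the results to `mae` and `fraction_within` gives validation statistics of the kind quoted in the summary.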