Prediction of HPLC Retention Index Using Artificial Neural Networks and IGroup E-State Indices

A back-propagation artificial neural network (ANN) was used to create a 10-fold leave-10%-out cross-validated ensemble model of high performance liquid chromatography retention index (HPLC-RI) for a data set of 498 diverse druglike compounds. A 10-fold multiple linear regression (MLR) ensemble model...

Bibliographic Details
Published in: Journal of Chemical Information and Modeling 2009-04, Vol.49 (4), p.788-799
Main Authors: Albaugh, Daniel R; Hall, L. Mark; Hill, Dennis W; Kertesz, Tzipporah M; Parham, Marc; Hall, Lowell H; Grant, David F
Format: Article
Language: English
container_end_page 799
container_issue 4
container_start_page 788
container_title Journal of Chemical Information and Modeling
container_volume 49
creator Albaugh, Daniel R
Hall, L. Mark
Hill, Dennis W
Kertesz, Tzipporah M
Parham, Marc
Hall, Lowell H
Grant, David F
description A back-propagation artificial neural network (ANN) was used to create a 10-fold leave-10%-out cross-validated ensemble model of high performance liquid chromatography retention index (HPLC-RI) for a data set of 498 diverse druglike compounds. A 10-fold multiple linear regression (MLR) ensemble model of the same data was developed for comparison. Molecular structure was described using IGroup E-state indices, a novel set of structure-information representation (SIR) descriptors, along with molecular connectivity chi and kappa indices and other SIR descriptors previously reported. The same input descriptors were used to develop models by both learning algorithms. The MLR model yielded marginally acceptable statistics with training correlation r² = 0.65, mean absolute error (MAE) = 83 RI units. External validation of 104 compounds not used for model development yielded validation v² = 0.49 and MAE = 73 RI units. The distribution of residuals for the fit and validate data sets suggests a nonlinear relationship between retention index and molecular structure as described by the SIR indices. Not surprisingly, the ANN model was significantly more accurate for both training and validation with training set r² = 0.93, MAE = 30 RI units and validation v² = 0.84, MAE = 41 RI units. For the ANN model, a total of 91% of validation predictions were within 100 RI units of the experimental value.
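The abstract reports a 10-fold leave-10%-out ensemble, mean absolute error, and the share of predictions within 100 RI units. The following is a minimal Python sketch of how such quantities could be computed, not the authors' implementation. All function names are hypothetical, the fold assignment (every tenth index) is an assumption, and the "models" are placeholder callables standing in for trained ANN or MLR members.

```python
from statistics import mean


def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted RI values."""
    return mean(abs(o - p) for o, p in zip(observed, predicted))


def fraction_within(observed, predicted, tolerance=100.0):
    """Fraction of predictions whose absolute error is at most `tolerance`."""
    hits = sum(1 for o, p in zip(observed, predicted) if abs(o - p) <= tolerance)
    return hits / len(observed)


def leave_10_percent_out_folds(n_items, n_folds=10):
    """Yield (train_indices, held_out_indices); each fold holds out about 1/n_folds.

    Assumption: items are assigned to folds by taking every n_folds-th index.
    The source does not say how the folds were actually drawn.
    """
    indices = list(range(n_items))
    for k in range(n_folds):
        held_out = indices[k::n_folds]
        held = set(held_out)
        train = [i for i in indices if i not in held]
        yield train, held_out


def ensemble_predict(models, x):
    """Average the predictions of all fold models (placeholder callables)."""
    return mean(m(x) for m in models)
```

With 498 compounds and 10 folds, this split holds out 49 or 50 compounds per fold, and each compound is held out exactly once. A real workflow would train one model per fold on the `train` indices and pass the resulting models to `ensemble_predict`.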
doi_str_mv 10.1021/ci9000162
format article
fulltext fulltext
identifier ISSN: 1549-9596
ispartof Journal of Chemical Information and Modeling, 2009-04, Vol.49 (4), p.788-799
issn 1549-9596
1520-5142
1549-960X
language eng
recordid cdi_proquest_miscellaneous_67174188
source American Chemical Society:Jisc Collections:American Chemical Society Read & Publish Agreement 2022-2024 (Reading list)
subjects Algorithms
Artificial Intelligence
Chemical compounds
Chemical Information
Chromatography
Chromatography, High Pressure Liquid - statistics & numerical data
Cluster Analysis
Databases, Factual
Forecasting
Linear Models
Models, Chemical
Molecular structure
Neural networks
Neural Networks (Computer)
Quantitative Structure-Activity Relationship
Regression analysis
Reproducibility of Results
Subject Headings
title Prediction of HPLC Retention Index Using Artificial Neural Networks and IGroup E-State Indices
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-15T23%3A36%3A41IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Prediction%20of%20HPLC%20Retention%20Index%20Using%20Artificial%20Neural%20Networks%20and%20IGroup%20E-State%20Indices&rft.jtitle=Journal%20of%20Chemical%20Information%20and%20Modeling&rft.au=Albaugh,%20Daniel%20R&rft.date=2009-04-27&rft.volume=49&rft.issue=4&rft.spage=788&rft.epage=799&rft.pages=788-799&rft.issn=1549-9596&rft.eissn=1520-5142&rft_id=info:doi/10.1021/ci9000162&rft_dat=%3Cproquest_cross%3E1697477491%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-a340t-5a2c6d55141b3107073b955e3b284f5d2a3321c8fef2587ea5393cc5399f121d3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=216223488&rft_id=info:pmid/19309176&rfr_iscdi=true