Modeling Aqueous Solubility
This paper describes the development of an aqueous solubility model based on solubility data from the Syracuse database, calculated octanol−water partition coefficient, and 51 2D molecular descriptors. Two different statistical packages, SIMCA and Cubist, were used and the results were compared. The...
Published in: | Journal of Chemical Information and Computer Sciences 2003-05, Vol.43 (3), p.837-841 |
---|---|
Main Authors: | Butina, Darko; Gola, Joelle M. R |
Format: | Article |
Language: | English |
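Subjects: | Models, Chemical; Organic Chemicals - chemistry; Quantitative Structure-Activity Relationship; Solubility; Water - chemistry |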
cited_by | cdi_FETCH-LOGICAL-a349t-69685124048d1a5bf3396be932ddb0bacdc1ad842b2c988dc0693562ed8e09263 |
---|---|
cites | cdi_FETCH-LOGICAL-a349t-69685124048d1a5bf3396be932ddb0bacdc1ad842b2c988dc0693562ed8e09263 |
container_end_page | 841 |
container_issue | 3 |
container_start_page | 837 |
container_title | Journal of Chemical Information and Computer Sciences |
container_volume | 43 |
creator | Butina, Darko; Gola, Joelle M. R |
description | This paper describes the development of an aqueous solubility model based on solubility data from the Syracuse database, calculated octanol−water partition coefficient, and 51 2D molecular descriptors. Two different statistical packages, SIMCA and Cubist, were used and the results were compared. The Cubist model, which comprises a collection of rules, each of which has an associated Multiple Linear Regression model (MLR), gave better overall results on a test set of 640 compounds with an overall squared correlation coefficient of 0.74 and an absolute average error of 0.68 log units. Both training and independent test sets had similar distributions of structures in terms of the different functionalities present: 60% neutral, 14% acidic, 8% phenolic, 11% monobasic, 4% polybasic, and 3% zwitterionic molecules. Sets were designed by random selection, with 2688 (81%) and 640 (19%) molecules, respectively, forming the training and the test sets. |
doi_str_mv | 10.1021/ci020279y |
format | article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_73322675</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>73322675</sourcerecordid><originalsourceid>FETCH-LOGICAL-a349t-69685124048d1a5bf3396be932ddb0bacdc1ad842b2c988dc0693562ed8e09263</originalsourceid><addsrcrecordid>eNpt0M9LwzAUB_AgipvTg2dBdlHwUH15adLkOOZvJiqb4C2kSSad3bo1Lbj_3o6OefH0Du_D-_El5JTCNQWkNzYDBEzUeo90KY9VpAR87pMugOIRMiY75CiEGQBjSuAh6VBMREJj2iVnL4Xzebb46g9WtS_q0B8XeZ1meVatj8nB1OTBn2xrj3zc302Gj9Ho9eFpOBhFhsWqioQSklOMIZaOGp5ON1tSrxg6l0JqrLPUOBljilZJ6SwIxbhA76QHhYL1yGU7d1kWzRGh0vMsWJ_nZrG5SCeMIYqEN_CqhbYsQij9VC_LbG7KtaagN0noXRKNPd8OrdO5d39y-3oDohZkofI_u74pv7VIWML15G2s2e3w_RkmIz1u_EXrjQ16VtTlosnkn8W_PgRyDQ</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>73322675</pqid></control><display><type>article</type><title>Modeling Aqueous Solubility</title><source>American Chemical Society:Jisc Collections:American Chemical Society Read & Publish Agreement 2022-2024 (Reading list)</source><creator>Butina, Darko ; Gola, Joelle M. R</creator><creatorcontrib>Butina, Darko ; Gola, Joelle M. R</creatorcontrib><description>This paper describes the development of an aqueous solubility model based on solubility data from the Syracuse database, calculated octanol−water partition coefficient, and 51 2D molecular descriptors. Two different statistical packages, SIMCA and Cubist, were used and the results were compared. The Cubist model, which comprises a collection of rules, each of which has an associated Multiple Linear Regression model (MLR), gave better overall results on a test set of 640 compounds with an overall squared correlation coefficient of 0.74 and an absolute average error of 0.68 log units. 
Both training and independent test sets had similar distributions of structures in terms of the different functionalities present: 60% neutral, 14% acidic, 8% phenolic, 11% monobasic, 4% polybasic, and 3% zwitterionic molecules. Sets were designed by random selection, with 2688 (81%) and 640 (19%) molecules, respectively, forming the training and the test sets.</description><identifier>ISSN: 0095-2338</identifier><identifier>EISSN: 1549-960X</identifier><identifier>DOI: 10.1021/ci020279y</identifier><identifier>PMID: 12767141</identifier><language>eng</language><publisher>United States: American Chemical Society</publisher><subject>Models, Chemical ; Organic Chemicals - chemistry ; Quantitative Structure-Activity Relationship ; Solubility ; Water - chemistry</subject><ispartof>Journal of Chemical Information and Computer Sciences, 2003-05, Vol.43 (3), p.837-841</ispartof><rights>Copyright © 2003 American Chemical Society</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-a349t-69685124048d1a5bf3396be932ddb0bacdc1ad842b2c988dc0693562ed8e09263</citedby><cites>FETCH-LOGICAL-a349t-69685124048d1a5bf3396be932ddb0bacdc1ad842b2c988dc0693562ed8e09263</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27924,27925</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/12767141$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Butina, Darko</creatorcontrib><creatorcontrib>Gola, Joelle M. R</creatorcontrib><title>Modeling Aqueous Solubility</title><title>Journal of Chemical Information and Computer Sciences</title><addtitle>J. Chem. Inf. Comput. 
Sci</addtitle><description>This paper describes the development of an aqueous solubility model based on solubility data from the Syracuse database, calculated octanol−water partition coefficient, and 51 2D molecular descriptors. Two different statistical packages, SIMCA and Cubist, were used and the results were compared. The Cubist model, which comprises a collection of rules, each of which has an associated Multiple Linear Regression model (MLR), gave better overall results on a test set of 640 compounds with an overall squared correlation coefficient of 0.74 and an absolute average error of 0.68 log units. Both training and independent test sets had similar distributions of structures in terms of the different functionalities present60% neutral, 14% acidic, 8% phenolic, 11% monobasic, 4% polybasic, and 3% zwitterionic molecules. Sets were designed by random selection, with 2688 (81%) and 640 (19%) molecules, respectively, forming the training and the test sets.</description><subject>Models, Chemical</subject><subject>Organic Chemicals - chemistry</subject><subject>Quantitative Structure-Activity Relationship</subject><subject>Solubility</subject><subject>Water - chemistry</subject><issn>0095-2338</issn><issn>1549-960X</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2003</creationdate><recordtype>article</recordtype><recordid>eNpt0M9LwzAUB_AgipvTg2dBdlHwUH15adLkOOZvJiqb4C2kSSad3bo1Lbj_3o6OefH0Du_D-_El5JTCNQWkNzYDBEzUeo90KY9VpAR87pMugOIRMiY75CiEGQBjSuAh6VBMREJj2iVnL4Xzebb46g9WtS_q0B8XeZ1meVatj8nB1OTBn2xrj3zc302Gj9Ho9eFpOBhFhsWqioQSklOMIZaOGp5ON1tSrxg6l0JqrLPUOBljilZJ6SwIxbhA76QHhYL1yGU7d1kWzRGh0vMsWJ_nZrG5SCeMIYqEN_CqhbYsQij9VC_LbG7KtaagN0noXRKNPd8OrdO5d39y-3oDohZkofI_u74pv7VIWML15G2s2e3w_RkmIz1u_EXrjQ16VtTlosnkn8W_PgRyDQ</recordid><startdate>20030501</startdate><enddate>20030501</enddate><creator>Butina, Darko</creator><creator>Gola, Joelle M. 
R</creator><general>American Chemical Society</general><scope>BSCLL</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope></search><sort><creationdate>20030501</creationdate><title>Modeling Aqueous Solubility</title><author>Butina, Darko ; Gola, Joelle M. R</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a349t-69685124048d1a5bf3396be932ddb0bacdc1ad842b2c988dc0693562ed8e09263</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2003</creationdate><topic>Models, Chemical</topic><topic>Organic Chemicals - chemistry</topic><topic>Quantitative Structure-Activity Relationship</topic><topic>Solubility</topic><topic>Water - chemistry</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Butina, Darko</creatorcontrib><creatorcontrib>Gola, Joelle M. R</creatorcontrib><collection>Istex</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><jtitle>Journal of Chemical Information and Computer Sciences</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Butina, Darko</au><au>Gola, Joelle M. R</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Modeling Aqueous Solubility</atitle><jtitle>Journal of Chemical Information and Computer Sciences</jtitle><addtitle>J. Chem. Inf. Comput. 
Sci</addtitle><date>2003-05-01</date><risdate>2003</risdate><volume>43</volume><issue>3</issue><spage>837</spage><epage>841</epage><pages>837-841</pages><issn>0095-2338</issn><eissn>1549-960X</eissn><abstract>This paper describes the development of an aqueous solubility model based on solubility data from the Syracuse database, calculated octanol−water partition coefficient, and 51 2D molecular descriptors. Two different statistical packages, SIMCA and Cubist, were used and the results were compared. The Cubist model, which comprises a collection of rules, each of which has an associated Multiple Linear Regression model (MLR), gave better overall results on a test set of 640 compounds with an overall squared correlation coefficient of 0.74 and an absolute average error of 0.68 log units. Both training and independent test sets had similar distributions of structures in terms of the different functionalities present: 60% neutral, 14% acidic, 8% phenolic, 11% monobasic, 4% polybasic, and 3% zwitterionic molecules. Sets were designed by random selection, with 2688 (81%) and 640 (19%) molecules, respectively, forming the training and the test sets.</abstract><cop>United States</cop><pub>American Chemical Society</pub><pmid>12767141</pmid><doi>10.1021/ci020279y</doi><tpages>5</tpages></addata></record> |
fulltext | fulltext |
identifier | ISSN: 0095-2338 |
ispartof | Journal of Chemical Information and Computer Sciences, 2003-05, Vol.43 (3), p.837-841 |
issn | 0095-2338; 1549-960X |
language | eng |
recordid | cdi_proquest_miscellaneous_73322675 |
source | American Chemical Society:Jisc Collections:American Chemical Society Read & Publish Agreement 2022-2024 (Reading list) |
subjects | Models, Chemical; Organic Chemicals - chemistry; Quantitative Structure-Activity Relationship; Solubility; Water - chemistry |
title | Modeling Aqueous Solubility |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-27T09%3A05%3A23IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Modeling%20Aqueous%20Solubility&rft.jtitle=Journal%20of%20Chemical%20Information%20and%20Computer%20Sciences&rft.au=Butina,%20Darko&rft.date=2003-05-01&rft.volume=43&rft.issue=3&rft.spage=837&rft.epage=841&rft.pages=837-841&rft.issn=0095-2338&rft.eissn=1549-960X&rft_id=info:doi/10.1021/ci020279y&rft_dat=%3Cproquest_cross%3E73322675%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-a349t-69685124048d1a5bf3396be932ddb0bacdc1ad842b2c988dc0693562ed8e09263%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=73322675&rft_id=info:pmid/12767141&rfr_iscdi=true |