Drug-likeness scoring based on unsupervised learning
Drug-likeness prediction is important for the virtual screening of drug candidates. It is challenging because the drug-likeness is presumably associated with the whole set of necessary properties to pass through clinical trials, and thus no definite data for regression is available. Recently, binary...
Published in: | Chemical science (Cambridge) 2022-01, Vol.13 (2), p.554-565 |
---|---|
Main Authors: | Lee, Kyunghoon; Jang, Jinho; Seo, Seonghwan; Lim, Jaechang; Kim, Woo Youn |
Format: | Article |
Language: | English |
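Subjects: | Chemistry; Classification; Datasets; Extreme values; Neural networks; Recurrent neural networks; Training; Unsupervised learning |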
cited_by | cdi_FETCH-LOGICAL-c494t-a9765ce3f694ca4c552410c6f36b478f9b7376c4ea1dc39223028ee53c754d3a3 |
---|---|
cites | cdi_FETCH-LOGICAL-c494t-a9765ce3f694ca4c552410c6f36b478f9b7376c4ea1dc39223028ee53c754d3a3 |
container_end_page | 565 |
container_issue | 2 |
container_start_page | 554 |
container_title | Chemical science (Cambridge) |
container_volume | 13 |
creator | Lee, Kyunghoon; Jang, Jinho; Seo, Seonghwan; Lim, Jaechang; Kim, Woo Youn |
description | Drug-likeness prediction is important for the virtual screening of drug candidates. It is challenging because the drug-likeness is presumably associated with the whole set of necessary properties to pass through clinical trials, and thus no definite data for regression is available. Recently, binary classification models based on graph neural networks have been proposed but with strong dependency of their performances on the choice of the negative set for training. Here we propose a novel unsupervised learning model that requires only known drugs for training. We adopted a language model based on a recurrent neural network for unsupervised learning. It showed relatively consistent performance across different datasets, unlike such classification models. In addition, the unsupervised learning model provides drug-likeness scores that well separate distributions with increasing mean values in the order of datasets composed of molecules at a later step in a drug development process, whereas the classification model predicted a polarized distribution with two extreme values for all datasets presumably due to the overconfident prediction for unseen data. Thus, this new concept offers a pragmatic tool for drug-likeness scoring and further can be applied to other biochemical applications.
A new quantification method of drug-likeness based on unsupervised learning. The method only uses drug molecules as training set without any non-drug-like molecules. |
doi_str_mv | 10.1039/d1sc05248a |
format | article |
addtitle | Chem Sci |
backlink | https://www.ncbi.nlm.nih.gov/pubmed/35126987 (View this record in MEDLINE/PubMed) |
collection | PubMed; CrossRef; Engineered Materials Abstracts; METADEX; Technology Research Database; Materials Research Database; MEDLINE - Academic; PubMed Central (Full Participant titles) |
date | 2022-01-05 |
linktohtml | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8729801/ |
linktopdf | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8729801/pdf/ |
oa | free_for_read |
orcid | 0000-0001-7342-4283; 0000-0002-4090-7825; 0000-0001-7152-2111 |
pmid | 35126987 |
pqid | 2616917723 |
publisher | England: Royal Society of Chemistry |
rights | Copyright Royal Society of Chemistry 2022 |
sourcerecordid | 2626228883 |
toplevel | peer_reviewed; online_resources |
tpages | 12 |
fulltext | fulltext |
identifier | ISSN: 2041-6520 |
ispartof | Chemical science (Cambridge), 2022-01, Vol.13 (2), p.554-565 |
issn | 2041-6520 2041-6539 |
language | eng |
recordid | cdi_rsc_primary_d1sc05248a |
source | Open Access: PubMed Central |
subjects | Chemistry; Classification; Datasets; Extreme values; Neural networks; Recurrent neural networks; Training; Unsupervised learning |
title | Drug-likeness scoring based on unsupervised learning |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-06T05%3A07%3A24IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_rsc_p&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Drug-likeness%20scoring%20based%20on%20unsupervised%20learning&rft.jtitle=Chemical%20science%20(Cambridge)&rft.au=Lee,%20Kyunghoon&rft.date=2022-01-05&rft.volume=13&rft.issue=2&rft.spage=554&rft.epage=565&rft.pages=554-565&rft.issn=2041-6520&rft.eissn=2041-6539&rft_id=info:doi/10.1039/d1sc05248a&rft_dat=%3Cproquest_rsc_p%3E2626228883%3C/proquest_rsc_p%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c494t-a9765ce3f694ca4c552410c6f36b478f9b7376c4ea1dc39223028ee53c754d3a3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2616917723&rft_id=info:pmid/35126987&rfr_iscdi=true |