A machine learning approach for genome-wide prediction of morbid and druggable human genes based on systems-level data
The genome-wide identification of both morbid genes, i.e., those genes whose mutations cause hereditary human diseases, and druggable genes, i.e., genes coding for proteins whose modulation by small molecules elicits phenotypic effects, requires experimental approaches that are time-consuming and la...
Published in: | BMC genomics 2010-12, Vol.11 Suppl 5 (S5), p.S9-S9, Article S9 |
---|---|
Main Authors: | Costa, Pedro R ; Acencio, Marcio L ; Lemke, Ney |
Format: | Article |
Language: | English |
cited_by | cdi_FETCH-LOGICAL-b571t-bf428ee0555571b54a461d61fec839e075b8265bca9fae6edfb4b4e671d2dc3f3 |
---|---|
cites | cdi_FETCH-LOGICAL-b571t-bf428ee0555571b54a461d61fec839e075b8265bca9fae6edfb4b4e671d2dc3f3 |
container_end_page | S9 |
container_issue | S5 |
container_start_page | S9 |
container_title | BMC genomics |
container_volume | 11 Suppl 5 |
creator | Costa, Pedro R ; Acencio, Marcio L ; Lemke, Ney |
description | The genome-wide identification of both morbid genes, i.e., those genes whose mutations cause hereditary human diseases, and druggable genes, i.e., genes coding for proteins whose modulation by small molecules elicits phenotypic effects, requires experimental approaches that are time-consuming and laborious. Thus, a computational approach which could accurately predict such genes on a genome-wide scale would be invaluable for accelerating the pace of discovery of causal relationships between genes and diseases as well as the determination of druggability of gene products.
In this paper we propose a machine learning-based computational approach to predict morbid and druggable genes on a genome-wide scale. For this purpose, we constructed a decision tree-based meta-classifier and trained it on datasets containing, for each morbid and druggable gene, network topological features, tissue expression profile and subcellular localization data as learning attributes. This meta-classifier correctly recovered 65% of known morbid genes with a precision of 66% and correctly recovered 78% of known druggable genes with a precision of 75%. It was then used to assign morbidity and druggability scores to genes not known to be morbid and druggable and we showed a good match between these scores and literature data. Finally, we generated decision trees by training the J48 algorithm on the morbidity and druggability datasets to discover cellular rules for morbidity and druggability and, among the rules, we found that the number of regulating transcription factors and plasma membrane localization are the most important factors to morbidity and druggability, respectively.
We were able to demonstrate that network topological features along with tissue expression profile and subcellular localization can reliably predict human morbid and druggable genes on a genome-wide scale. Moreover, by constructing decision trees based on these data, we could discover cellular rules governing morbidity and druggability. |
doi_str_mv | 10.1186/1471-2164-11-S5-S9 |
format | article |
fullrecord | <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_3045802</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>845392654</sourcerecordid><originalsourceid>FETCH-LOGICAL-b571t-bf428ee0555571b54a461d61fec839e075b8265bca9fae6edfb4b4e671d2dc3f3</originalsourceid><addsrcrecordid>eNqFkk9v1DAQxSMEoqXwBTggiwunFI__JbkgVVWBSpU4LJwtO55kXSV2sJOt-u3JasuqRSB8sTXzm6fnpymKt0DPAWr1EUQFJQMlSoByI8tN86w4PRafP3qfFK9yvqUUqprJl8UJAwa0qeRpsbsgo2m3PiAZ0KTgQ0_MNKW4FkkXE-kxxBHLO--QTAmdb2cfA4kdGWOy3hETHHFp6XtjByTbZTRhP4SZWJPRkRXO93nGMZcD7nAgzszmdfGiM0PGNw_3WfHj89X3y6_lzbcv15cXN6WVFcyl7QSrEalcTwVWCiMUOAUdtjVvkFbS1kxJ25qmM6jQdVZYgaoCx1zLO35WfDroTosd0bUY5mQGPSU_mnSvo_H6aSf4re7jTnMqZE3ZKnB1ELA-_kPgaaeNo97nrve5awC9kXrTrDofHoyk-HPBPOvR5xaHwQSMS9aNFIpTXvH_krWQvFk_LVby_R_kbVxSWPPUDWWUS8XVCrED1KaYc8Lu6B2o3q_R392-exzbceT33vBfIwjGKA</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>902035636</pqid></control><display><type>article</type><title>A machine learning approach for genome-wide prediction of morbid and druggable human genes based on systems-level data</title><source>Publicly Available Content Database</source><source>PubMed Central</source><creator>Costa, Pedro R ; Acencio, Marcio L ; Lemke, Ney</creator><creatorcontrib>Costa, Pedro R ; Acencio, Marcio L ; Lemke, Ney</creatorcontrib><description>The genome-wide identification of both morbid genes, i.e., those genes whose mutations cause hereditary human diseases, and druggable genes, i.e., genes coding for proteins whose modulation by small molecules elicits phenotypic effects, requires experimental approaches that are time-consuming and laborious. 
Thus, a computational approach which could accurately predict such genes on a genome-wide scale would be invaluable for accelerating the pace of discovery of causal relationships between genes and diseases as well as the determination of druggability of gene products.
In this paper we propose a machine learning-based computational approach to predict morbid and druggable genes on a genome-wide scale. For this purpose, we constructed a decision tree-based meta-classifier and trained it on datasets containing, for each morbid and druggable gene, network topological features, tissue expression profile and subcellular localization data as learning attributes. This meta-classifier correctly recovered 65% of known morbid genes with a precision of 66% and correctly recovered 78% of known druggable genes with a precision of 75%. It was then used to assign morbidity and druggability scores to genes not known to be morbid and druggable and we showed a good match between these scores and literature data. Finally, we generated decision trees by training the J48 algorithm on the morbidity and druggability datasets to discover cellular rules for morbidity and druggability and, among the rules, we found that the number of regulating transcription factors and plasma membrane localization are the most important factors to morbidity and druggability, respectively.
We were able to demonstrate that network topological features along with tissue expression profile and subcellular localization can reliably predict human morbid and druggable genes on a genome-wide scale. Moreover, by constructing decision trees based on these data, we could discover cellular rules governing morbidity and druggability.</description><identifier>ISSN: 1471-2164</identifier><identifier>EISSN: 1471-2164</identifier><identifier>DOI: 10.1186/1471-2164-11-S5-S9</identifier><identifier>PMID: 21210975</identifier><language>eng</language><publisher>England: BioMed Central</publisher><subject>Algorithms ; Artificial Intelligence ; Bioinformatics ; Biology ; Computational Biology - methods ; Data mining ; Decision trees ; Drug Discovery - methods ; E coli ; Genetic Diseases, Inborn - genetics ; Genetics ; Genomes ; Genomics ; Genomics - methods ; Humans ; Morbidity ; Proceedings ; Proteins - genetics ; Proteins - metabolism ; Studies ; Yeast</subject><ispartof>BMC genomics, 2010-12, Vol.11 Suppl 5 (S5), p.S9-S9, Article S9</ispartof><rights>2010 Costa et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</rights><rights>Copyright ©2010 Costa et al; licensee BioMed Central Ltd. 
2010 Costa et al; licensee BioMed Central Ltd.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-b571t-bf428ee0555571b54a461d61fec839e075b8265bca9fae6edfb4b4e671d2dc3f3</citedby><cites>FETCH-LOGICAL-b571t-bf428ee0555571b54a461d61fec839e075b8265bca9fae6edfb4b4e671d2dc3f3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC3045802/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.proquest.com/docview/902035636?pq-origsite=primo$$EHTML$$P50$$Gproquest$$Hfree_for_read</linktohtml><link.rule.ids>230,314,723,776,780,881,25731,27901,27902,36989,36990,44566,53766,53768</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/21210975$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Costa, Pedro R</creatorcontrib><creatorcontrib>Acencio, Marcio L</creatorcontrib><creatorcontrib>Lemke, Ney</creatorcontrib><title>A machine learning approach for genome-wide prediction of morbid and druggable human genes based on systems-level data</title><title>BMC genomics</title><addtitle>BMC Genomics</addtitle><description>The genome-wide identification of both morbid genes, i.e., those genes whose mutations cause hereditary human diseases, and druggable genes, i.e., genes coding for proteins whose modulation by small molecules elicits phenotypic effects, requires experimental approaches that are time-consuming and laborious. Thus, a computational approach which could accurately predict such genes on a genome-wide scale would be invaluable for accelerating the pace of discovery of causal relationships between genes and diseases as well as the determination of druggability of gene products.
In this paper we propose a machine learning-based computational approach to predict morbid and druggable genes on a genome-wide scale. For this purpose, we constructed a decision tree-based meta-classifier and trained it on datasets containing, for each morbid and druggable gene, network topological features, tissue expression profile and subcellular localization data as learning attributes. This meta-classifier correctly recovered 65% of known morbid genes with a precision of 66% and correctly recovered 78% of known druggable genes with a precision of 75%. It was then used to assign morbidity and druggability scores to genes not known to be morbid and druggable and we showed a good match between these scores and literature data. Finally, we generated decision trees by training the J48 algorithm on the morbidity and druggability datasets to discover cellular rules for morbidity and druggability and, among the rules, we found that the number of regulating transcription factors and plasma membrane localization are the most important factors to morbidity and druggability, respectively.
We were able to demonstrate that network topological features along with tissue expression profile and subcellular localization can reliably predict human morbid and druggable genes on a genome-wide scale. Moreover, by constructing decision trees based on these data, we could discover cellular rules governing morbidity and druggability.</description><subject>Algorithms</subject><subject>Artificial Intelligence</subject><subject>Bioinformatics</subject><subject>Biology</subject><subject>Computational Biology - methods</subject><subject>Data mining</subject><subject>Decision trees</subject><subject>Drug Discovery - methods</subject><subject>E coli</subject><subject>Genetic Diseases, Inborn - genetics</subject><subject>Genetics</subject><subject>Genomes</subject><subject>Genomics</subject><subject>Genomics - methods</subject><subject>Humans</subject><subject>Morbidity</subject><subject>Proceedings</subject><subject>Proteins - genetics</subject><subject>Proteins - metabolism</subject><subject>Studies</subject><subject>Yeast</subject><issn>1471-2164</issn><issn>1471-2164</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2010</creationdate><recordtype>article</recordtype><sourceid>PIMPY</sourceid><recordid>eNqFkk9v1DAQxSMEoqXwBTggiwunFI__JbkgVVWBSpU4LJwtO55kXSV2sJOt-u3JasuqRSB8sTXzm6fnpymKt0DPAWr1EUQFJQMlSoByI8tN86w4PRafP3qfFK9yvqUUqprJl8UJAwa0qeRpsbsgo2m3PiAZ0KTgQ0_MNKW4FkkXE-kxxBHLO--QTAmdb2cfA4kdGWOy3hETHHFp6XtjByTbZTRhP4SZWJPRkRXO93nGMZcD7nAgzszmdfGiM0PGNw_3WfHj89X3y6_lzbcv15cXN6WVFcyl7QSrEalcTwVWCiMUOAUdtjVvkFbS1kxJ25qmM6jQdVZYgaoCx1zLO35WfDroTosd0bUY5mQGPSU_mnSvo_H6aSf4re7jTnMqZE3ZKnB1ELA-_kPgaaeNo97nrve5awC9kXrTrDofHoyk-HPBPOvR5xaHwQSMS9aNFIpTXvH_krWQvFk_LVby_R_kbVxSWPPUDWWUS8XVCrED1KaYc8Lu6B2o3q_R392-exzbceT33vBfIwjGKA</recordid><startdate>20101222</startdate><enddate>20101222</enddate><creator>Costa, Pedro R</creator><creator>Acencio, Marcio L</creator><creator>Lemke, Ney</creator><general>BioMed Central</general><general>BioMed Central 
Ltd</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7QP</scope><scope>7QR</scope><scope>7SS</scope><scope>7TK</scope><scope>7U7</scope><scope>7X7</scope><scope>7XB</scope><scope>88E</scope><scope>8AO</scope><scope>8FD</scope><scope>8FE</scope><scope>8FH</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>ABUWG</scope><scope>AEUYN</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BBNVY</scope><scope>BENPR</scope><scope>BHPHI</scope><scope>C1K</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FR3</scope><scope>FYUFA</scope><scope>GHDGH</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>K9.</scope><scope>LK8</scope><scope>M0S</scope><scope>M1P</scope><scope>M7P</scope><scope>P64</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>RC3</scope><scope>7X8</scope><scope>5PM</scope></search><sort><creationdate>20101222</creationdate><title>A machine learning approach for genome-wide prediction of morbid and druggable human genes based on systems-level data</title><author>Costa, Pedro R ; Acencio, Marcio L ; Lemke, Ney</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-b571t-bf428ee0555571b54a461d61fec839e075b8265bca9fae6edfb4b4e671d2dc3f3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2010</creationdate><topic>Algorithms</topic><topic>Artificial Intelligence</topic><topic>Bioinformatics</topic><topic>Biology</topic><topic>Computational Biology - methods</topic><topic>Data mining</topic><topic>Decision trees</topic><topic>Drug Discovery - methods</topic><topic>E coli</topic><topic>Genetic Diseases, Inborn - genetics</topic><topic>Genetics</topic><topic>Genomes</topic><topic>Genomics</topic><topic>Genomics - 
methods</topic><topic>Humans</topic><topic>Morbidity</topic><topic>Proceedings</topic><topic>Proteins - genetics</topic><topic>Proteins - metabolism</topic><topic>Studies</topic><topic>Yeast</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Costa, Pedro R</creatorcontrib><creatorcontrib>Acencio, Marcio L</creatorcontrib><creatorcontrib>Lemke, Ney</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>Calcium & Calcified Tissue Abstracts</collection><collection>Chemoreception Abstracts</collection><collection>Entomology Abstracts (Full archive)</collection><collection>Neurosciences Abstracts</collection><collection>Toxicology Abstracts</collection><collection>Health & Medical Collection</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Medical Database (Alumni Edition)</collection><collection>ProQuest Pharma Collection</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest One Sustainability</collection><collection>ProQuest Central</collection><collection>ProQuest Central Essentials</collection><collection>Biological Science Collection</collection><collection>AUTh Library subscriptions: ProQuest Central</collection><collection>Natural Science Collection</collection><collection>Environmental Sciences and Pollution 
Management</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central</collection><collection>Engineering Research Database</collection><collection>Health Research Premium Collection</collection><collection>Health Research Premium Collection (Alumni)</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Health & Medical Complete (Alumni)</collection><collection>Biological Sciences</collection><collection>Health & Medical Collection (Alumni Edition)</collection><collection>Medical Database</collection><collection>Biological Science Database</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>BMC genomics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Costa, Pedro R</au><au>Acencio, Marcio L</au><au>Lemke, Ney</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A machine learning approach for genome-wide prediction of morbid and druggable human genes based on systems-level data</atitle><jtitle>BMC genomics</jtitle><addtitle>BMC Genomics</addtitle><date>2010-12-22</date><risdate>2010</risdate><volume>11 Suppl 5</volume><issue>S5</issue><spage>S9</spage><epage>S9</epage><pages>S9-S9</pages><artnum>S9</artnum><issn>1471-2164</issn><eissn>1471-2164</eissn><abstract>The genome-wide identification of both morbid genes, i.e., those genes whose mutations cause hereditary human diseases, and 
druggable genes, i.e., genes coding for proteins whose modulation by small molecules elicits phenotypic effects, requires experimental approaches that are time-consuming and laborious. Thus, a computational approach which could accurately predict such genes on a genome-wide scale would be invaluable for accelerating the pace of discovery of causal relationships between genes and diseases as well as the determination of druggability of gene products.
In this paper we propose a machine learning-based computational approach to predict morbid and druggable genes on a genome-wide scale. For this purpose, we constructed a decision tree-based meta-classifier and trained it on datasets containing, for each morbid and druggable gene, network topological features, tissue expression profile and subcellular localization data as learning attributes. This meta-classifier correctly recovered 65% of known morbid genes with a precision of 66% and correctly recovered 78% of known druggable genes with a precision of 75%. It was then used to assign morbidity and druggability scores to genes not known to be morbid and druggable and we showed a good match between these scores and literature data. Finally, we generated decision trees by training the J48 algorithm on the morbidity and druggability datasets to discover cellular rules for morbidity and druggability and, among the rules, we found that the number of regulating transcription factors and plasma membrane localization are the most important factors to morbidity and druggability, respectively.
We were able to demonstrate that network topological features along with tissue expression profile and subcellular localization can reliably predict human morbid and druggable genes on a genome-wide scale. Moreover, by constructing decision trees based on these data, we could discover cellular rules governing morbidity and druggability.</abstract><cop>England</cop><pub>BioMed Central</pub><pmid>21210975</pmid><doi>10.1186/1471-2164-11-S5-S9</doi><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1471-2164 |
ispartof | BMC genomics, 2010-12, Vol.11 Suppl 5 (S5), p.S9-S9, Article S9 |
issn | 1471-2164 1471-2164 |
language | eng |
recordid | cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_3045802 |
source | Publicly Available Content Database; PubMed Central |
subjects | Algorithms ; Artificial Intelligence ; Bioinformatics ; Biology ; Computational Biology - methods ; Data mining ; Decision trees ; Drug Discovery - methods ; E coli ; Genetic Diseases, Inborn - genetics ; Genetics ; Genomes ; Genomics ; Genomics - methods ; Humans ; Morbidity ; Proceedings ; Proteins - genetics ; Proteins - metabolism ; Studies ; Yeast |
title | A machine learning approach for genome-wide prediction of morbid and druggable human genes based on systems-level data |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-03T14%3A21%3A18IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20machine%20learning%20approach%20for%20genome-wide%20prediction%20of%20morbid%20and%20druggable%20human%20genes%20based%20on%20systems-level%20data&rft.jtitle=BMC%20genomics&rft.au=Costa,%20Pedro%20R&rft.date=2010-12-22&rft.volume=11%20Suppl%205&rft.issue=S5&rft.spage=S9&rft.epage=S9&rft.pages=S9-S9&rft.artnum=S9&rft.issn=1471-2164&rft.eissn=1471-2164&rft_id=info:doi/10.1186/1471-2164-11-S5-S9&rft_dat=%3Cproquest_pubme%3E845392654%3C/proquest_pubme%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-b571t-bf428ee0555571b54a461d61fec839e075b8265bca9fae6edfb4b4e671d2dc3f3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=902035636&rft_id=info:pmid/21210975&rfr_iscdi=true |