Predicting the Pathway Involvement of All Pathway and Associated Compound Entries Defined in the Kyoto Encyclopedia of Genes and Genomes
Predicting the biochemical pathway involvement of a compound could facilitate the interpretation of biological and biomedical research. Prior prediction approaches have largely focused on metabolism, training machine learning models to solely predict based on metabolic pathways. However, there are...
Published in: | Metabolites 2024-10, Vol.14 (11), p.582 |
---|---|
Main Authors: | Huckvale, Erik D; Moseley, Hunter N B |
Format: | Article |
Language: | English |
Subjects: | |
cited_by | |
---|---|
cites | cdi_FETCH-LOGICAL-c409t-74df1a33c0dc462c88f71f6c511612a70efda55a7df79b7d98cefc4b96a4772a3 |
container_end_page | |
container_issue | 11 |
container_start_page | 582 |
container_title | Metabolites |
container_volume | 14 |
creator | Huckvale, Erik D; Moseley, Hunter N B |
description | Predicting the biochemical pathway involvement of a compound could facilitate the interpretation of biological and biomedical research. Prior prediction approaches have largely focused on metabolism, training machine learning models to solely predict based on metabolic pathways. However, there are many other types of pathways in cells and organisms that are of interest to biologists.
While several publications have made use of the metabolites and metabolic pathways available in the Kyoto Encyclopedia of Genes and Genomes (KEGG), we downloaded all the compound entries with pathway annotations available in the KEGG. From these data, we constructed a dataset where each entry contained features representing compounds combined with features representing pathways, followed by a binary label indicating whether the given compound is associated with the given pathway. We trained multi-layer perceptron binary classifiers on variations of this dataset.
The models trained on 6485 KEGG compounds and 502 pathways scored an overall mean Matthews correlation coefficient (MCC) performance of 0.847, a median MCC of 0.848, and a standard deviation of 0.0098.
This performance on all 502 KEGG pathways represents a roughly 6% improvement over the performance of models trained on only the 184 KEGG metabolic pathways, which had a mean MCC of 0.800 and a standard deviation of 0.021. These results demonstrate the capability to effectively predict biochemical pathways in general, in addition to those specifically related to metabolism. Moreover, the improvement in the performance demonstrates additional transfer learning with the inclusion of non-metabolic pathways. |
doi_str_mv | 10.3390/metabo14110582 |
format | article |
fullrecord | <record><control><sourceid>gale_doaj_</sourceid><recordid>TN_cdi_doaj_primary_oai_doaj_org_article_f4b152c4f2b143d19f150fbdc8f10c3b</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A818340055</galeid><doaj_id>oai_doaj_org_article_f4b152c4f2b143d19f150fbdc8f10c3b</doaj_id><sourcerecordid>A818340055</sourcerecordid><originalsourceid>FETCH-LOGICAL-c409t-74df1a33c0dc462c88f71f6c511612a70efda55a7df79b7d98cefc4b96a4772a3</originalsourceid><addsrcrecordid>eNpdks-PEyEUxydG427WvXo0k3jx0pU3wDCcTNNd18ZN3IOeCcOPlmYGKtCa_gf-2dJ2rVvhAHnfLx94j1dVbwHdYMzRx9Fk2QcgAIh2zYvqsmmgmwDv-Mtn-4vqOqUVKqNFlCF4XV1gTjnqoLusfj9Go53Kzi_qvDT1o8zLX3JXz_02DFszGp_rYOvpMJwk6XU9TSkoJ7PR9SyM67ApsTufozOpvjXW-SI4fyB-3YUciqh2agjrcpncA--NL9Y9quzCaNKb6pWVQzLXT-tV9ePz3ffZl8nDt_v5bPowUQTxPGFEW5AYK6QVaRvVdZaBbRUFaKGRDBmrJaWSact4zzTvlLGK9LyVhLFG4qtqfuTqIFdiHd0o404E6cQhEOJCyJidGoywpAfaKGKbHgjWwC1QZHutOgtI4b6wPh1Z600_Gq1KsaIczqDnindLsQhbAUB52zZNIXx4IsTwc2NSFqNLygyD9CZsksCAMSmPoHvr-_-sq7CJvtTq4EKcM8KK6-boWsiSgfM2lItVmdqMTgVf_qbEp-XvMUGI0n8HVAwpRWNPzwck9l0mzrusHHj3POmT_W9P4T8BU8_X</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3133099747</pqid></control><display><type>article</type><title>Predicting the Pathway Involvement of All Pathway and Associated Compound Entries Defined in the Kyoto Encyclopedia of Genes and Genomes</title><source>PubMed Central Free</source><source>Publicly Available Content Database</source><creator>Huckvale, Erik D ; Moseley, Hunter N B</creator><creatorcontrib>Huckvale, Erik D ; Moseley, Hunter N B</creatorcontrib><description>: Predicting the biochemical pathway involvement of a compound could facilitate the interpretation of biological and biomedical research. Prior prediction approaches have largely focused on metabolism, training machine learning models to solely predict based on metabolic pathways. 
However, there are many other types of pathways in cells and organisms that are of interest to biologists.
: While several publications have made use of the metabolites and metabolic pathways available in the Kyoto Encyclopedia of Genes and Genomes (KEGG), we downloaded all the compound entries with pathway annotations available in the KEGG. From these data, we constructed a dataset where each entry contained features representing compounds combined with features representing pathways, followed by a binary label indicating whether the given compound is associated with the given pathway. We trained multi-layer perceptron binary classifiers on variations of this dataset.
: The models trained on 6485 KEGG compounds and 502 pathways scored an overall mean Matthews correlation coefficient (MCC) performance of 0.847, a median MCC of 0.848, and a standard deviation of 0.0098.
: This performance on all 502 KEGG pathways represents a roughly 6% improvement over the performance of models trained on only the 184 KEGG metabolic pathways, which had a mean MCC of 0.800 and a standard deviation of 0.021. These results demonstrate the capability to effectively predict biochemical pathways in general, in addition to those specifically related to metabolism. Moreover, the improvement in the performance demonstrates additional transfer learning with the inclusion of non-metabolic pathways.</description><identifier>ISSN: 2218-1989</identifier><identifier>EISSN: 2218-1989</identifier><identifier>DOI: 10.3390/metabo14110582</identifier><identifier>PMID: 39590818</identifier><language>eng</language><publisher>Switzerland: MDPI AG</publisher><subject>Annotations ; Cell culture ; Datasets ; Genes ; Genomes ; Genomics ; Information processing ; KEGG ; Machine learning ; Matthews correlation coefficient ; Medical research ; Metabolic pathways ; Metabolism ; Metabolites ; multi-layer perceptron ; pathway prediction ; Physiological aspects ; Random access memory ; Standard deviation ; Transfer learning</subject><ispartof>Metabolites, 2024-10, Vol.14 (11), p.582</ispartof><rights>COPYRIGHT 2024 MDPI AG</rights><rights>2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><rights>2024 by the authors. 
2024</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c409t-74df1a33c0dc462c88f71f6c511612a70efda55a7df79b7d98cefc4b96a4772a3</cites><orcidid>0000-0003-3995-5368</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.proquest.com/docview/3133099747/fulltextPDF?pq-origsite=primo$$EPDF$$P50$$Gproquest$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.proquest.com/docview/3133099747?pq-origsite=primo$$EHTML$$P50$$Gproquest$$Hfree_for_read</linktohtml><link.rule.ids>230,314,727,780,784,885,25753,27924,27925,37012,37013,44590,53791,53793,75126</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/39590818$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Huckvale, Erik D</creatorcontrib><creatorcontrib>Moseley, Hunter N B</creatorcontrib><title>Predicting the Pathway Involvement of All Pathway and Associated Compound Entries Defined in the Kyoto Encyclopedia of Genes and Genomes</title><title>Metabolites</title><addtitle>Metabolites</addtitle><description>: Predicting the biochemical pathway involvement of a compound could facilitate the interpretation of biological and biomedical research. Prior prediction approaches have largely focused on metabolism, training machine learning models to solely predict based on metabolic pathways. However, there are many other types of pathways in cells and organisms that are of interest to biologists.
: While several publications have made use of the metabolites and metabolic pathways available in the Kyoto Encyclopedia of Genes and Genomes (KEGG), we downloaded all the compound entries with pathway annotations available in the KEGG. From these data, we constructed a dataset where each entry contained features representing compounds combined with features representing pathways, followed by a binary label indicating whether the given compound is associated with the given pathway. We trained multi-layer perceptron binary classifiers on variations of this dataset.
: The models trained on 6485 KEGG compounds and 502 pathways scored an overall mean Matthews correlation coefficient (MCC) performance of 0.847, a median MCC of 0.848, and a standard deviation of 0.0098.
: This performance on all 502 KEGG pathways represents a roughly 6% improvement over the performance of models trained on only the 184 KEGG metabolic pathways, which had a mean MCC of 0.800 and a standard deviation of 0.021. These results demonstrate the capability to effectively predict biochemical pathways in general, in addition to those specifically related to metabolism. Moreover, the improvement in the performance demonstrates additional transfer learning with the inclusion of non-metabolic pathways.</description><subject>Annotations</subject><subject>Cell culture</subject><subject>Datasets</subject><subject>Genes</subject><subject>Genomes</subject><subject>Genomics</subject><subject>Information processing</subject><subject>KEGG</subject><subject>Machine learning</subject><subject>Matthews correlation coefficient</subject><subject>Medical research</subject><subject>Metabolic pathways</subject><subject>Metabolism</subject><subject>Metabolites</subject><subject>multi-layer perceptron</subject><subject>pathway prediction</subject><subject>Physiological aspects</subject><subject>Random access memory</subject><subject>Standard deviation</subject><subject>Transfer 
learning</subject><issn>2218-1989</issn><issn>2218-1989</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>PIMPY</sourceid><sourceid>DOA</sourceid><recordid>eNpdks-PEyEUxydG427WvXo0k3jx0pU3wDCcTNNd18ZN3IOeCcOPlmYGKtCa_gf-2dJ2rVvhAHnfLx94j1dVbwHdYMzRx9Fk2QcgAIh2zYvqsmmgmwDv-Mtn-4vqOqUVKqNFlCF4XV1gTjnqoLusfj9Go53Kzi_qvDT1o8zLX3JXz_02DFszGp_rYOvpMJwk6XU9TSkoJ7PR9SyM67ApsTufozOpvjXW-SI4fyB-3YUciqh2agjrcpncA--NL9Y9quzCaNKb6pWVQzLXT-tV9ePz3ffZl8nDt_v5bPowUQTxPGFEW5AYK6QVaRvVdZaBbRUFaKGRDBmrJaWSact4zzTvlLGK9LyVhLFG4qtqfuTqIFdiHd0o404E6cQhEOJCyJidGoywpAfaKGKbHgjWwC1QZHutOgtI4b6wPh1Z600_Gq1KsaIczqDnindLsQhbAUB52zZNIXx4IsTwc2NSFqNLygyD9CZsksCAMSmPoHvr-_-sq7CJvtTq4EKcM8KK6-boWsiSgfM2lItVmdqMTgVf_qbEp-XvMUGI0n8HVAwpRWNPzwck9l0mzrusHHj3POmT_W9P4T8BU8_X</recordid><startdate>20241027</startdate><enddate>20241027</enddate><creator>Huckvale, Erik D</creator><creator>Moseley, Hunter N B</creator><general>MDPI AG</general><general>MDPI</general><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7QR</scope><scope>8FD</scope><scope>8FE</scope><scope>8FH</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BBNVY</scope><scope>BENPR</scope><scope>BHPHI</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FR3</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>LK8</scope><scope>M7P</scope><scope>P64</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>7X8</scope><scope>5PM</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0003-3995-5368</orcidid></search><sort><creationdate>20241027</creationdate><title>Predicting the Pathway Involvement of All Pathway and Associated Compound Entries Defined in the Kyoto Encyclopedia of Genes and Genomes</title><author>Huckvale, Erik D ; Moseley, Hunter N 
B</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c409t-74df1a33c0dc462c88f71f6c511612a70efda55a7df79b7d98cefc4b96a4772a3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Annotations</topic><topic>Cell culture</topic><topic>Datasets</topic><topic>Genes</topic><topic>Genomes</topic><topic>Genomics</topic><topic>Information processing</topic><topic>KEGG</topic><topic>Machine learning</topic><topic>Matthews correlation coefficient</topic><topic>Medical research</topic><topic>Metabolic pathways</topic><topic>Metabolism</topic><topic>Metabolites</topic><topic>multi-layer perceptron</topic><topic>pathway prediction</topic><topic>Physiological aspects</topic><topic>Random access memory</topic><topic>Standard deviation</topic><topic>Transfer learning</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Huckvale, Erik D</creatorcontrib><creatorcontrib>Moseley, Hunter N B</creatorcontrib><collection>PubMed</collection><collection>CrossRef</collection><collection>Chemoreception Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central</collection><collection>ProQuest Central Essentials</collection><collection>Biological Science Collection</collection><collection>ProQuest Central</collection><collection>Natural Science Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>Engineering Research Database</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Biological Science Collection</collection><collection>Biological Science 
Database</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>Metabolites</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Huckvale, Erik D</au><au>Moseley, Hunter N B</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Predicting the Pathway Involvement of All Pathway and Associated Compound Entries Defined in the Kyoto Encyclopedia of Genes and Genomes</atitle><jtitle>Metabolites</jtitle><addtitle>Metabolites</addtitle><date>2024-10-27</date><risdate>2024</risdate><volume>14</volume><issue>11</issue><spage>582</spage><pages>582-</pages><issn>2218-1989</issn><eissn>2218-1989</eissn><abstract>: Predicting the biochemical pathway involvement of a compound could facilitate the interpretation of biological and biomedical research. Prior prediction approaches have largely focused on metabolism, training machine learning models to solely predict based on metabolic pathways. However, there are many other types of pathways in cells and organisms that are of interest to biologists.
: While several publications have made use of the metabolites and metabolic pathways available in the Kyoto Encyclopedia of Genes and Genomes (KEGG), we downloaded all the compound entries with pathway annotations available in the KEGG. From these data, we constructed a dataset where each entry contained features representing compounds combined with features representing pathways, followed by a binary label indicating whether the given compound is associated with the given pathway. We trained multi-layer perceptron binary classifiers on variations of this dataset.
: The models trained on 6485 KEGG compounds and 502 pathways scored an overall mean Matthews correlation coefficient (MCC) performance of 0.847, a median MCC of 0.848, and a standard deviation of 0.0098.
: This performance on all 502 KEGG pathways represents a roughly 6% improvement over the performance of models trained on only the 184 KEGG metabolic pathways, which had a mean MCC of 0.800 and a standard deviation of 0.021. These results demonstrate the capability to effectively predict biochemical pathways in general, in addition to those specifically related to metabolism. Moreover, the improvement in the performance demonstrates additional transfer learning with the inclusion of non-metabolic pathways.</abstract><cop>Switzerland</cop><pub>MDPI AG</pub><pmid>39590818</pmid><doi>10.3390/metabo14110582</doi><orcidid>https://orcid.org/0000-0003-3995-5368</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 2218-1989 |
ispartof | Metabolites, 2024-10, Vol.14 (11), p.582 |
issn | ISSN: 2218-1989; EISSN: 2218-1989 |
language | eng |
recordid | cdi_doaj_primary_oai_doaj_org_article_f4b152c4f2b143d19f150fbdc8f10c3b |
source | PubMed Central Free; Publicly Available Content Database |
subjects | Annotations; Cell culture; Datasets; Genes; Genomes; Genomics; Information processing; KEGG; Machine learning; Matthews correlation coefficient; Medical research; Metabolic pathways; Metabolism; Metabolites; multi-layer perceptron; pathway prediction; Physiological aspects; Random access memory; Standard deviation; Transfer learning |
title | Predicting the Pathway Involvement of All Pathway and Associated Compound Entries Defined in the Kyoto Encyclopedia of Genes and Genomes |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-04T15%3A06%3A39IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_doaj_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Predicting%20the%20Pathway%20Involvement%20of%20All%20Pathway%20and%20Associated%20Compound%20Entries%20Defined%20in%20the%20Kyoto%20Encyclopedia%20of%20Genes%20and%20Genomes&rft.jtitle=Metabolites&rft.au=Huckvale,%20Erik%20D&rft.date=2024-10-27&rft.volume=14&rft.issue=11&rft.spage=582&rft.pages=582-&rft.issn=2218-1989&rft.eissn=2218-1989&rft_id=info:doi/10.3390/metabo14110582&rft_dat=%3Cgale_doaj_%3EA818340055%3C/gale_doaj_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c409t-74df1a33c0dc462c88f71f6c511612a70efda55a7df79b7d98cefc4b96a4772a3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=3133099747&rft_id=info:pmid/39590818&rft_galeid=A818340055&rfr_iscdi=true |
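
The description in this record mentions three things that a short sketch can make concrete: dataset entries built by combining compound features with pathway features and a binary label, the Matthews correlation coefficient (MCC) used for scoring, and the "roughly 6%" improvement from a mean MCC of 0.800 to 0.847. The Python below is a minimal illustration only, not the authors' code. The function names and the toy feature vectors are hypothetical, and reading the 6% figure as a relative change is an assumption that happens to be consistent with the reported numbers.

```python
import math


def build_entry(compound_features, pathway_features, associated):
    """Concatenate compound and pathway features, then append a binary label.

    Hypothetical helper: the record does not say how the features are encoded.
    """
    label = 1 if associated else 0
    return list(compound_features) + list(pathway_features) + [label]


def matthews_corrcoef(tp, tn, fp, fn):
    """Standard MCC from binary confusion-matrix counts.

    Returns 0.0 when the denominator is zero, a common convention.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def relative_improvement(new, baseline):
    """Relative change of `new` over `baseline`, as a fraction."""
    return (new - baseline) / baseline


if __name__ == "__main__":
    # Toy feature vectors, invented for illustration.
    print(build_entry([1, 0], [0, 1], associated=True))
    # Mean MCC values quoted in the description.
    print(f"{relative_improvement(0.847, 0.800):.2%}")
```

With the quoted means, the relative change is (0.847 - 0.800) / 0.800 = 0.05875, which rounds to the roughly 6% stated in the description.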