Understanding the performance of knowledge graph embeddings in drug discovery

Bibliographic Details
Published in:Artificial intelligence in the life sciences 2022-12, Vol.2, p.100036, Article 100036
Main Authors: Bonner, Stephen, Barrett, Ian P., Ye, Cheng, Swiers, Rowan, Engkvist, Ola, Hoyt, Charles Tapley, Hamilton, William L.
Format: Article
Language:English
Subjects:Drug discovery ; Knowledge graphs ; Knowledge graph embedding
cited_by cdi_FETCH-LOGICAL-c3336-805c861d49244c97061d5d5ba11be510e7fdaeace8ab980d5d87e9f3d2c928833
cites cdi_FETCH-LOGICAL-c3336-805c861d49244c97061d5d5ba11be510e7fdaeace8ab980d5d87e9f3d2c928833
container_end_page
container_issue
container_start_page 100036
container_title Artificial intelligence in the life sciences
container_volume 2
creator Bonner, Stephen
Barrett, Ian P.
Ye, Cheng
Swiers, Rowan
Engkvist, Ola
Hoyt, Charles Tapley
Hamilton, William L.
description Knowledge Graphs (KGs) and associated Knowledge Graph Embedding (KGE) models have recently begun to be explored in the context of drug discovery and have the potential to assist in key challenges such as target identification. In the drug discovery domain, KGs can be employed as part of a process that can result in lab-based experiments being performed, or can affect other decisions, incurring significant time and financial costs and, most importantly, ultimately influencing patient healthcare. For KGE models to have impact in this domain, a better understanding not only of performance, but also of the various factors that determine it, is required. In this study we investigate, over the course of many thousands of experiments, the predictive performance of five KGE models on two public drug discovery-oriented KGs. Our goal is not to focus on the best overall model or configuration; instead, we take a deeper look at how performance can be affected by changes in the training setup, choice of hyperparameters, model parameter initialisation seed, and different splits of the datasets. Our results highlight that these factors have a significant impact on performance and can even affect the ranking of models. Indeed, these factors should be reported along with model architectures to ensure complete reproducibility and fair comparisons in future work, and we argue that this is critical for the acceptance of use, and impact of KGEs in a biomedical setting.
doi_str_mv 10.1016/j.ailsci.2022.100036
format article
fullrecord <record><control><sourceid>elsevier_doaj_</sourceid><recordid>TN_cdi_doaj_primary_oai_doaj_org_article_6d77b82e3e694199962f332bc8954456</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S2667318522000071</els_id><doaj_id>oai_doaj_org_article_6d77b82e3e694199962f332bc8954456</doaj_id><sourcerecordid>S2667318522000071</sourcerecordid><originalsourceid>FETCH-LOGICAL-c3336-805c861d49244c97061d5d5ba11be510e7fdaeace8ab980d5d87e9f3d2c928833</originalsourceid><addsrcrecordid>eNp9kMtOwzAQRS0EEhX0D1j4B1L8SBx7g4QqHpWK2NC15diT1CFNKjsU9e9xCUKsWM1o5s6RfRC6oWRBCRW37cL4Llq_YISxNCKEizM0Y0KUGaeyOP_TX6J5jG2KMEkpE-UMvWx6ByGOpne-b_C4BbyHUA9hZ3oLeKjxez98duAawE0w-y2GXQXuFI7Y99iFjwY7H-1wgHC8Rhe16SLMf-oV2jw-vC2fs_Xr02p5v84s51xkkhRWCupyxfLcqpKkvnBFZSitoKAEytoZMBakqZQkaSdLUDV3zComJedXaDVx3WBavQ9-Z8JRD8br78EQGm3C6G0HWriyrCQDDkLlVCklWM05q6xURZ4XIrHyiWXDEGOA-pdHiT4Z1q2eDOuTYT0ZTmd30xmkfx48BJ0SkJw5H8CO6SH-f8AXJReFaQ</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Understanding the performance of knowledge graph embeddings in drug discovery</title><source>ScienceDirect Journals</source><creator>Bonner, Stephen ; Barrett, Ian P. ; Ye, Cheng ; Swiers, Rowan ; Engkvist, Ola ; Hoyt, Charles Tapley ; Hamilton, William L.</creator><creatorcontrib>Bonner, Stephen ; Barrett, Ian P. ; Ye, Cheng ; Swiers, Rowan ; Engkvist, Ola ; Hoyt, Charles Tapley ; Hamilton, William L.</creatorcontrib><description>Knowledge Graphs (KG) and associated Knowledge Graph Embedding (KGE) models have recently begun to be explored in the context of drug discovery and have the potential to assist in key challenges such as target identification. 
In the drug discovery domain, KGs can be employed as part of a process which can result in lab-based experiments being performed, or impact on other decisions, incurring significant time and financial costs and most importantly, ultimately influencing patient healthcare. For KGE models to have impact in this domain, a better understanding of not only of performance, but also the various factors which determine it, is required. In this study we investigate, over the course of many thousands of experiments, the predictive performance of five KGE models on two public drug discovery-oriented KGs. Our goal is not to focus on the best overall model or configuration, instead we take a deeper look at how performance can be affected by changes in the training setup, choice of hyperparameters, model parameter initialisation seed and different splits of the datasets. Our results highlight that these factors have significant impact on performance and can even affect the ranking of models. Indeed these factors should be reported along with model architectures to ensure complete reproducibility and fair comparisons of future work, and we argue this is critical for the acceptance of use, and impact of KGEs in a biomedical setting.</description><identifier>ISSN: 2667-3185</identifier><identifier>EISSN: 2667-3185</identifier><identifier>DOI: 10.1016/j.ailsci.2022.100036</identifier><language>eng</language><publisher>Elsevier B.V</publisher><subject>Drug discovery ; Knowledge grahps ; Knowledge graph embedding</subject><ispartof>Artificial intelligence in the life sciences, 2022-12, Vol.2, p.100036, Article 
100036</ispartof><rights>2022</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c3336-805c861d49244c97061d5d5ba11be510e7fdaeace8ab980d5d87e9f3d2c928833</citedby><cites>FETCH-LOGICAL-c3336-805c861d49244c97061d5d5ba11be510e7fdaeace8ab980d5d87e9f3d2c928833</cites><orcidid>0000-0001-6008-358X</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.sciencedirect.com/science/article/pii/S2667318522000071$$EHTML$$P50$$Gelsevier$$Hfree_for_read</linktohtml><link.rule.ids>314,780,784,3549,27924,27925,45780</link.rule.ids></links><search><creatorcontrib>Bonner, Stephen</creatorcontrib><creatorcontrib>Barrett, Ian P.</creatorcontrib><creatorcontrib>Ye, Cheng</creatorcontrib><creatorcontrib>Swiers, Rowan</creatorcontrib><creatorcontrib>Engkvist, Ola</creatorcontrib><creatorcontrib>Hoyt, Charles Tapley</creatorcontrib><creatorcontrib>Hamilton, William L.</creatorcontrib><title>Understanding the performance of knowledge graph embeddings in drug discovery</title><title>Artificial intelligence in the life sciences</title><description>Knowledge Graphs (KG) and associated Knowledge Graph Embedding (KGE) models have recently begun to be explored in the context of drug discovery and have the potential to assist in key challenges such as target identification. In the drug discovery domain, KGs can be employed as part of a process which can result in lab-based experiments being performed, or impact on other decisions, incurring significant time and financial costs and most importantly, ultimately influencing patient healthcare. For KGE models to have impact in this domain, a better understanding of not only of performance, but also the various factors which determine it, is required. 
In this study we investigate, over the course of many thousands of experiments, the predictive performance of five KGE models on two public drug discovery-oriented KGs. Our goal is not to focus on the best overall model or configuration, instead we take a deeper look at how performance can be affected by changes in the training setup, choice of hyperparameters, model parameter initialisation seed and different splits of the datasets. Our results highlight that these factors have significant impact on performance and can even affect the ranking of models. Indeed these factors should be reported along with model architectures to ensure complete reproducibility and fair comparisons of future work, and we argue this is critical for the acceptance of use, and impact of KGEs in a biomedical setting.</description><subject>Drug discovery</subject><subject>Knowledge grahps</subject><subject>Knowledge graph embedding</subject><issn>2667-3185</issn><issn>2667-3185</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><sourceid>DOA</sourceid><recordid>eNp9kMtOwzAQRS0EEhX0D1j4B1L8SBx7g4QqHpWK2NC15diT1CFNKjsU9e9xCUKsWM1o5s6RfRC6oWRBCRW37cL4Llq_YISxNCKEizM0Y0KUGaeyOP_TX6J5jG2KMEkpE-UMvWx6ByGOpne-b_C4BbyHUA9hZ3oLeKjxez98duAawE0w-y2GXQXuFI7Y99iFjwY7H-1wgHC8Rhe16SLMf-oV2jw-vC2fs_Xr02p5v84s51xkkhRWCupyxfLcqpKkvnBFZSitoKAEytoZMBakqZQkaSdLUDV3zComJedXaDVx3WBavQ9-Z8JRD8br78EQGm3C6G0HWriyrCQDDkLlVCklWM05q6xURZ4XIrHyiWXDEGOA-pdHiT4Z1q2eDOuTYT0ZTmd30xmkfx48BJ0SkJw5H8CO6SH-f8AXJReFaQ</recordid><startdate>202212</startdate><enddate>202212</enddate><creator>Bonner, Stephen</creator><creator>Barrett, Ian P.</creator><creator>Ye, Cheng</creator><creator>Swiers, Rowan</creator><creator>Engkvist, Ola</creator><creator>Hoyt, Charles Tapley</creator><creator>Hamilton, William L.</creator><general>Elsevier 
B.V</general><general>Elsevier</general><scope>6I.</scope><scope>AAFTH</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0001-6008-358X</orcidid></search><sort><creationdate>202212</creationdate><title>Understanding the performance of knowledge graph embeddings in drug discovery</title><author>Bonner, Stephen ; Barrett, Ian P. ; Ye, Cheng ; Swiers, Rowan ; Engkvist, Ola ; Hoyt, Charles Tapley ; Hamilton, William L.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c3336-805c861d49244c97061d5d5ba11be510e7fdaeace8ab980d5d87e9f3d2c928833</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Drug discovery</topic><topic>Knowledge grahps</topic><topic>Knowledge graph embedding</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Bonner, Stephen</creatorcontrib><creatorcontrib>Barrett, Ian P.</creatorcontrib><creatorcontrib>Ye, Cheng</creatorcontrib><creatorcontrib>Swiers, Rowan</creatorcontrib><creatorcontrib>Engkvist, Ola</creatorcontrib><creatorcontrib>Hoyt, Charles Tapley</creatorcontrib><creatorcontrib>Hamilton, William L.</creatorcontrib><collection>ScienceDirect Open Access Titles</collection><collection>Elsevier:ScienceDirect:Open Access</collection><collection>CrossRef</collection><collection>Directory of Open Access Journals</collection><jtitle>Artificial intelligence in the life sciences</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Bonner, Stephen</au><au>Barrett, Ian P.</au><au>Ye, Cheng</au><au>Swiers, Rowan</au><au>Engkvist, Ola</au><au>Hoyt, Charles Tapley</au><au>Hamilton, William L.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Understanding the performance of knowledge graph embeddings in drug discovery</atitle><jtitle>Artificial 
intelligence in the life sciences</jtitle><date>2022-12</date><risdate>2022</risdate><volume>2</volume><spage>100036</spage><pages>100036-</pages><artnum>100036</artnum><issn>2667-3185</issn><eissn>2667-3185</eissn><abstract>Knowledge Graphs (KG) and associated Knowledge Graph Embedding (KGE) models have recently begun to be explored in the context of drug discovery and have the potential to assist in key challenges such as target identification. In the drug discovery domain, KGs can be employed as part of a process which can result in lab-based experiments being performed, or impact on other decisions, incurring significant time and financial costs and most importantly, ultimately influencing patient healthcare. For KGE models to have impact in this domain, a better understanding of not only of performance, but also the various factors which determine it, is required. In this study we investigate, over the course of many thousands of experiments, the predictive performance of five KGE models on two public drug discovery-oriented KGs. Our goal is not to focus on the best overall model or configuration, instead we take a deeper look at how performance can be affected by changes in the training setup, choice of hyperparameters, model parameter initialisation seed and different splits of the datasets. Our results highlight that these factors have significant impact on performance and can even affect the ranking of models. Indeed these factors should be reported along with model architectures to ensure complete reproducibility and fair comparisons of future work, and we argue this is critical for the acceptance of use, and impact of KGEs in a biomedical setting.</abstract><pub>Elsevier B.V</pub><doi>10.1016/j.ailsci.2022.100036</doi><orcidid>https://orcid.org/0000-0001-6008-358X</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 2667-3185
ispartof Artificial intelligence in the life sciences, 2022-12, Vol.2, p.100036, Article 100036
issn 2667-3185
2667-3185
language eng
recordid cdi_doaj_primary_oai_doaj_org_article_6d77b82e3e694199962f332bc8954456
source ScienceDirect Journals
subjects Drug discovery
Knowledge graphs
Knowledge graph embedding
title Understanding the performance of knowledge graph embeddings in drug discovery
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T16%3A54%3A58IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-elsevier_doaj_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Understanding%20the%20performance%20of%20knowledge%20graph%20embeddings%20in%20drug%20discovery&rft.jtitle=Artificial%20intelligence%20in%20the%20life%20sciences&rft.au=Bonner,%20Stephen&rft.date=2022-12&rft.volume=2&rft.spage=100036&rft.pages=100036-&rft.artnum=100036&rft.issn=2667-3185&rft.eissn=2667-3185&rft_id=info:doi/10.1016/j.ailsci.2022.100036&rft_dat=%3Celsevier_doaj_%3ES2667318522000071%3C/elsevier_doaj_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c3336-805c861d49244c97061d5d5ba11be510e7fdaeace8ab980d5d87e9f3d2c928833%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true