Advancing drug-target interaction prediction: a comprehensive graph-based approach integrating knowledge graph embedding and ProtBert pretraining
The pharmaceutical field faces a significant challenge in validating drug target interactions (DTIs) due to the time and cost involved, leading to only a fraction being experimentally verified. To expedite drug discovery, accurate computational methods are essential for predicting potential interact...
Published in: | BMC bioinformatics 2023-12, Vol.24 (1), p.488-41, Article 488 |
---|---|
Main Authors: | Djeddi, Warith Eddine; Hermi, Khalil; Ben Yahia, Sadok; Diallo, Gayo |
Format: | Article |
Language: | English |
cited_by | cdi_FETCH-LOGICAL-c632t-e3e4a44fce6ab2b74a23e97df921bc444544ff592b710cb08f6b7893f1a383d3 |
---|---|
cites | cdi_FETCH-LOGICAL-c632t-e3e4a44fce6ab2b74a23e97df921bc444544ff592b710cb08f6b7893f1a383d3 |
container_end_page | 41 |
container_issue | 1 |
container_start_page | 488 |
container_title | BMC bioinformatics |
container_volume | 24 |
creator | Djeddi, Warith Eddine Hermi, Khalil Ben Yahia, Sadok Diallo, Gayo |
description | The pharmaceutical field faces a significant challenge in validating drug target interactions (DTIs) due to the time and cost involved, leading to only a fraction being experimentally verified. To expedite drug discovery, accurate computational methods are essential for predicting potential interactions. Recently, machine learning techniques, particularly graph-based methods, have gained prominence. These methods utilize networks of drugs and targets, employing knowledge graph embedding (KGE) to represent structured information from knowledge graphs in a continuous vector space. This phenomenon highlights the growing inclination to utilize graph topologies as a means to improve the precision of predicting DTIs, hence addressing the pressing requirement for effective computational methodologies in the field of drug discovery.
The present study presents a novel approach called DTIOG for the prediction of DTIs. The methodology employed in this study involves the utilization of a KGE strategy, together with the incorporation of contextual information obtained from protein sequences. More specifically, the study makes use of Protein Bidirectional Encoder Representations from Transformers (ProtBERT) for this purpose. DTIOG utilizes a two-step process to compute embedding vectors using KGE techniques. Additionally, it employs ProtBERT to determine target-target similarity. Different similarity measures, such as Cosine similarity or Euclidean distance, are utilized in the prediction procedure. In addition to the contextual embedding, the proposed unique approach incorporates local representations obtained from the Simplified Molecular Input Line Entry Specification (SMILES) of drugs and the amino acid sequences of protein targets.
The effectiveness of the proposed approach was assessed through extensive experimentation on datasets pertaining to Enzymes, Ion Channels, and G-protein-coupled Receptors. The remarkable efficacy of DTIOG was showcased through the utilization of diverse similarity measures in order to calculate the similarities between drugs and targets. The combination of these factors, along with the incorporation of various classifiers, enabled the model to outperform existing algorithms in its ability to predict DTIs. The consistent observation of this advantage across all datasets underlines the robustness and accuracy of DTIOG in the domain of DTIs. Additionally, our case study suggests that the DTIOG can serve as a valuable tool for discovering new DTIs. |
doi_str_mv | 10.1186/s12859-023-05593-6 |
format | article |
fullrecord | <record><control><sourceid>gale_doaj_</sourceid><recordid>TN_cdi_doaj_primary_oai_doaj_org_article_d00eb7fc2a7743fca095ca36f2f5655c</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A776792630</galeid><doaj_id>oai_doaj_org_article_d00eb7fc2a7743fca095ca36f2f5655c</doaj_id><sourcerecordid>A776792630</sourcerecordid><originalsourceid>FETCH-LOGICAL-c632t-e3e4a44fce6ab2b74a23e97df921bc444544ff592b710cb08f6b7893f1a383d3</originalsourceid><addsrcrecordid>eNptks1uEzEUhUcIREvgBVigkdjQxRT_jmfYoFABrVQJBN1bHvt64pLYwXYCPAZvjCcppamQF7bu-c6xfXWr6jlGpxh37euEScf7BhHaIM572rQPqmPMBG4IRvzhnfNR9SSla4Sw6BB_XB3RDmPWU3Fc_Z6brfLa-bE2cTM2WcURcu18hqh0dsHX6wjG7Y5valXrsCqFBfjktlCPUa0XzaASmFqt1zEovdiZi5Cn0G8-_FiCGW_QGlYDGDMpypv6cwz5HcQ83ZGjcr4IT6tHVi0TPLvZZ9XVh_dXZ-fN5aePF2fzy0a3lOQGKDDFmNXQqoEMgilCoRfG9gQPmjHGi2h5XySM9IA62w6i66nFinbU0Fl1sY81QV3LdXQrFX_JoJzcFUIcpYrZ6SVIgxAMwmqihGDUaoV6rhVtLbG85VyXrLf7rPVmWIHR4Mtnlgehh4p3CzmGrcRIUNwRXBJO9gmLe77z-aWcaoiVVyPEthP76ua2GL5vIGW5cknDcqk8hE2SpEcM8xYXw6x6eQ-9DpvoS1sLhRkRghP0jxpV-a3zNpRH6ilUzoVoRU9aOlGn_6HKMrByOniwrtQPDCcHhsJk-JlHtUlJXnz9csiSPatjSCmCvW0CRnIadrkfdlmGXe6GXbbF9OJu228tf6eb_gHAefsI</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2914277520</pqid></control><display><type>article</type><title>Advancing drug-target interaction prediction: a comprehensive graph-based approach integrating knowledge graph embedding and ProtBert pretraining</title><source>PubMed (Medline)</source><source>Publicly Available Content Database</source><creator>Djeddi, Warith Eddine ; Hermi, Khalil ; Ben Yahia, Sadok ; Diallo, Gayo</creator><creatorcontrib>Djeddi, Warith Eddine ; Hermi, Khalil ; Ben Yahia, Sadok ; Diallo, Gayo</creatorcontrib><description>The pharmaceutical field faces a significant challenge in validating drug target interactions (DTIs) due to the time and cost involved, leading to only a fraction being experimentally verified. 
To expedite drug discovery, accurate computational methods are essential for predicting potential interactions. Recently, machine learning techniques, particularly graph-based methods, have gained prominence. These methods utilize networks of drugs and targets, employing knowledge graph embedding (KGE) to represent structured information from knowledge graphs in a continuous vector space. This phenomenon highlights the growing inclination to utilize graph topologies as a means to improve the precision of predicting DTIs, hence addressing the pressing requirement for effective computational methodologies in the field of drug discovery.
The present study presents a novel approach called DTIOG for the prediction of DTIs. The methodology employed in this study involves the utilization of a KGE strategy, together with the incorporation of contextual information obtained from protein sequences. More specifically, the study makes use of Protein Bidirectional Encoder Representations from Transformers (ProtBERT) for this purpose. DTIOG utilizes a two-step process to compute embedding vectors using KGE techniques. Additionally, it employs ProtBERT to determine target-target similarity. Different similarity measures, such as Cosine similarity or Euclidean distance, are utilized in the prediction procedure. In addition to the contextual embedding, the proposed unique approach incorporates local representations obtained from the Simplified Molecular Input Line Entry Specification (SMILES) of drugs and the amino acid sequences of protein targets.
The effectiveness of the proposed approach was assessed through extensive experimentation on datasets pertaining to Enzymes, Ion Channels, and G-protein-coupled Receptors. The remarkable efficacy of DTIOG was showcased through the utilization of diverse similarity measures in order to calculate the similarities between drugs and targets. The combination of these factors, along with the incorporation of various classifiers, enabled the model to outperform existing algorithms in its ability to predict DTIs. The consistent observation of this advantage across all datasets underlines the robustness and accuracy of DTIOG in the domain of DTIs. Additionally, our case study suggests that the DTIOG can serve as a valuable tool for discovering new DTIs.</description><identifier>ISSN: 1471-2105</identifier><identifier>EISSN: 1471-2105</identifier><identifier>DOI: 10.1186/s12859-023-05593-6</identifier><identifier>PMID: 38114937</identifier><language>eng</language><publisher>England: BioMed Central Ltd</publisher><subject>Algorithms ; Amino acids ; Analysis ; Case studies ; Clinical trials ; Computer applications ; Computer Science ; Cosine similarity ; COVID-19 ; Datasets ; Drug Development - methods ; Drug discovery ; Drug Interactions ; Drugs ; Drug–target interaction prediction ; Embedding ; Euclidean geometry ; Evaluation ; G protein-coupled receptors ; Ion channels ; Knowledge ; Knowledge Bases ; Knowledge graph embedding ; Knowledge representation ; Ligands ; Machine learning ; Neural networks ; Pandemics ; Pattern Recognition, Automated ; Predictions ; ProtBERT ; Proteins ; Proteins - chemistry ; Research methodology ; Severe acute respiratory syndrome coronavirus 2 ; Similarity ; Therapeutic targets ; Topology ; Vector spaces ; Viruses</subject><ispartof>BMC bioinformatics, 2023-12, Vol.24 (1), p.488-41, Article 488</ispartof><rights>2023. The Author(s).</rights><rights>COPYRIGHT 2023 BioMed Central Ltd.</rights><rights>2023. 
This work is licensed under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><rights>Distributed under a Creative Commons Attribution 4.0 International License</rights><rights>The Author(s) 2023</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c632t-e3e4a44fce6ab2b74a23e97df921bc444544ff592b710cb08f6b7893f1a383d3</citedby><cites>FETCH-LOGICAL-c632t-e3e4a44fce6ab2b74a23e97df921bc444544ff592b710cb08f6b7893f1a383d3</cites><orcidid>0000-0001-8939-8948</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC10731821/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.proquest.com/docview/2914277520?pq-origsite=primo$$EHTML$$P50$$Gproquest$$Hfree_for_read</linktohtml><link.rule.ids>230,314,727,780,784,885,25753,27924,27925,37012,37013,44590,53791,53793</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/38114937$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink><backlink>$$Uhttps://hal.science/hal-04383004$$DView record in HAL$$Hfree_for_read</backlink></links><search><creatorcontrib>Djeddi, Warith Eddine</creatorcontrib><creatorcontrib>Hermi, Khalil</creatorcontrib><creatorcontrib>Ben Yahia, Sadok</creatorcontrib><creatorcontrib>Diallo, Gayo</creatorcontrib><title>Advancing drug-target interaction prediction: a comprehensive graph-based approach integrating knowledge graph embedding and ProtBert pretraining</title><title>BMC bioinformatics</title><addtitle>BMC Bioinformatics</addtitle><description>The pharmaceutical field faces a significant challenge in validating drug target interactions (DTIs) due to the time 
and cost involved, leading to only a fraction being experimentally verified. To expedite drug discovery, accurate computational methods are essential for predicting potential interactions. Recently, machine learning techniques, particularly graph-based methods, have gained prominence. These methods utilize networks of drugs and targets, employing knowledge graph embedding (KGE) to represent structured information from knowledge graphs in a continuous vector space. This phenomenon highlights the growing inclination to utilize graph topologies as a means to improve the precision of predicting DTIs, hence addressing the pressing requirement for effective computational methodologies in the field of drug discovery.
The present study presents a novel approach called DTIOG for the prediction of DTIs. The methodology employed in this study involves the utilization of a KGE strategy, together with the incorporation of contextual information obtained from protein sequences. More specifically, the study makes use of Protein Bidirectional Encoder Representations from Transformers (ProtBERT) for this purpose. DTIOG utilizes a two-step process to compute embedding vectors using KGE techniques. Additionally, it employs ProtBERT to determine target-target similarity. Different similarity measures, such as Cosine similarity or Euclidean distance, are utilized in the prediction procedure. In addition to the contextual embedding, the proposed unique approach incorporates local representations obtained from the Simplified Molecular Input Line Entry Specification (SMILES) of drugs and the amino acid sequences of protein targets.
The effectiveness of the proposed approach was assessed through extensive experimentation on datasets pertaining to Enzymes, Ion Channels, and G-protein-coupled Receptors. The remarkable efficacy of DTIOG was showcased through the utilization of diverse similarity measures in order to calculate the similarities between drugs and targets. The combination of these factors, along with the incorporation of various classifiers, enabled the model to outperform existing algorithms in its ability to predict DTIs. The consistent observation of this advantage across all datasets underlines the robustness and accuracy of DTIOG in the domain of DTIs. Additionally, our case study suggests that the DTIOG can serve as a valuable tool for discovering new DTIs.</description><subject>Algorithms</subject><subject>Amino acids</subject><subject>Analysis</subject><subject>Case studies</subject><subject>Clinical trials</subject><subject>Computer applications</subject><subject>Computer Science</subject><subject>Cosine similarity</subject><subject>COVID-19</subject><subject>Datasets</subject><subject>Drug Development - methods</subject><subject>Drug discovery</subject><subject>Drug Interactions</subject><subject>Drugs</subject><subject>Drug–target interaction prediction</subject><subject>Embedding</subject><subject>Euclidean geometry</subject><subject>Evaluation</subject><subject>G protein-coupled receptors</subject><subject>Ion channels</subject><subject>Knowledge</subject><subject>Knowledge Bases</subject><subject>Knowledge graph embedding</subject><subject>Knowledge representation</subject><subject>Ligands</subject><subject>Machine learning</subject><subject>Neural networks</subject><subject>Pandemics</subject><subject>Pattern Recognition, Automated</subject><subject>Predictions</subject><subject>ProtBERT</subject><subject>Proteins</subject><subject>Proteins - chemistry</subject><subject>Research methodology</subject><subject>Severe acute respiratory syndrome coronavirus 
2</subject><subject>Similarity</subject><subject>Therapeutic targets</subject><subject>Topology</subject><subject>Vector spaces</subject><subject>Viruses</subject><issn>1471-2105</issn><issn>1471-2105</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>PIMPY</sourceid><sourceid>DOA</sourceid><recordid>eNptks1uEzEUhUcIREvgBVigkdjQxRT_jmfYoFABrVQJBN1bHvt64pLYwXYCPAZvjCcppamQF7bu-c6xfXWr6jlGpxh37euEScf7BhHaIM572rQPqmPMBG4IRvzhnfNR9SSla4Sw6BB_XB3RDmPWU3Fc_Z6brfLa-bE2cTM2WcURcu18hqh0dsHX6wjG7Y5valXrsCqFBfjktlCPUa0XzaASmFqt1zEovdiZi5Cn0G8-_FiCGW_QGlYDGDMpypv6cwz5HcQ83ZGjcr4IT6tHVi0TPLvZZ9XVh_dXZ-fN5aePF2fzy0a3lOQGKDDFmNXQqoEMgilCoRfG9gQPmjHGi2h5XySM9IA62w6i66nFinbU0Fl1sY81QV3LdXQrFX_JoJzcFUIcpYrZ6SVIgxAMwmqihGDUaoV6rhVtLbG85VyXrLf7rPVmWIHR4Mtnlgehh4p3CzmGrcRIUNwRXBJO9gmLe77z-aWcaoiVVyPEthP76ua2GL5vIGW5cknDcqk8hE2SpEcM8xYXw6x6eQ-9DpvoS1sLhRkRghP0jxpV-a3zNpRH6ilUzoVoRU9aOlGn_6HKMrByOniwrtQPDCcHhsJk-JlHtUlJXnz9csiSPatjSCmCvW0CRnIadrkfdlmGXe6GXbbF9OJu228tf6eb_gHAefsI</recordid><startdate>20231219</startdate><enddate>20231219</enddate><creator>Djeddi, Warith Eddine</creator><creator>Hermi, Khalil</creator><creator>Ben Yahia, Sadok</creator><creator>Diallo, Gayo</creator><general>BioMed Central Ltd</general><general>BioMed 
Central</general><general>BMC</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>ISR</scope><scope>3V.</scope><scope>7QO</scope><scope>7SC</scope><scope>7X7</scope><scope>7XB</scope><scope>88E</scope><scope>8AL</scope><scope>8AO</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FH</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BBNVY</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>BHPHI</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FR3</scope><scope>FYUFA</scope><scope>GHDGH</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K7-</scope><scope>K9.</scope><scope>L7M</scope><scope>LK8</scope><scope>L~C</scope><scope>L~D</scope><scope>M0N</scope><scope>M0S</scope><scope>M1P</scope><scope>M7P</scope><scope>P5Z</scope><scope>P62</scope><scope>P64</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>Q9U</scope><scope>7X8</scope><scope>1XC</scope><scope>5PM</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0001-8939-8948</orcidid></search><sort><creationdate>20231219</creationdate><title>Advancing drug-target interaction prediction: a comprehensive graph-based approach integrating knowledge graph embedding and ProtBert pretraining</title><author>Djeddi, Warith Eddine ; Hermi, Khalil ; Ben Yahia, Sadok ; Diallo, Gayo</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c632t-e3e4a44fce6ab2b74a23e97df921bc444544ff592b710cb08f6b7893f1a383d3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Algorithms</topic><topic>Amino acids</topic><topic>Analysis</topic><topic>Case studies</topic><topic>Clinical trials</topic><topic>Computer 
applications</topic><topic>Computer Science</topic><topic>Cosine similarity</topic><topic>COVID-19</topic><topic>Datasets</topic><topic>Drug Development - methods</topic><topic>Drug discovery</topic><topic>Drug Interactions</topic><topic>Drugs</topic><topic>Drug–target interaction prediction</topic><topic>Embedding</topic><topic>Euclidean geometry</topic><topic>Evaluation</topic><topic>G protein-coupled receptors</topic><topic>Ion channels</topic><topic>Knowledge</topic><topic>Knowledge Bases</topic><topic>Knowledge graph embedding</topic><topic>Knowledge representation</topic><topic>Ligands</topic><topic>Machine learning</topic><topic>Neural networks</topic><topic>Pandemics</topic><topic>Pattern Recognition, Automated</topic><topic>Predictions</topic><topic>ProtBERT</topic><topic>Proteins</topic><topic>Proteins - chemistry</topic><topic>Research methodology</topic><topic>Severe acute respiratory syndrome coronavirus 2</topic><topic>Similarity</topic><topic>Therapeutic targets</topic><topic>Topology</topic><topic>Vector spaces</topic><topic>Viruses</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Djeddi, Warith Eddine</creatorcontrib><creatorcontrib>Hermi, Khalil</creatorcontrib><creatorcontrib>Ben Yahia, Sadok</creatorcontrib><creatorcontrib>Diallo, Gayo</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Gale In Context: Science</collection><collection>ProQuest Central (Corporate)</collection><collection>Biotechnology Research Abstracts</collection><collection>Computer and Information Systems Abstracts</collection><collection>Health & Medical Collection</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Medical Database (Alumni Edition)</collection><collection>Computing 
Database (Alumni Edition)</collection><collection>ProQuest Pharma Collection</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest Central</collection><collection>Advanced Technologies & Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>Biological Science Collection</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>Natural Science Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>Engineering Research Database</collection><collection>Health Research Premium Collection</collection><collection>Health Research Premium Collection (Alumni)</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>Computer Science Database</collection><collection>ProQuest Health & Medical Complete (Alumni)</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>ProQuest Biological Science Collection</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Computing Database</collection><collection>Health & Medical Collection (Alumni Edition)</collection><collection>Medical Database</collection><collection>Biological Science Database</collection><collection>Advanced Technologies 
& Aerospace Database</collection><collection>ProQuest Advanced Technologies & Aerospace Collection</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central Basic</collection><collection>MEDLINE - Academic</collection><collection>Hyper Article en Ligne (HAL)</collection><collection>PubMed Central (Full Participant titles)</collection><collection>Directory of Open Access Journals</collection><jtitle>BMC bioinformatics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Djeddi, Warith Eddine</au><au>Hermi, Khalil</au><au>Ben Yahia, Sadok</au><au>Diallo, Gayo</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Advancing drug-target interaction prediction: a comprehensive graph-based approach integrating knowledge graph embedding and ProtBert pretraining</atitle><jtitle>BMC bioinformatics</jtitle><addtitle>BMC Bioinformatics</addtitle><date>2023-12-19</date><risdate>2023</risdate><volume>24</volume><issue>1</issue><spage>488</spage><epage>41</epage><pages>488-41</pages><artnum>488</artnum><issn>1471-2105</issn><eissn>1471-2105</eissn><abstract>The pharmaceutical field faces a significant challenge in validating drug target interactions (DTIs) due to the time and cost involved, leading to only a fraction being experimentally verified. To expedite drug discovery, accurate computational methods are essential for predicting potential interactions. Recently, machine learning techniques, particularly graph-based methods, have gained prominence. 
These methods utilize networks of drugs and targets, employing knowledge graph embedding (KGE) to represent structured information from knowledge graphs in a continuous vector space. This phenomenon highlights the growing inclination to utilize graph topologies as a means to improve the precision of predicting DTIs, hence addressing the pressing requirement for effective computational methodologies in the field of drug discovery.
The present study presents a novel approach called DTIOG for the prediction of DTIs. The methodology employed in this study involves the utilization of a KGE strategy, together with the incorporation of contextual information obtained from protein sequences. More specifically, the study makes use of Protein Bidirectional Encoder Representations from Transformers (ProtBERT) for this purpose. DTIOG utilizes a two-step process to compute embedding vectors using KGE techniques. Additionally, it employs ProtBERT to determine target-target similarity. Different similarity measures, such as Cosine similarity or Euclidean distance, are utilized in the prediction procedure. In addition to the contextual embedding, the proposed unique approach incorporates local representations obtained from the Simplified Molecular Input Line Entry Specification (SMILES) of drugs and the amino acid sequences of protein targets.
The effectiveness of the proposed approach was assessed through extensive experimentation on datasets pertaining to Enzymes, Ion Channels, and G-protein-coupled Receptors. The remarkable efficacy of DTIOG was showcased through the utilization of diverse similarity measures in order to calculate the similarities between drugs and targets. The combination of these factors, along with the incorporation of various classifiers, enabled the model to outperform existing algorithms in its ability to predict DTIs. The consistent observation of this advantage across all datasets underlines the robustness and accuracy of DTIOG in the domain of DTIs. Additionally, our case study suggests that the DTIOG can serve as a valuable tool for discovering new DTIs.</abstract><cop>England</cop><pub>BioMed Central Ltd</pub><pmid>38114937</pmid><doi>10.1186/s12859-023-05593-6</doi><tpages>41</tpages><orcidid>https://orcid.org/0000-0001-8939-8948</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1471-2105 |
ispartof | BMC bioinformatics, 2023-12, Vol.24 (1), p.488-41, Article 488 |
issn | 1471-2105 1471-2105 |
language | eng |
recordid | cdi_doaj_primary_oai_doaj_org_article_d00eb7fc2a7743fca095ca36f2f5655c |
source | PubMed (Medline); Publicly Available Content Database |
subjects | Algorithms Amino acids Analysis Case studies Clinical trials Computer applications Computer Science Cosine similarity COVID-19 Datasets Drug Development - methods Drug discovery Drug Interactions Drugs Drug–target interaction prediction Embedding Euclidean geometry Evaluation G protein-coupled receptors Ion channels Knowledge Knowledge Bases Knowledge graph embedding Knowledge representation Ligands Machine learning Neural networks Pandemics Pattern Recognition, Automated Predictions ProtBERT Proteins Proteins - chemistry Research methodology Severe acute respiratory syndrome coronavirus 2 Similarity Therapeutic targets Topology Vector spaces Viruses |
title | Advancing drug-target interaction prediction: a comprehensive graph-based approach integrating knowledge graph embedding and ProtBert pretraining |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-28T18%3A06%3A25IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_doaj_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Advancing%20drug-target%20interaction%20prediction:%20a%20comprehensive%20graph-based%20approach%20integrating%20knowledge%20graph%20embedding%20and%20ProtBert%20pretraining&rft.jtitle=BMC%20bioinformatics&rft.au=Djeddi,%20Warith%20Eddine&rft.date=2023-12-19&rft.volume=24&rft.issue=1&rft.spage=488&rft.epage=41&rft.pages=488-41&rft.artnum=488&rft.issn=1471-2105&rft.eissn=1471-2105&rft_id=info:doi/10.1186/s12859-023-05593-6&rft_dat=%3Cgale_doaj_%3EA776792630%3C/gale_doaj_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c632t-e3e4a44fce6ab2b74a23e97df921bc444544ff592b710cb08f6b7893f1a383d3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2914277520&rft_id=info:pmid/38114937&rft_galeid=A776792630&rfr_iscdi=true |
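
The abstract in this record names cosine similarity and Euclidean distance as the similarity measures applied to embedding vectors. The following is a minimal sketch of those two measures only, not the DTIOG implementation: the function names, the `rank_by_cosine` helper, and the toy three-target embedding table are all hypothetical and stand in for vectors that a KGE model or ProtBERT would produce.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def euclidean_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Straight-line distance between two equal-length vectors."""
    return math.dist(u, v)


def rank_by_cosine(query: str, embeddings: Dict[str, Sequence[float]]) -> List[str]:
    """Hypothetical helper: other entries ordered from most to least similar."""
    q = embeddings[query]
    others = [name for name in embeddings if name != query]
    return sorted(
        others,
        key=lambda name: cosine_similarity(q, embeddings[name]),
        reverse=True,
    )


# Toy 2-dimensional embeddings, invented for illustration only.
toy_embeddings = {
    "T1": [1.0, 0.0],
    "T2": [0.9, 0.1],
    "T3": [0.0, 1.0],
}

if __name__ == "__main__":
    print(rank_by_cosine("T1", toy_embeddings))
    print(euclidean_distance(toy_embeddings["T1"], toy_embeddings["T3"]))
```

Cosine similarity ignores vector length and compares direction only, while Euclidean distance is sensitive to magnitude, which is one reason a study might evaluate both.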