
Stochastic Sensitivity Tree Boosting for Imbalanced Prediction Problems of Protein-Ligand Interaction Sites

Prediction of protein-protein interaction sites plays an important role in understanding protein interactions and functions. However, in the protein-protein interaction site prediction problem, the number of binding-site residues is usually much smaller than that of other amino acid residues in a...

Bibliographic Details
Published in: IEEE transactions on emerging topics in computational intelligence 2021-06, Vol.5 (3), p.373-383
Main Authors: Ng, Wing W. Y. ; Zhang, Yuda ; Zhang, Jianjun ; Wang, Debby D. ; Wang, Fu Lee
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c225t-d32dfaf44c0e9742b843875f8c10ce3c38984e450c9693a800316b3dd470b3963
cites cdi_FETCH-LOGICAL-c225t-d32dfaf44c0e9742b843875f8c10ce3c38984e450c9693a800316b3dd470b3963
container_end_page 383
container_issue 3
container_start_page 373
container_title IEEE transactions on emerging topics in computational intelligence
container_volume 5
creator Ng, Wing W. Y.
Zhang, Yuda
Zhang, Jianjun
Wang, Debby D.
Wang, Fu Lee
description Prediction of protein-protein interaction sites plays an important role in understanding protein interactions and functions. However, in the protein-protein interaction site prediction problem, the number of binding-site residues is usually much smaller than that of other amino acid residues in a protein chain, which degrades the performance of standard machine learning methods on the minority class, i.e., the binding-site residues. Therefore, to improve the prediction performance on binding-site residues, we propose in this paper a new boosting algorithm (SSTBoost) that combines a stochastic sensitivity measure-based undersampling method with the AdaBoost algorithm. The stochastic sensitivity measure-based undersampling method aims to re-balance the dataset by selecting those samples with the highest probability of being incorrectly labeled, and the AdaBoost algorithm aims to improve the performance of base hypotheses by making them complementary and having them work in conjunction with each other. Twenty UCI datasets are first used to evaluate the robustness and effectiveness of SSTBoost. After that, SSTBoost is tested on twenty-two practical protein-protein interaction site prediction problems. Experimental results show that SSTBoost significantly improves performance over state-of-the-art methods in 57.3%, 88.2%, and 78.2% of 110 cases in terms of Recall, F-score, and G-mean, respectively. This shows its potential to handle other bioinformatics applications in the near future.
doi_str_mv 10.1109/TETCI.2019.2922340
format article
fullrecord <record><control><sourceid>proquest_ieee_</sourceid><recordid>TN_cdi_proquest_journals_2532300988</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>8753743</ieee_id><sourcerecordid>2532300988</sourcerecordid><originalsourceid>FETCH-LOGICAL-c225t-d32dfaf44c0e9742b843875f8c10ce3c38984e450c9693a800316b3dd470b3963</originalsourceid><addsrcrecordid>eNpNkFtLAzEQhYMoWGr_gL4EfN6a23aTRy1eCgWFVvAtZLOzNbVNahKF_nu3bhGf5sCcM3P4ELqkZEwpUTfL--V0NmaEqjFTjHFBTtCAiYoWTJZvp__0ORqltCaEMFVSXooB-ljkYN9Nys7iBfjksvt2eY-XEQDfhdAt_Aq3IeLZtjYb4y00-CVC42x2wXcy1BvYJhzag87gfDF3K-MbPPMZoultC5chXaCz1mwSjI5ziF4fuupPxfz5cTa9nReWsTIXDWdNa1ohLAFVCVZLwWVVttJSYoFbLpUUIEpi1URxIwnhdFLzphEVqbma8CG67u_uYvj8gpT1OnxF373UrOSME6Kk7Fysd9kYUorQ6l10WxP3mhJ94Kp_ueoDV33k2oWu-pADgL9A145XgvMfPAJ0kQ</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2532300988</pqid></control><display><type>article</type><title>Stochastic Sensitivity Tree Boosting for Imbalanced Prediction Problems of Protein-Ligand Interaction Sites</title><source>IEEE Electronic Library (IEL) Journals</source><creator>Ng, Wing W. Y. ; Zhang, Yuda ; Zhang, Jianjun ; Wang, Debby D. ; Wang, Fu Lee</creator><creatorcontrib>Ng, Wing W. Y. ; Zhang, Yuda ; Zhang, Jianjun ; Wang, Debby D. ; Wang, Fu Lee</creatorcontrib><description><![CDATA[Prediction of protein-protein interaction sites plays an important role for understanding the protein interactions and functions. However, in the protein-protein interaction site prediction problem, the number of binding-site residues is usually much less than that of other amino acid residues in a protein chain, which would lead to the performance drop of standard machine learning methods on minority class, i.e., the binding-site residues. 
Therefore, to improve the prediction performance on binding-site residues, we propose in this paper a new boosting algorithm (SSTBoost) that consists of stochastic sensitivity measure-based undersampling method and AdaBoost algorithm. Stochastic sensitivity measure-based undersampling method aims to re-balance the dataset by selecting those samples with the highest probability to be incorrectly labeled, and AdaBoost algorithm aims to improve the performance of base hypotheses by making them to be complementary and be conjunction with each other. Twenty UCI datasets are first used to evaluate the robustness and effectiveness of the SSTBoost. After that, the SSTBoost is tested on twenty-two practical protein-protein interaction sites prediction problems. Experimental results show that the SSTBoost significantly improves the performances against state-of-the-art methods by <inline-formula><tex-math notation="LaTeX">\text{57.3}\%</tex-math></inline-formula>, <inline-formula><tex-math notation="LaTeX">\text{88.2}\%</tex-math></inline-formula>, and <inline-formula><tex-math notation="LaTeX">\text{78.2}\%</tex-math></inline-formula> out of 110 cases in terms of Recall, F-score, and G-mean, respectively. 
This shows its potential to handle other bioinformatic applications in near future.]]></description><identifier>ISSN: 2471-285X</identifier><identifier>EISSN: 2471-285X</identifier><identifier>DOI: 10.1109/TETCI.2019.2922340</identifier><identifier>CODEN: ITETCU</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Algorithms ; Amino acids ; Binding ; Boosting ; Datasets ; decision tree ; imbalanced learning problem ; Machine learning ; Performance enhancement ; Perturbation methods ; Protein-ligand interaction sites ; Proteins ; Residues ; Sensitivity ; stochastic sensitivity measure ; Support vector machines ; Training</subject><ispartof>IEEE transactions on emerging topics in computational intelligence, 2021-06, Vol.5 (3), p.373-383</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2021</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c225t-d32dfaf44c0e9742b843875f8c10ce3c38984e450c9693a800316b3dd470b3963</citedby><cites>FETCH-LOGICAL-c225t-d32dfaf44c0e9742b843875f8c10ce3c38984e450c9693a800316b3dd470b3963</cites><orcidid>0000-0002-3976-0053 ; 0000-0002-3755-8943 ; 0000-0001-9133-4994 ; 0000-0003-0783-3585</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/8753743$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,27924,27925,54796</link.rule.ids></links><search><creatorcontrib>Ng, Wing W. 
Y.</creatorcontrib><creatorcontrib>Zhang, Yuda</creatorcontrib><creatorcontrib>Zhang, Jianjun</creatorcontrib><creatorcontrib>Wang, Debby D.</creatorcontrib><creatorcontrib>Wang, Fu Lee</creatorcontrib><title>Stochastic Sensitivity Tree Boosting for Imbalanced Prediction Problems of Protein-Ligand Interaction Sites</title><title>IEEE transactions on emerging topics in computational intelligence</title><addtitle>TETCI</addtitle><description><![CDATA[Prediction of protein-protein interaction sites plays an important role for understanding the protein interactions and functions. However, in the protein-protein interaction site prediction problem, the number of binding-site residues is usually much less than that of other amino acid residues in a protein chain, which would lead to the performance drop of standard machine learning methods on minority class, i.e., the binding-site residues. Therefore, to improve the prediction performance on binding-site residues, we propose in this paper a new boosting algorithm (SSTBoost) that consists of stochastic sensitivity measure-based undersampling method and AdaBoost algorithm. Stochastic sensitivity measure-based undersampling method aims to re-balance the dataset by selecting those samples with the highest probability to be incorrectly labeled, and AdaBoost algorithm aims to improve the performance of base hypotheses by making them to be complementary and be conjunction with each other. Twenty UCI datasets are first used to evaluate the robustness and effectiveness of the SSTBoost. After that, the SSTBoost is tested on twenty-two practical protein-protein interaction sites prediction problems. 
Experimental results show that the SSTBoost significantly improves the performances against state-of-the-art methods by <inline-formula><tex-math notation="LaTeX">\text{57.3}\%</tex-math></inline-formula>, <inline-formula><tex-math notation="LaTeX">\text{88.2}\%</tex-math></inline-formula>, and <inline-formula><tex-math notation="LaTeX">\text{78.2}\%</tex-math></inline-formula> out of 110 cases in terms of Recall, F-score, and G-mean, respectively. This shows its potential to handle other bioinformatic applications in near future.]]></description><subject>Algorithms</subject><subject>Amino acids</subject><subject>Binding</subject><subject>Boosting</subject><subject>Datasets</subject><subject>decision tree</subject><subject>imbalanced learning problem</subject><subject>Machine learning</subject><subject>Performance enhancement</subject><subject>Perturbation methods</subject><subject>Protein-ligand interaction sites</subject><subject>Proteins</subject><subject>Residues</subject><subject>Sensitivity</subject><subject>stochastic sensitivity measure</subject><subject>Support vector machines</subject><subject>Training</subject><issn>2471-285X</issn><issn>2471-285X</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><recordid>eNpNkFtLAzEQhYMoWGr_gL4EfN6a23aTRy1eCgWFVvAtZLOzNbVNahKF_nu3bhGf5sCcM3P4ELqkZEwpUTfL--V0NmaEqjFTjHFBTtCAiYoWTJZvp__0ORqltCaEMFVSXooB-ljkYN9Nys7iBfjksvt2eY-XEQDfhdAt_Aq3IeLZtjYb4y00-CVC42x2wXcy1BvYJhzag87gfDF3K-MbPPMZoultC5chXaCz1mwSjI5ziF4fuupPxfz5cTa9nReWsTIXDWdNa1ohLAFVCVZLwWVVttJSYoFbLpUUIEpi1URxIwnhdFLzphEVqbma8CG67u_uYvj8gpT1OnxF373UrOSME6Kk7Fysd9kYUorQ6l10WxP3mhJ94Kp_ueoDV33k2oWu-pADgL9A145XgvMfPAJ0kQ</recordid><startdate>20210601</startdate><enddate>20210601</enddate><creator>Ng, Wing W. 
Y.</creator><creator>Zhang, Yuda</creator><creator>Zhang, Jianjun</creator><creator>Wang, Debby D.</creator><creator>Wang, Fu Lee</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SP</scope><scope>8FD</scope><scope>L7M</scope><orcidid>https://orcid.org/0000-0002-3976-0053</orcidid><orcidid>https://orcid.org/0000-0002-3755-8943</orcidid><orcidid>https://orcid.org/0000-0001-9133-4994</orcidid><orcidid>https://orcid.org/0000-0003-0783-3585</orcidid></search><sort><creationdate>20210601</creationdate><title>Stochastic Sensitivity Tree Boosting for Imbalanced Prediction Problems of Protein-Ligand Interaction Sites</title><author>Ng, Wing W. Y. ; Zhang, Yuda ; Zhang, Jianjun ; Wang, Debby D. ; Wang, Fu Lee</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c225t-d32dfaf44c0e9742b843875f8c10ce3c38984e450c9693a800316b3dd470b3963</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Algorithms</topic><topic>Amino acids</topic><topic>Binding</topic><topic>Boosting</topic><topic>Datasets</topic><topic>decision tree</topic><topic>imbalanced learning problem</topic><topic>Machine learning</topic><topic>Performance enhancement</topic><topic>Perturbation methods</topic><topic>Protein-ligand interaction sites</topic><topic>Proteins</topic><topic>Residues</topic><topic>Sensitivity</topic><topic>stochastic sensitivity measure</topic><topic>Support vector machines</topic><topic>Training</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Ng, Wing W. 
Y.</creatorcontrib><creatorcontrib>Zhang, Yuda</creatorcontrib><creatorcontrib>Zhang, Jianjun</creatorcontrib><creatorcontrib>Wang, Debby D.</creatorcontrib><creatorcontrib>Wang, Fu Lee</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Xplore</collection><collection>CrossRef</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Technology Research Database</collection><collection>Advanced Technologies Database with Aerospace</collection><jtitle>IEEE transactions on emerging topics in computational intelligence</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Ng, Wing W. Y.</au><au>Zhang, Yuda</au><au>Zhang, Jianjun</au><au>Wang, Debby D.</au><au>Wang, Fu Lee</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Stochastic Sensitivity Tree Boosting for Imbalanced Prediction Problems of Protein-Ligand Interaction Sites</atitle><jtitle>IEEE transactions on emerging topics in computational intelligence</jtitle><stitle>TETCI</stitle><date>2021-06-01</date><risdate>2021</risdate><volume>5</volume><issue>3</issue><spage>373</spage><epage>383</epage><pages>373-383</pages><issn>2471-285X</issn><eissn>2471-285X</eissn><coden>ITETCU</coden><abstract><![CDATA[Prediction of protein-protein interaction sites plays an important role for understanding the protein interactions and functions. However, in the protein-protein interaction site prediction problem, the number of binding-site residues is usually much less than that of other amino acid residues in a protein chain, which would lead to the performance drop of standard machine learning methods on minority class, i.e., the binding-site residues. 
Therefore, to improve the prediction performance on binding-site residues, we propose in this paper a new boosting algorithm (SSTBoost) that consists of stochastic sensitivity measure-based undersampling method and AdaBoost algorithm. Stochastic sensitivity measure-based undersampling method aims to re-balance the dataset by selecting those samples with the highest probability to be incorrectly labeled, and AdaBoost algorithm aims to improve the performance of base hypotheses by making them to be complementary and be conjunction with each other. Twenty UCI datasets are first used to evaluate the robustness and effectiveness of the SSTBoost. After that, the SSTBoost is tested on twenty-two practical protein-protein interaction sites prediction problems. Experimental results show that the SSTBoost significantly improves the performances against state-of-the-art methods by <inline-formula><tex-math notation="LaTeX">\text{57.3}\%</tex-math></inline-formula>, <inline-formula><tex-math notation="LaTeX">\text{88.2}\%</tex-math></inline-formula>, and <inline-formula><tex-math notation="LaTeX">\text{78.2}\%</tex-math></inline-formula> out of 110 cases in terms of Recall, F-score, and G-mean, respectively. This shows its potential to handle other bioinformatic applications in near future.]]></abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/TETCI.2019.2922340</doi><tpages>11</tpages><orcidid>https://orcid.org/0000-0002-3976-0053</orcidid><orcidid>https://orcid.org/0000-0002-3755-8943</orcidid><orcidid>https://orcid.org/0000-0001-9133-4994</orcidid><orcidid>https://orcid.org/0000-0003-0783-3585</orcidid></addata></record>
fulltext fulltext
identifier ISSN: 2471-285X
ispartof IEEE transactions on emerging topics in computational intelligence, 2021-06, Vol.5 (3), p.373-383
issn 2471-285X
2471-285X
language eng
recordid cdi_proquest_journals_2532300988
source IEEE Electronic Library (IEL) Journals
subjects Algorithms
Amino acids
Binding
Boosting
Datasets
decision tree
imbalanced learning problem
Machine learning
Performance enhancement
Perturbation methods
Protein-ligand interaction sites
Proteins
Residues
Sensitivity
stochastic sensitivity measure
Support vector machines
Training
title Stochastic Sensitivity Tree Boosting for Imbalanced Prediction Problems of Protein-Ligand Interaction Sites
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-02T16%3A36%3A57IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Stochastic%20Sensitivity%20Tree%20Boosting%20for%20Imbalanced%20Prediction%20Problems%20of%20Protein-Ligand%20Interaction%20Sites&rft.jtitle=IEEE%20transactions%20on%20emerging%20topics%20in%20computational%20intelligence&rft.au=Ng,%20Wing%20W.%20Y.&rft.date=2021-06-01&rft.volume=5&rft.issue=3&rft.spage=373&rft.epage=383&rft.pages=373-383&rft.issn=2471-285X&rft.eissn=2471-285X&rft.coden=ITETCU&rft_id=info:doi/10.1109/TETCI.2019.2922340&rft_dat=%3Cproquest_ieee_%3E2532300988%3C/proquest_ieee_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c225t-d32dfaf44c0e9742b843875f8c10ce3c38984e450c9693a800316b3dd470b3963%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2532300988&rft_id=info:pmid/&rft_ieee_id=8753743&rfr_iscdi=true