Stochastic Sensitivity Tree Boosting for Imbalanced Prediction Problems of Protein-Ligand Interaction Sites
Prediction of protein-protein interaction sites plays an important role in understanding protein interactions and functions. However, in the protein-protein interaction site prediction problem, the number of binding-site residues is usually much smaller than that of other amino acid residues in a...
Published in: | IEEE transactions on emerging topics in computational intelligence 2021-06, Vol.5 (3), p.373-383 |
---|---|
Main Authors: | Ng, Wing W. Y.; Zhang, Yuda; Zhang, Jianjun; Wang, Debby D.; Wang, Fu Lee |
Format: | Article |
Language: | English |
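Subjects: | Algorithms; Amino acids; Binding; Boosting; Datasets; decision tree; imbalanced learning problem; Machine learning; Performance enhancement; Perturbation methods; Protein-ligand interaction sites; Proteins; Residues; Sensitivity; stochastic sensitivity measure; Support vector machines; Training |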
cited_by | cdi_FETCH-LOGICAL-c225t-d32dfaf44c0e9742b843875f8c10ce3c38984e450c9693a800316b3dd470b3963 |
---|---|
cites | cdi_FETCH-LOGICAL-c225t-d32dfaf44c0e9742b843875f8c10ce3c38984e450c9693a800316b3dd470b3963 |
container_end_page | 383 |
container_issue | 3 |
container_start_page | 373 |
container_title | IEEE transactions on emerging topics in computational intelligence |
container_volume | 5 |
creator | Ng, Wing W. Y.; Zhang, Yuda; Zhang, Jianjun; Wang, Debby D.; Wang, Fu Lee |
description | Prediction of protein-protein interaction sites plays an important role in understanding protein interactions and functions. However, in the protein-protein interaction site prediction problem, the number of binding-site residues is usually much smaller than that of other amino acid residues in a protein chain, which degrades the performance of standard machine learning methods on the minority class, i.e., the binding-site residues. Therefore, to improve the prediction performance on binding-site residues, we propose in this paper a new boosting algorithm (SSTBoost) that combines a stochastic sensitivity measure-based undersampling method with the AdaBoost algorithm. The stochastic sensitivity measure-based undersampling method aims to re-balance the dataset by selecting those samples with the highest probability of being incorrectly labeled, and the AdaBoost algorithm aims to improve the performance of the base hypotheses by making them complementary so that they work in conjunction with one another. Twenty UCI datasets are first used to evaluate the robustness and effectiveness of SSTBoost. After that, SSTBoost is tested on twenty-two practical protein-protein interaction site prediction problems. Experimental results show that SSTBoost significantly improves performance over state-of-the-art methods in 57.3%, 88.2%, and 78.2% of 110 cases in terms of Recall, F-score, and G-mean, respectively. This shows its potential to handle other bioinformatics applications in the near future. |
doi_str_mv | 10.1109/TETCI.2019.2922340 |
format | article |
fullrecord | <record><control><sourceid>proquest_ieee_</sourceid><recordid>TN_cdi_proquest_journals_2532300988</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>8753743</ieee_id><sourcerecordid>2532300988</sourcerecordid><originalsourceid>FETCH-LOGICAL-c225t-d32dfaf44c0e9742b843875f8c10ce3c38984e450c9693a800316b3dd470b3963</originalsourceid><addsrcrecordid>eNpNkFtLAzEQhYMoWGr_gL4EfN6a23aTRy1eCgWFVvAtZLOzNbVNahKF_nu3bhGf5sCcM3P4ELqkZEwpUTfL--V0NmaEqjFTjHFBTtCAiYoWTJZvp__0ORqltCaEMFVSXooB-ljkYN9Nys7iBfjksvt2eY-XEQDfhdAt_Aq3IeLZtjYb4y00-CVC42x2wXcy1BvYJhzag87gfDF3K-MbPPMZoultC5chXaCz1mwSjI5ziF4fuupPxfz5cTa9nReWsTIXDWdNa1ohLAFVCVZLwWVVttJSYoFbLpUUIEpi1URxIwnhdFLzphEVqbma8CG67u_uYvj8gpT1OnxF373UrOSME6Kk7Fysd9kYUorQ6l10WxP3mhJ94Kp_ueoDV33k2oWu-pADgL9A145XgvMfPAJ0kQ</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2532300988</pqid></control><display><type>article</type><title>Stochastic Sensitivity Tree Boosting for Imbalanced Prediction Problems of Protein-Ligand Interaction Sites</title><source>IEEE Electronic Library (IEL) Journals</source><creator>Ng, Wing W. Y. ; Zhang, Yuda ; Zhang, Jianjun ; Wang, Debby D. ; Wang, Fu Lee</creator><creatorcontrib>Ng, Wing W. Y. ; Zhang, Yuda ; Zhang, Jianjun ; Wang, Debby D. ; Wang, Fu Lee</creatorcontrib><description><![CDATA[Prediction of protein-protein interaction sites plays an important role for understanding the protein interactions and functions. However, in the protein-protein interaction site prediction problem, the number of binding-site residues is usually much less than that of other amino acid residues in a protein chain, which would lead to the performance drop of standard machine learning methods on minority class, i.e., the binding-site residues. 
Therefore, to improve the prediction performance on binding-site residues, we propose in this paper a new boosting algorithm (SSTBoost) that consists of stochastic sensitivity measure-based undersampling method and AdaBoost algorithm. Stochastic sensitivity measure-based undersampling method aims to re-balance the dataset by selecting those samples with the highest probability to be incorrectly labeled, and AdaBoost algorithm aims to improve the performance of base hypotheses by making them to be complementary and be conjunction with each other. Twenty UCI datasets are first used to evaluate the robustness and effectiveness of the SSTBoost. After that, the SSTBoost is tested on twenty-two practical protein-protein interaction sites prediction problems. Experimental results show that the SSTBoost significantly improves the performances against state-of-the-art methods by <inline-formula><tex-math notation="LaTeX">\text{57.3}\%</tex-math></inline-formula>, <inline-formula><tex-math notation="LaTeX">\text{88.2}\%</tex-math></inline-formula>, and <inline-formula><tex-math notation="LaTeX">\text{78.2}\%</tex-math></inline-formula> out of 110 cases in terms of Recall, F-score, and G-mean, respectively. 
This shows its potential to handle other bioinformatic applications in near future.]]></description><identifier>ISSN: 2471-285X</identifier><identifier>EISSN: 2471-285X</identifier><identifier>DOI: 10.1109/TETCI.2019.2922340</identifier><identifier>CODEN: ITETCU</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Algorithms ; Amino acids ; Binding ; Boosting ; Datasets ; decision tree ; imbalanced learning problem ; Machine learning ; Performance enhancement ; Perturbation methods ; Protein-ligand interaction sites ; Proteins ; Residues ; Sensitivity ; stochastic sensitivity measure ; Support vector machines ; Training</subject><ispartof>IEEE transactions on emerging topics in computational intelligence, 2021-06, Vol.5 (3), p.373-383</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2021</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c225t-d32dfaf44c0e9742b843875f8c10ce3c38984e450c9693a800316b3dd470b3963</citedby><cites>FETCH-LOGICAL-c225t-d32dfaf44c0e9742b843875f8c10ce3c38984e450c9693a800316b3dd470b3963</cites><orcidid>0000-0002-3976-0053 ; 0000-0002-3755-8943 ; 0000-0001-9133-4994 ; 0000-0003-0783-3585</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/8753743$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,27924,27925,54796</link.rule.ids></links><search><creatorcontrib>Ng, Wing W. 
Y.</creatorcontrib><creatorcontrib>Zhang, Yuda</creatorcontrib><creatorcontrib>Zhang, Jianjun</creatorcontrib><creatorcontrib>Wang, Debby D.</creatorcontrib><creatorcontrib>Wang, Fu Lee</creatorcontrib><title>Stochastic Sensitivity Tree Boosting for Imbalanced Prediction Problems of Protein-Ligand Interaction Sites</title><title>IEEE transactions on emerging topics in computational intelligence</title><addtitle>TETCI</addtitle><description><![CDATA[Prediction of protein-protein interaction sites plays an important role for understanding the protein interactions and functions. However, in the protein-protein interaction site prediction problem, the number of binding-site residues is usually much less than that of other amino acid residues in a protein chain, which would lead to the performance drop of standard machine learning methods on minority class, i.e., the binding-site residues. Therefore, to improve the prediction performance on binding-site residues, we propose in this paper a new boosting algorithm (SSTBoost) that consists of stochastic sensitivity measure-based undersampling method and AdaBoost algorithm. Stochastic sensitivity measure-based undersampling method aims to re-balance the dataset by selecting those samples with the highest probability to be incorrectly labeled, and AdaBoost algorithm aims to improve the performance of base hypotheses by making them to be complementary and be conjunction with each other. Twenty UCI datasets are first used to evaluate the robustness and effectiveness of the SSTBoost. After that, the SSTBoost is tested on twenty-two practical protein-protein interaction sites prediction problems. 
Experimental results show that the SSTBoost significantly improves the performances against state-of-the-art methods by <inline-formula><tex-math notation="LaTeX">\text{57.3}\%</tex-math></inline-formula>, <inline-formula><tex-math notation="LaTeX">\text{88.2}\%</tex-math></inline-formula>, and <inline-formula><tex-math notation="LaTeX">\text{78.2}\%</tex-math></inline-formula> out of 110 cases in terms of Recall, F-score, and G-mean, respectively. This shows its potential to handle other bioinformatic applications in near future.]]></description><subject>Algorithms</subject><subject>Amino acids</subject><subject>Binding</subject><subject>Boosting</subject><subject>Datasets</subject><subject>decision tree</subject><subject>imbalanced learning problem</subject><subject>Machine learning</subject><subject>Performance enhancement</subject><subject>Perturbation methods</subject><subject>Protein-ligand interaction sites</subject><subject>Proteins</subject><subject>Residues</subject><subject>Sensitivity</subject><subject>stochastic sensitivity measure</subject><subject>Support vector machines</subject><subject>Training</subject><issn>2471-285X</issn><issn>2471-285X</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><recordid>eNpNkFtLAzEQhYMoWGr_gL4EfN6a23aTRy1eCgWFVvAtZLOzNbVNahKF_nu3bhGf5sCcM3P4ELqkZEwpUTfL--V0NmaEqjFTjHFBTtCAiYoWTJZvp__0ORqltCaEMFVSXooB-ljkYN9Nys7iBfjksvt2eY-XEQDfhdAt_Aq3IeLZtjYb4y00-CVC42x2wXcy1BvYJhzag87gfDF3K-MbPPMZoultC5chXaCz1mwSjI5ziF4fuupPxfz5cTa9nReWsTIXDWdNa1ohLAFVCVZLwWVVttJSYoFbLpUUIEpi1URxIwnhdFLzphEVqbma8CG67u_uYvj8gpT1OnxF373UrOSME6Kk7Fysd9kYUorQ6l10WxP3mhJ94Kp_ueoDV33k2oWu-pADgL9A145XgvMfPAJ0kQ</recordid><startdate>20210601</startdate><enddate>20210601</enddate><creator>Ng, Wing W. 
Y.</creator><creator>Zhang, Yuda</creator><creator>Zhang, Jianjun</creator><creator>Wang, Debby D.</creator><creator>Wang, Fu Lee</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SP</scope><scope>8FD</scope><scope>L7M</scope><orcidid>https://orcid.org/0000-0002-3976-0053</orcidid><orcidid>https://orcid.org/0000-0002-3755-8943</orcidid><orcidid>https://orcid.org/0000-0001-9133-4994</orcidid><orcidid>https://orcid.org/0000-0003-0783-3585</orcidid></search><sort><creationdate>20210601</creationdate><title>Stochastic Sensitivity Tree Boosting for Imbalanced Prediction Problems of Protein-Ligand Interaction Sites</title><author>Ng, Wing W. Y. ; Zhang, Yuda ; Zhang, Jianjun ; Wang, Debby D. ; Wang, Fu Lee</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c225t-d32dfaf44c0e9742b843875f8c10ce3c38984e450c9693a800316b3dd470b3963</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Algorithms</topic><topic>Amino acids</topic><topic>Binding</topic><topic>Boosting</topic><topic>Datasets</topic><topic>decision tree</topic><topic>imbalanced learning problem</topic><topic>Machine learning</topic><topic>Performance enhancement</topic><topic>Perturbation methods</topic><topic>Protein-ligand interaction sites</topic><topic>Proteins</topic><topic>Residues</topic><topic>Sensitivity</topic><topic>stochastic sensitivity measure</topic><topic>Support vector machines</topic><topic>Training</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Ng, Wing W. 
Y.</creatorcontrib><creatorcontrib>Zhang, Yuda</creatorcontrib><creatorcontrib>Zhang, Jianjun</creatorcontrib><creatorcontrib>Wang, Debby D.</creatorcontrib><creatorcontrib>Wang, Fu Lee</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Xplore</collection><collection>CrossRef</collection><collection>Electronics & Communications Abstracts</collection><collection>Technology Research Database</collection><collection>Advanced Technologies Database with Aerospace</collection><jtitle>IEEE transactions on emerging topics in computational intelligence</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Ng, Wing W. Y.</au><au>Zhang, Yuda</au><au>Zhang, Jianjun</au><au>Wang, Debby D.</au><au>Wang, Fu Lee</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Stochastic Sensitivity Tree Boosting for Imbalanced Prediction Problems of Protein-Ligand Interaction Sites</atitle><jtitle>IEEE transactions on emerging topics in computational intelligence</jtitle><stitle>TETCI</stitle><date>2021-06-01</date><risdate>2021</risdate><volume>5</volume><issue>3</issue><spage>373</spage><epage>383</epage><pages>373-383</pages><issn>2471-285X</issn><eissn>2471-285X</eissn><coden>ITETCU</coden><abstract><![CDATA[Prediction of protein-protein interaction sites plays an important role for understanding the protein interactions and functions. However, in the protein-protein interaction site prediction problem, the number of binding-site residues is usually much less than that of other amino acid residues in a protein chain, which would lead to the performance drop of standard machine learning methods on minority class, i.e., the binding-site residues. 
Therefore, to improve the prediction performance on binding-site residues, we propose in this paper a new boosting algorithm (SSTBoost) that consists of stochastic sensitivity measure-based undersampling method and AdaBoost algorithm. Stochastic sensitivity measure-based undersampling method aims to re-balance the dataset by selecting those samples with the highest probability to be incorrectly labeled, and AdaBoost algorithm aims to improve the performance of base hypotheses by making them to be complementary and be conjunction with each other. Twenty UCI datasets are first used to evaluate the robustness and effectiveness of the SSTBoost. After that, the SSTBoost is tested on twenty-two practical protein-protein interaction sites prediction problems. Experimental results show that the SSTBoost significantly improves the performances against state-of-the-art methods by <inline-formula><tex-math notation="LaTeX">\text{57.3}\%</tex-math></inline-formula>, <inline-formula><tex-math notation="LaTeX">\text{88.2}\%</tex-math></inline-formula>, and <inline-formula><tex-math notation="LaTeX">\text{78.2}\%</tex-math></inline-formula> out of 110 cases in terms of Recall, F-score, and G-mean, respectively. This shows its potential to handle other bioinformatic applications in near future.]]></abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/TETCI.2019.2922340</doi><tpages>11</tpages><orcidid>https://orcid.org/0000-0002-3976-0053</orcidid><orcidid>https://orcid.org/0000-0002-3755-8943</orcidid><orcidid>https://orcid.org/0000-0001-9133-4994</orcidid><orcidid>https://orcid.org/0000-0003-0783-3585</orcidid></addata></record> |
fulltext | fulltext |
identifier | ISSN: 2471-285X |
ispartof | IEEE transactions on emerging topics in computational intelligence, 2021-06, Vol.5 (3), p.373-383 |
issn | 2471-285X 2471-285X |
language | eng |
recordid | cdi_proquest_journals_2532300988 |
source | IEEE Electronic Library (IEL) Journals |
subjects | Algorithms; Amino acids; Binding; Boosting; Datasets; decision tree; imbalanced learning problem; Machine learning; Performance enhancement; Perturbation methods; Protein-ligand interaction sites; Proteins; Residues; Sensitivity; stochastic sensitivity measure; Support vector machines; Training |
title | Stochastic Sensitivity Tree Boosting for Imbalanced Prediction Problems of Protein-Ligand Interaction Sites |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-02T16%3A36%3A57IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Stochastic%20Sensitivity%20Tree%20Boosting%20for%20Imbalanced%20Prediction%20Problems%20of%20Protein-Ligand%20Interaction%20Sites&rft.jtitle=IEEE%20transactions%20on%20emerging%20topics%20in%20computational%20intelligence&rft.au=Ng,%20Wing%20W.%20Y.&rft.date=2021-06-01&rft.volume=5&rft.issue=3&rft.spage=373&rft.epage=383&rft.pages=373-383&rft.issn=2471-285X&rft.eissn=2471-285X&rft.coden=ITETCU&rft_id=info:doi/10.1109/TETCI.2019.2922340&rft_dat=%3Cproquest_ieee_%3E2532300988%3C/proquest_ieee_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c225t-d32dfaf44c0e9742b843875f8c10ce3c38984e450c9693a800316b3dd470b3963%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2532300988&rft_id=info:pmid/&rft_ieee_id=8753743&rfr_iscdi=true |