Stochastic Sensitivity Tree Boosting for Imbalanced Prediction Problems of Protein-Ligand Interaction Sites

Bibliographic Details
Published in: IEEE transactions on emerging topics in computational intelligence 2021-06, Vol.5 (3), p.373-383
Main Authors: Ng, Wing W. Y., Zhang, Yuda, Zhang, Jianjun, Wang, Debby D., Wang, Fu Lee
Format: Article
Language: English
Description
Summary: Prediction of protein-protein interaction sites plays an important role in understanding protein interactions and functions. However, in the protein-protein interaction site prediction problem, the number of binding-site residues is usually much smaller than the number of other amino acid residues in a protein chain, which degrades the performance of standard machine learning methods on the minority class, i.e., the binding-site residues. Therefore, to improve the prediction performance on binding-site residues, we propose in this paper a new boosting algorithm (SSTBoost) that combines a stochastic sensitivity measure-based undersampling method with the AdaBoost algorithm. The stochastic sensitivity measure-based undersampling method aims to re-balance the dataset by selecting the samples with the highest probability of being incorrectly labeled, and the AdaBoost algorithm aims to improve the performance of the base hypotheses by making them complementary to one another and combining them. Twenty UCI datasets are first used to evaluate the robustness and effectiveness of SSTBoost. After that, SSTBoost is tested on twenty-two practical protein-protein interaction site prediction problems. Experimental results show that SSTBoost significantly outperforms state-of-the-art methods in 57.3%, 88.2%, and 78.2% of 110 cases in terms of Recall, F-score, and G-mean, respectively. This shows its potential to handle other bioinformatics applications in the near future.
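The summary reports results in terms of Recall, F-score, and G-mean rather than accuracy. The following is a minimal sketch of why those measures suit imbalanced problems; it is not taken from the paper. It assumes the definitions commonly used in imbalanced learning (G-mean as the geometric mean of recall and specificity), and the function name `imbalance_metrics` and the toy label vectors are hypothetical.

```python
import math

def imbalance_metrics(y_true, y_pred, positive=1):
    """Recall, F-score, G-mean and accuracy for a binary problem.

    Assumed (standard) definitions, not quoted from the paper:
      recall      = TP / (TP + FN)
      precision   = TP / (TP + FP)
      specificity = TN / (TN + FP)
      F-score     = 2 * precision * recall / (precision + recall)
      G-mean      = sqrt(recall * specificity)
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard every ratio against an empty denominator.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0

    return {
        "accuracy": (tp + tn) / len(pairs),
        "recall": recall,
        "f_score": f_score,
        "g_mean": math.sqrt(recall * specificity),
    }

# Toy chain: 5 binding-site residues (label 1) among 100 residues.
y_true = [1] * 5 + [0] * 95

# A classifier that always predicts the majority class.
always_majority = imbalance_metrics(y_true, [0] * 100)

# A classifier that finds 4 of the 5 binding sites at the cost of
# 10 false alarms among the 95 non-binding residues.
y_pred = [1, 1, 1, 1, 0] + [1] * 10 + [0] * 85
minority_aware = imbalance_metrics(y_true, y_pred)

print(always_majority)
print(minority_aware)
```

In this toy case the always-majority classifier has the higher accuracy, yet its Recall, F-score, and G-mean are all zero, while the second classifier scores well above zero on all three.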
ISSN: 2471-285X
DOI: 10.1109/TETCI.2019.2922340