ANOVA‐particle swarm optimization‐based feature selection and gradient boosting machine classifier for improved protein–protein interaction prediction
Published in: | Proteins: Structure, Function, and Bioinformatics, 2022-02, Vol.90 (2), p.443-454 |
---|---|
Main Authors: | , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | Feature fusion and selection strategies have been applied to improve the accuracy of protein–protein interaction (PPI) prediction. In this paper, an embedded feature selection framework, termed AVPSO, is developed by integrating a cost function based on analysis of variance (ANOVA) with particle swarm optimization (PSO). First, features extracted from the protein sequences using pseudo‐amino acid composition (PseAAC), conjoint triad composition, and local descriptor are fused. AVPSO is then employed to select the optimal feature subset, and a light gradient boosting machine (LGBM) classifier predicts PPIs from that subset. In five‐fold cross‐validation, the proposed model (AVPSO‐LGBM) achieved average accuracies of 97.12% and 95.09% on the intraspecies PPI datasets Saccharomyces cerevisiae and Helicobacter pylori, respectively. On the interspecies PPI datasets Human‐Bacillus and Human‐Yersinia, it achieved average accuracies of 95.20% and 93.44%, respectively. Results on independent test datasets and network datasets show that AVPSO‐LGBM predicts more accurately than existing methods, demonstrating its generalization ability. This improved performance makes the proposed model a reliable and effective PPI predictor. |
---|---|
ISSN: | 0887-3585 1097-0134 |
DOI: | 10.1002/prot.26236 |
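
The feature selection step described in the Summary, an ANOVA-based cost function searched with PSO, can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not the authors' AVPSO: the record does not give the actual cost function, so the fitness here (mean ANOVA F score of the kept features minus a subset-size penalty) is hypothetical, as are the function names `anova_f`, `make_fitness`, and `binary_pso`, the toy data, and all parameter values. The search is a generic binary PSO with a sigmoid transfer function. The feature extraction (PseAAC, conjoint triad, local descriptor) and the LGBM classification step are omitted.

```python
import math
import random


def anova_f(values, labels):
    """One-way ANOVA F statistic of one feature across the class labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = {y: sum(g) / len(g) for y, g in groups.items()}
    between = sum(len(g) * (means[y] - grand_mean) ** 2 for y, g in groups.items())
    within = sum((v - means[y]) ** 2 for y, g in groups.items() for v in g)
    # Small epsilon avoids division by zero for perfectly separated features.
    return (between / (k - 1)) / (within / (n - k) + 1e-12)


def make_fitness(columns, labels, penalty=1.0):
    """Hypothetical cost: mean F score of the kept features minus a size penalty."""
    scores = [anova_f(col, labels) for col in columns]

    def fitness(mask):
        kept = [s for s, bit in zip(scores, mask) if bit]
        if not kept:
            return float("-inf")  # an empty subset is never acceptable
        return sum(kept) / len(kept) - penalty * len(kept) / len(mask)

    return fitness


def binary_pso(fitness, n_features, n_particles=8, n_iter=30, seed=0,
               w=0.7, c1=1.5, c2=1.5, v_max=4.0):
    """Binary PSO with a sigmoid transfer function; returns (best_mask, best_fitness)."""
    rng = random.Random(seed)
    swarm = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    swarm[0] = [1] * n_features  # one particle starts from the full fused feature set
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in swarm]
    pbest_fit = [fitness(p) for p in swarm]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iter):
        for i, x in enumerate(swarm):
            for d in range(n_features):
                v = (w * vel[i][d]
                     + c1 * rng.random() * (pbest[i][d] - x[d])
                     + c2 * rng.random() * (gbest[d] - x[d]))
                vel[i][d] = max(-v_max, min(v_max, v))
                # Sigmoid of the velocity gives the probability that the bit is 1.
                x[d] = 1 if rng.random() < 1.0 / (1.0 + math.exp(-vel[i][d])) else 0
            fit = fitness(x)
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = x[:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = x[:], fit
    return gbest, gbest_fit


# Toy data (made up): feature 0 separates the classes, feature 1 does not,
# feature 2 separates them only weakly.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = [
    [1.0, 1.1, 0.9, 1.0, 3.0, 3.1, 2.9, 3.0],
    [1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0],
    [1.0, 2.0, 3.0, 4.0, 2.0, 3.0, 4.0, 5.0],
]
fitness = make_fitness(columns, labels)
best_mask, best_fit = binary_pso(fitness, n_features=len(columns))
print(best_mask, round(best_fit, 3))
```

In a fuller pipeline, the columns flagged by `best_mask` would be passed to a gradient boosting classifier and evaluated with five-fold cross-validation, as the Summary describes. Seeding one particle with the full feature set is a design choice of this sketch; it guarantees the search never returns a subset that scores worse than using every fused feature.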