Prediction of protein–protein interactions based on elastic net and deep forest
Published in: | Expert systems with applications 2021-08, Vol.176, p.114876, Article 114876 |
---|---|
Main Authors: | , , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | • A novel method (GcForest-PPI) to predict protein–protein interactions.
• PseAAC, AD, MMI, CTD, AAC-PSSM and DPC-PSSM are fused to extract feature information.
• The elastic net is employed to eliminate redundant and irrelevant features.
• For the first time, deep forest is used as the classifier to predict PPIs via layer-by-layer processing of raw features.
• The GcForest-PPI model generalizes well to cross-species datasets and PPI networks.
Predicting protein–protein interactions (PPIs) helps reveal the molecular roots of disease. However, wet-lab experiments for identifying PPIs are limited in scope and costly. Machine-learning-based frameworks offer a promising alternative: they can identify PPIs automatically and also provide new ideas for drug research and development. We present a novel deep-forest-based method for PPI prediction. First, pseudo amino acid composition (PseAAC), autocorrelation descriptor (AD), multivariate mutual information (MMI), composition-transition-distribution (CTD), amino acid composition position-specific scoring matrix (AAC-PSSM), and dipeptide composition PSSM (DPC-PSSM) are used to extract features and construct the pattern of PPIs. Second, the elastic net is used to optimize the initial feature vectors and boost predictive performance. Finally, we combine XGBoost, random forest, and extremely randomized trees in a cascade architecture to build a deep forest model for PPI prediction (GcForest-PPI). Benchmark experiments show that the proposed approach outperforms other state-of-the-art predictors on Saccharomyces cerevisiae and Helicobacter pylori. We also apply GcForest-PPI to independent test sets, a CD9-core network, a crossover network, and a cancer-specific network. The evaluation shows that GcForest-PPI can improve prediction accuracy, complement experiments, and improve drug discovery. |
---|---|
ISSN: | 0957-4174, 1873-6793 |
DOI: | 10.1016/j.eswa.2021.114876 |
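
The pipeline outlined in the summary (elastic net feature selection followed by a cascade of tree ensembles) can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic stand-ins for the fused descriptor vectors, the hyperparameters (`alpha`, `l1_ratio`, tree counts, two cascade layers) are arbitrary assumptions, and XGBoost is left out so that the example depends only on NumPy and scikit-learn. The helper names `fit_layer` and `layer_proba` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import cross_val_predict, train_test_split

# Synthetic stand-in for fused protein-pair descriptors (assumption, not real PPI data).
X, y = make_classification(n_samples=300, n_features=60, n_informative=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Step 1: elastic net feature selection. Features whose coefficient is driven
# to zero by the combined L1/L2 penalty are dropped.
enet = ElasticNet(alpha=0.05, l1_ratio=0.5).fit(X_train, y_train)
keep = np.abs(enet.coef_) > 1e-8
if not keep.any():  # fallback: keep everything if the penalty removed all features
    keep[:] = True
Xtr, Xte = X_train[:, keep], X_test[:, keep]


def fit_layer(X_in, y_in, seed):
    """Fit one cascade layer; return the forests and out-of-fold class probabilities."""
    forests = [
        RandomForestClassifier(n_estimators=50, random_state=seed),
        ExtraTreesClassifier(n_estimators=50, random_state=seed),
    ]
    # Out-of-fold probabilities avoid feeding the next layer predictions
    # made on samples the forest has already seen during training.
    oof = np.hstack(
        [cross_val_predict(f, X_in, y_in, cv=3, method="predict_proba") for f in forests]
    )
    for f in forests:
        f.fit(X_in, y_in)
    return forests, oof


def layer_proba(forests, X_in):
    """Class probabilities from each fitted forest in a layer."""
    return [f.predict_proba(X_in) for f in forests]


# Step 2: cascade. Each layer's class probabilities are appended to the
# selected features and passed on to the next layer.
layer1, oof1 = fit_layer(Xtr, y_train, seed=1)
aug_train = np.hstack([Xtr, oof1])
aug_test = np.hstack([Xte] + layer_proba(layer1, Xte))

layer2, _ = fit_layer(aug_train, y_train, seed=2)

# Final prediction: average the last layer's forest probabilities.
final_proba = np.mean(layer_proba(layer2, aug_test), axis=0)
y_pred = final_proba.argmax(axis=1)
accuracy = float((y_pred == y_test).mean())
print(f"kept {keep.sum()} of {keep.size} features, test accuracy {accuracy:.2f}")
```

The depth is fixed at two layers here for brevity; a deep forest normally keeps adding layers until validation performance stops improving.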