Prediction of protein–protein interactions based on elastic net and deep forest

Bibliographic Details
Published in: Expert Systems with Applications 2021-08, Vol. 176, p. 114876, Article 114876
Main Authors: Yu, Bin; Chen, Cheng; Wang, Xiaolin; Yu, Zhaomin; Ma, Anjun; Liu, Bingqiang
Format: Article
Language: English
Description
Summary:
• A novel method (GcForest-PPI) to predict protein–protein interactions.
• PseAAC, AD, MMI, CTD, AAC-PSSM and DPC-PSSM are fused to extract feature information.
• The elastic net is employed to eliminate redundant and irrelevant features.
• Deep forest is used for the first time as the classifier to predict PPIs via layer-by-layer processing of raw features.
• The GcForest-PPI model generalizes well to cross-species datasets and PPI networks.

Prediction of protein–protein interactions (PPIs) helps reveal the molecular roots of disease. However, wet-lab experiments to identify PPIs are limited and costly. Machine-learning-based frameworks offer a promising alternative: they can identify PPIs automatically and also provide new ideas for drug research and development. We present a novel deep-forest-based method for PPI prediction. Firstly, pseudo amino acid composition (PAAC), autocorrelation descriptor (Auto), multivariate mutual information (MMI), composition-transition-distribution (CTD), amino acid composition position-specific scoring matrix (AAC-PSSM), and dipeptide composition PSSM (DPC-PSSM) are adopted to extract features describing PPIs. Secondly, the elastic net is used to optimize the initial feature vectors and boost predictive performance. Finally, we combine XGBoost, random forest, and extremely randomized trees in a cascade architecture to build a deep forest model for PPI prediction (GcForest-PPI). Benchmark experiments show that the proposed approach outperforms other state-of-the-art predictors on Saccharomyces cerevisiae and Helicobacter pylori. We also apply GcForest-PPI to independent test sets, the CD9-core network, a crossover network, and a cancer-specific network. The evaluation shows that GcForest-PPI can boost prediction accuracy, complement experiments, and improve drug discovery.
ISSN: 0957-4174, 1873-6793
DOI: 10.1016/j.eswa.2021.114876
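
Note: The elastic net step mentioned in the summary (dropping redundant and irrelevant features) can be illustrated with a minimal sketch. This is not the authors' implementation. It is a plain-Python coordinate-descent solver for the usual elastic net objective, and it assumes that features and labels are already centred, that labels are encoded as ±1, and that a feature is "kept" when its coefficient is nonzero. The function names, the toy data, and the values of `alpha` and `l1_ratio` are all hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net_coefficients(X, y, alpha=0.1, l1_ratio=0.5, sweeps=100):
    """Coordinate descent for the objective
    (1/2n)*||y - Xw||^2 + alpha*l1_ratio*||w||_1
                        + 0.5*alpha*(1 - l1_ratio)*||w||^2.
    Assumes the columns of X and y are centred (no intercept is fitted).
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    l1 = alpha * l1_ratio
    l2 = alpha * (1.0 - l1_ratio)
    for _ in range(sweeps):
        for j in range(p):
            rho = 0.0   # covariance of feature j with the partial residual
            norm = 0.0  # mean squared value of feature j
            for i in range(n):
                # Residual computed without the contribution of feature j.
                partial = y[i] - sum(w[k] * X[i][k] for k in range(p) if k != j)
                rho += X[i][j] * partial
                norm += X[i][j] ** 2
            rho /= n
            norm /= n
            denom = norm + l2
            w[j] = soft_threshold(rho, l1) / denom if denom > 0 else 0.0
    return w


def select_features(X, y, **kwargs):
    """Return the indices of features with a nonzero coefficient."""
    w = elastic_net_coefficients(X, y, **kwargs)
    return [j for j, coef in enumerate(w) if abs(coef) > 1e-8]


# Toy data (hypothetical): column 0 equals the label, column 1 is constant,
# column 2 is uncorrelated with the label.
X = [[1.0, 0.0, 1.0],
     [-1.0, 0.0, 1.0],
     [1.0, 0.0, -1.0],
     [-1.0, 0.0, -1.0]]
y = [1.0, -1.0, 1.0, -1.0]

kept = select_features(X, y)
print(kept)  # [0]
```

In this toy case only the informative column survives. With a much larger penalty (for example `alpha=5.0`) every coefficient is shrunk to zero and no feature is kept, which shows why the penalty strength would need tuning on real fused feature vectors.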