Prediction of protein–protein interactions based on elastic net and deep forest

Bibliographic Details
Published in: Expert systems with applications 2021-08, Vol.176, p.114876, Article 114876
Main Authors: Yu, Bin, Chen, Cheng, Wang, Xiaolin, Yu, Zhaomin, Ma, Anjun, Liu, Bingqiang
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c438t-f0624533a2d58cf788c99c138f5491b28e7f892d43914788035e3cc72e9780833
cites cdi_FETCH-LOGICAL-c438t-f0624533a2d58cf788c99c138f5491b28e7f892d43914788035e3cc72e9780833
container_end_page
container_issue
container_start_page 114876
container_title Expert systems with applications
container_volume 176
creator Yu, Bin
Chen, Cheng
Wang, Xiaolin
Yu, Zhaomin
Ma, Anjun
Liu, Bingqiang
description • A novel method (GcForest-PPI) is proposed to predict protein–protein interactions.
• PseAAC, AD, MMI, CTD, AAC-PSSM, and DPC-PSSM are fused to extract feature information.
• The elastic net is employed to eliminate redundant and irrelevant features.
• Deep forest is used for the first time as the classifier to predict PPIs via layer-by-layer processing of raw features.
• The GcForest-PPI model generalizes well on cross-species datasets and PPI networks.
Predicting protein–protein interactions (PPIs) helps to uncover the molecular roots of disease. However, wet-lab experiments for identifying PPIs are limited and costly. Machine-learning-based frameworks can not only identify PPIs automatically but also offer a promising alternative source of new ideas for drug research and development. We present a novel deep-forest-based method for PPI prediction. First, pseudo amino acid composition (PseAAC), autocorrelation descriptor (AD), multivariate mutual information (MMI), composition-transition-distribution (CTD), amino acid composition position-specific scoring matrix (AAC-PSSM), and dipeptide composition PSSM (DPC-PSSM) are used to extract features and construct the pattern of PPIs. Second, the elastic net is used to optimize the initial feature vectors and boost predictive performance. Finally, XGBoost, random forest, and extremely randomized trees are ensembled in a cascade architecture to construct the deep forest model for PPI prediction (GcForest-PPI). Benchmark experiments show that the proposed approach outperforms other state-of-the-art predictors on Saccharomyces cerevisiae and Helicobacter pylori. We also apply GcForest-PPI to independent test sets, a CD9-core network, a crossover network, and a cancer-specific network. The evaluation shows that GcForest-PPI can improve prediction accuracy, complement experiments, and support drug discovery.
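The elastic net step named in the description can be illustrated with a minimal sketch. This is not the authors' implementation: it is a plain coordinate-descent elastic net in standard-library Python, and the penalty parameterization (`alpha`, `l1_ratio`), the function names, and the toy data are all assumptions made for illustration. Features whose coefficient is shrunk exactly to zero are treated as eliminated.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 part of the penalty)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net_select(X, y, alpha=0.2, l1_ratio=0.5, n_iter=100):
    """Coordinate descent for an assumed objective:
    (1/2n)||y - Xw||^2 + alpha*l1_ratio*||w||_1 + 0.5*alpha*(1-l1_ratio)*||w||^2.
    Returns the coefficients and the indices of features with nonzero weight."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # residual = y - Xw, and w starts at zero
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            # Covariance of feature j with the residual, with j's own
            # current contribution added back in.
            rho = sum(c * (r + c * w[j]) for c, r in zip(col, residual)) / n
            z = sum(c * c for c in col) / n
            new_w = soft_threshold(rho, alpha * l1_ratio) / (z + alpha * (1 - l1_ratio))
            # Keep the residual consistent with the updated coefficient.
            residual = [r - c * (new_w - w[j]) for c, r in zip(col, residual)]
            w[j] = new_w
    selected = [j for j, coef in enumerate(w) if coef != 0.0]
    return w, selected


# Hypothetical toy data: three mutually orthogonal, zero-mean features.
# Feature 0 is strongly informative, feature 1 is irrelevant,
# feature 2 is only weakly related to the target.
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
y = [2 * row[0] + 0.05 * row[2] for row in X]

weights, selected = elastic_net_select(X, y)
print(weights, selected)
```

With these settings only feature 0 survives, because the L1 threshold (`alpha * l1_ratio = 0.1`) exceeds the covariance of the other two features with the target. In a pipeline like the one described, the retained columns would then be passed on to the classifier.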
doi_str_mv 10.1016/j.eswa.2021.114876
format article
els_id S0957417421003171
publisher New York: Elsevier Ltd
date 2021-08-15
rights 2021 Elsevier Ltd
Copyright Elsevier BV Aug 15, 2021
orcid 0000-0002-5734-1135
0000-0001-8785-8058
0000-0002-7310-7963
0000-0002-4354-5508
0000-0002-2453-7852
0000-0001-6269-398X
oa free_for_read
toplevel peer_reviewed
online_resources
collection CrossRef
Computer and Information Systems Abstracts
Technology Research Database
ProQuest Computer Science Collection
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
fulltext fulltext
identifier ISSN: 0957-4174
ispartof Expert systems with applications, 2021-08, Vol.176, p.114876, Article 114876
issn 0957-4174
1873-6793
language eng
recordid cdi_proquest_journals_2543515136
source ScienceDirect Freedom Collection
subjects Amino acids
Composition
Decision trees
Deep forest
Elastic net
Experiments
Machine learning
Multi-information fusion
Performance prediction
Protein-protein interactions
Proteins
R&D
Research & development
title Prediction of protein–protein interactions based on elastic net and deep forest
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-17T17%3A15%3A23IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Prediction%20of%20protein%E2%80%93protein%20interactions%20based%20on%20elastic%20net%20and%20deep%20forest&rft.jtitle=Expert%20systems%20with%20applications&rft.au=Yu,%20Bin&rft.date=2021-08-15&rft.volume=176&rft.spage=114876&rft.pages=114876-&rft.artnum=114876&rft.issn=0957-4174&rft.eissn=1873-6793&rft_id=info:doi/10.1016/j.eswa.2021.114876&rft_dat=%3Cproquest_cross%3E2543515136%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c438t-f0624533a2d58cf788c99c138f5491b28e7f892d43914788035e3cc72e9780833%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2543515136&rft_id=info:pmid/&rfr_iscdi=true