Res-GCN: Identification of protein phosphorylation sites using graph convolutional network and residual network

Bibliographic Details
Published in: Computational biology and chemistry, 2024-10, Vol. 112, p. 108183, Article 108183
Main Authors: Wang, Minghui, Jia, Jihua, Xu, Fei, Zhou, Hongyan, Liu, Yushuang, Yu, Bin
Format: Article
Language: English
Description
Summary: Phosphorylation is an essential post-translational modification that is closely linked to a wide range of biological activities. Effective computational methods for accurately identifying phosphorylation sites are important for an in-depth understanding of various physiological phenomena. However, identifying phosphorylation sites experimentally is time-consuming and laborious, which makes it difficult to meet the processing demands of today's big data. This research proposes a novel model, Res-GCN, to recognize the phosphorylation sites of SARS-CoV-2. First, eight feature extraction strategies are used to encode the protein sequence numerically from multiple viewpoints: amino acid property encodings (AAindex), pseudo-amino acid composition (PseAAC), adapted normal distribution bi-profile Bayes (ANBPB), dipeptide composition (DC), binary encoding (BE), enhanced amino acid composition (EAAC), Word2Vec, and BLOSUM62 matrices. Second, elastic net is used to remove redundant information from the fused feature matrix. Finally, a combination of a graph convolutional network (GCN) and a residual network (ResNet) is used to classify the phosphorylation sites, with a fully connected (FC) layer producing the predictions. The performance of Res-GCN is evaluated by 5-fold cross-validation and independent testing, and excellent results are obtained on the S/T and Y datasets. This indicates that the Res-GCN model has strong predictive performance and generalizability.

• A novel method (Res-GCN) to predict protein phosphorylation sites.
• The AAindex, PseAAC, ANBPB, DC, BE, EAAC, Word2Vec, and BLOSUM62 encodings are fused to extract protein sequence features.
• Elastic net is used for the first time to select the optimal feature subset.
• A graph convolutional network and a residual network are combined for the first time to predict phosphorylation sites.
• Res-GCN improves prediction performance compared to existing models.
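To make the feature extraction step in the summary more concrete, the following minimal Python sketch shows how three of the eight named encodings (BE, DC, and EAAC) might be computed for one peptide window and fused by concatenation. It is an illustration under stated assumptions, not the authors' implementation: the residue ordering, the EAAC sub-window size of 5, the example peptide, and all function names are hypothetical and do not come from the record. The EAAC version here simply divides by the sub-window size and ignores gap handling.

```python
from itertools import product

# Assumed ordering of the 20 standard amino acids (not specified in the record).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def binary_encoding(peptide):
    """BE: one-hot encode each residue as a 20-dimensional 0/1 vector."""
    features = []
    for residue in peptide:
        features.extend(1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS)
    return features


def dipeptide_composition(peptide):
    """DC: frequency of each of the 400 ordered residue pairs."""
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    for i in range(len(peptide) - 1):
        pair = peptide[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard symbols
            counts[pair] += 1
    total = max(len(peptide) - 1, 1)  # number of adjacent pairs
    return [counts[p] / total for p in pairs]


def enhanced_aac(peptide, window=5):
    """EAAC: amino acid composition inside each sliding sub-window.

    The sub-window size of 5 is an assumption made for this sketch.
    """
    features = []
    for start in range(len(peptide) - window + 1):
        segment = peptide[start:start + window]
        features.extend(segment.count(aa) / window for aa in AMINO_ACIDS)
    return features


def fuse_features(peptide):
    """Concatenate the individual encodings into one feature vector."""
    return (
        binary_encoding(peptide)
        + dipeptide_composition(peptide)
        + enhanced_aac(peptide)
    )


# Hypothetical 11-residue window centred on a candidate serine (S).
peptide = "MKTAYSAKQRL"
vector = fuse_features(peptide)
print(len(vector))  # 220 (BE) + 400 (DC) + 140 (EAAC) = 760
```

In a pipeline like the one the summary describes, vectors of this kind from all eight encodings would be stacked into the fused matrix that the elastic net then prunes before classification.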
ISSN: 1476-9271, 1476-928X
DOI: 10.1016/j.compbiolchem.2024.108183