Res-GCN: Identification of protein phosphorylation sites using graph convolutional network and residual network
Published in: | Computational biology and chemistry 2024-10, Vol.112, p.108183, Article 108183 |
---|---|
Main Authors: | , , , , , |
Format: | Article |
Language: | English |
Summary: | Phosphorylation is an essential post-translational modification that is closely linked to a wide range of biological activities. Developing effective computational methods that accurately recognize phosphorylation sites is important for an in-depth understanding of various physiological phenomena. However, identifying phosphorylation sites experimentally is time-consuming and laborious, which makes it difficult to keep pace with the volume of data now available. This research proposes a novel model, Res-GCN, to recognize the phosphorylation sites of SARS-CoV-2. First, eight feature extraction strategies are used to encode the protein sequence numerically from multiple viewpoints: amino acid property encodings (AAindex), pseudo-amino acid composition (PseAAC), adapted normal distribution bi-profile Bayes (ANBPB), dipeptide composition (DC), binary encoding (BE), enhanced amino acid composition (EAAC), Word2Vec, and the BLOSUM62 matrix. Second, elastic net is used to remove redundant features from the fused matrix. Finally, a combination of a graph convolutional network (GCN) and a residual network (ResNet) classifies the candidate sites, and a fully connected (FC) layer outputs the predictions. The performance of Res-GCN is evaluated by 5-fold cross-validation and independent testing, and excellent results are obtained on the S/T and Y datasets. These results indicate that Res-GCN has strong predictive performance and generalizes well.
•A novel method (Res-GCN) is proposed to predict protein phosphorylation sites.
•The AAindex, PseAAC, ANBPB, DC, BE, EAAC, Word2Vec, and BLOSUM62 encodings are fused to extract protein sequence features.
•Elastic net is used for the first time to screen the optimal feature subset.
•A graph convolutional network and a residual network are combined for the first time to predict phosphorylation sites.
•Res-GCN improves prediction performance compared to existing models. |
---|---|
ISSN: | 1476-9271, 1476-928X |
DOI: | 10.1016/j.compbiolchem.2024.108183 |
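
The summary names several sequence encodings that are fused into one feature matrix. As a rough illustration of what three of them compute (binary encoding, dipeptide composition, and enhanced amino acid composition), here is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the 11-residue example fragment, the EAAC window width of 5, and the handling of unknown residues are all assumptions made for this example, and the record does not state the settings used in the paper.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def binary_encoding(seq):
    """One-hot encode each residue; unknown symbols (e.g. 'X') become all zeros."""
    vec = []
    for aa in seq:
        vec.extend(1.0 if aa == ref else 0.0 for ref in AMINO_ACIDS)
    return vec


def dipeptide_composition(seq):
    """Relative frequency of each of the 400 ordered residue pairs.

    Pairs containing a non-standard symbol are skipped (an assumption of this sketch).
    """
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]


def eaac(seq, window=5):
    """Amino acid composition computed inside each sliding window of fixed width."""
    vec = []
    for start in range(len(seq) - window + 1):
        chunk = seq[start:start + window]
        vec.extend(chunk.count(aa) / window for aa in AMINO_ACIDS)
    return vec


def fuse_features(seq, window=5):
    """Concatenate the three encodings into a single feature vector."""
    return binary_encoding(seq) + dipeptide_composition(seq) + eaac(seq, window)


# Hypothetical 11-residue fragment centred on a serine (S), for illustration only.
fragment = "AKRLSSPEEGT"
features = fuse_features(fragment)
print(len(features))  # 220 (BE) + 400 (DC) + 140 (EAAC) = 760
```

In a pipeline like the one summarized above, vectors of this kind would be stacked row-wise for all candidate sites before a feature selection step such as elastic net reduces the fused matrix.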