PPred-PCKSM: A multi-layer predictor for identifying promoter and its variants using position based features
Published in: | Computational biology and chemistry 2022-04, Vol.97, p.107623-107623, Article 107623 |
---|---|
Main Authors: | , , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | A promoter is a short region of DNA where the protein RNA polymerase binds to initiate transcription of a specific gene. In bacteria, the sigma subunit that associates with RNA polymerase helps it recognize promoters. In Escherichia coli (E. coli), promoters are recognized by different sigma factors, each with a distinct function. Various methods have been used to predict the different classes of promoters, but their identification and classification performance still needs improvement. In this work, we propose a new multi-layer predictor named PPred-PCKSM, which combines a new feature extraction strategy, the position-correlation based k-mer scoring matrix (PCKSM), with an artificial neural network (ANN) to predict promoters and their six types, namely σ70, σ24, σ28, σ32, σ38 and σ54, in E. coli. We employ the PCKSM technique to extract feature sets from different k-mers. The feature sets obtained from trimers and tetramers are concatenated and then passed through the ANN for the final prediction. The resulting feature set contained effective features that contributed to an accuracy of 98.02% and a Matthews correlation coefficient (MCC) of 96.04% on the promoter prediction task. Evaluated with 5-fold cross-validation on the benchmark dataset, our model outperformed all current state-of-the-art methods for predicting promoters and their different types in E. coli.
•A two-layer model is proposed for predicting promoters and their variants.
•PCKSM, an effective feature extraction technique, is applied to different k-mers.
•Concatenating trimer and tetramer features yields a more effective feature set.
•PPred-PCKSM outperforms all state-of-the-art models trained on the E. coli dataset.
•The model achieved an accuracy of 98.02% and an MCC of 96.04% on the promoter prediction task. |
---|---|
ISSN: | 1476-9271 1476-928X |
DOI: | 10.1016/j.compbiolchem.2022.107623 |
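
The summary describes extracting position-based features from trimers and tetramers and concatenating them before classification. The record does not give the PCKSM formula, so the following is only a minimal sketch of that general idea: it scores each (position, k-mer) pair with a smoothed log-odds ratio between promoter and non-promoter training sequences, which is an assumed stand-in and not the published scoring matrix. The function names, the pseudocount, and the toy sequences are all hypothetical.

```python
import math
from collections import Counter


def kmers_with_positions(seq, k):
    """Return (position, k-mer) pairs, sliding one base at a time."""
    return [(i, seq[i:i + k]) for i in range(len(seq) - k + 1)]


def position_scoring_matrix(positives, negatives, k, pseudocount=1.0):
    """Build a scorer for (position, k-mer) pairs.

    ASSUMPTION: a smoothed log-odds ratio is used purely for illustration;
    it is not the PCKSM formula from the paper.
    """
    pos_counts = Counter(p for s in positives for p in kmers_with_positions(s, k))
    neg_counts = Counter(p for s in negatives for p in kmers_with_positions(s, k))
    n_pos, n_neg = len(positives), len(negatives)
    n_kmers = 4 ** k  # number of possible DNA k-mers, used for smoothing

    def score(pair):
        p = (pos_counts[pair] + pseudocount) / (n_pos + pseudocount * n_kmers)
        q = (neg_counts[pair] + pseudocount) / (n_neg + pseudocount * n_kmers)
        return math.log(p / q)

    return score


def encode(seq, scorers):
    """Concatenate the per-position scores for every k in `scorers`."""
    features = []
    for k, score in scorers:
        features.extend(score(pair) for pair in kmers_with_positions(seq, k))
    return features


# Toy data (hypothetical, far shorter than real promoter sequences).
positives = ["ACGTAC", "ACGTTC"]
negatives = ["TTGACA", "GGGACA"]

# One scorer for trimers and one for tetramers, as in the summary.
scorers = [(k, position_scoring_matrix(positives, negatives, k)) for k in (3, 4)]

promoter_like = encode("ACGTAC", scorers)
background_like = encode("TTGACA", scorers)
```

Each encoded vector has one entry per trimer position followed by one entry per tetramer position, so its length is fixed for sequences of equal length and it can be fed to any classifier, such as a small feed-forward network.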