Single-stranded and double-stranded DNA-binding protein prediction using HMM profiles
Published in: | Analytical biochemistry 2021-01, Vol.612, p.113954-113954, Article 113954 |
---|---|
Format: | Article |
Language: | English |
Summary: | DNA-binding proteins perform important roles in cellular processes and are involved in many biological activities. These proteins include crucial protein-DNA binding domains and can interact with single-stranded or double-stranded DNA; they are accordingly classified as single-stranded DNA-binding proteins (SSBs) or double-stranded DNA-binding proteins (DSBs). Computational prediction of SSBs and DSBs helps in annotating protein functions and understanding protein-binding domains.
Performance is reported using the DNA-binding protein dataset that was recently introduced by Wang et al. [1]. On the independent test set, the proposed method achieved a sensitivity of 0.600, specificity of 0.792, AUC of 0.758, MCC of 0.369, accuracy of 0.744, and F-measure of 0.536.
The proposed method, which uses hidden Markov model (HMM) profiles for feature extraction, outperformed the benchmark method in the literature and achieved an overall improvement of approximately 3%. The source code and supplementary information for the proposed method are available at https://github.com/roneshsharma/Predict-DNA-binding-proteins/wiki.
HMM profiles generated using HHblits are used to compute the features of DNA-binding proteins, using the normalized profile-monogram and normalized profile-bigram feature extraction techniques. For classification, support vector machine (SVM), k-nearest neighbors (KNN) and random forest (RF) classifiers are used. The proposed approach achieved promising results compared with the benchmark method in the literature.
• Hidden Markov model (HMM) profiles for prediction of SSBs and DSBs.
• Computational prediction of SSBs and DSBs helps in annotating protein functions.
• Normalized profile-monogram and normalized profile-bigram based feature extraction techniques. |
---|---|
ISSN: | 0003-2697 1096-0309 |
DOI: | 10.1016/j.ab.2020.113954 |
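
The summary names profile-monogram and profile-bigram features without defining them. The sketch below illustrates one common way such features are computed from a profile matrix with one row per residue and one column per amino acid. It is not taken from the authors' repository: the function names are hypothetical, the monogram is assumed to be the column sum, the bigram is assumed to be the sum of products of adjacent rows, and dividing by sequence length is an assumed normalization that may differ from the one used in the paper. Parsing HHblits output into the matrix is not shown.

```python
from typing import List

# A profile is a list of rows (one per residue); each row holds one value
# per amino-acid column (20 columns for a standard HMM profile).
Profile = List[List[float]]


def monogram(profile: Profile) -> List[float]:
    """Column sums of the profile: one feature per amino-acid column."""
    n_cols = len(profile[0])
    return [sum(row[j] for row in profile) for j in range(n_cols)]


def bigram(profile: Profile) -> List[float]:
    """Sum over adjacent positions of P[i][m] * P[i+1][n], flattened row-major."""
    n_cols = len(profile[0])
    feats = [0.0] * (n_cols * n_cols)
    for cur, nxt in zip(profile, profile[1:]):
        for m in range(n_cols):
            for n in range(n_cols):
                feats[m * n_cols + n] += cur[m] * nxt[n]
    return feats


def normalize(values: List[float], length: int) -> List[float]:
    """ASSUMPTION: divide by sequence length; the paper may normalize differently."""
    return [v / length for v in values]


def extract_features(profile: Profile) -> List[float]:
    """Concatenate normalized monogram and normalized bigram features."""
    length = len(profile)
    return normalize(monogram(profile), length) + normalize(bigram(profile), length)
```

With a 20-column profile this yields 20 monogram values plus 400 bigram values, a fixed-length vector of 420 features regardless of protein length, which is what allows classifiers such as SVM, KNN, or RF to be trained on proteins of different sizes.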