DPP-PseAAC: A DNA-binding protein prediction model using Chou’s general PseAAC

Bibliographic Details
Published in: Journal of theoretical biology 2018-09, Vol. 452, p. 22-34
Main Authors: Rahman, M. Saifur, Shatabda, Swakkhar, Saha, Sanjay, Kaykobad, M., Rahman, M. Sohel
Format: Article
Language: English
Description
Summary:
• We have presented DPP-PseAAC, a new computational model to identify DNA-binding proteins in an efficient and accurate way.
• Our model extracts meaningful information directly from the protein sequences, without any dependence on functional domain or structural information.
• We have employed a Random Forest (RF) model to rank and identify the most important features.
• We have further used the Recursive Feature Elimination (RFE) method to extract an optimal set of features and trained a prediction model using a Support Vector Machine (SVM) with a linear kernel.
• DPP-PseAAC demonstrates superior performance compared to the state-of-the-art predictors on a standard benchmark dataset.

A DNA-binding protein (DNA-BP) is a protein that can bind to and interact with DNA. Identifying DNA-BPs with experimental methods is expensive as well as time-consuming. As such, fast and accurate computational methods are sought for predicting whether a protein can bind to DNA or not. In this paper, we focus on building a new computational model to identify DNA-BPs in an efficient and accurate way. Our model extracts meaningful information directly from the protein sequences, without any dependence on functional domain or structural information. After feature extraction, we employ a Random Forest (RF) model to rank the features. We then use the Recursive Feature Elimination (RFE) method to extract an optimal set of features and train a prediction model using a Support Vector Machine (SVM) with a linear kernel. Our proposed method, named DNA-binding Protein Prediction model using Chou’s general PseAAC (DPP-PseAAC), demonstrates superior performance compared to the state-of-the-art predictors on a standard benchmark dataset. DPP-PseAAC achieves accuracies of 93.21%, 95.91% and 77.42% on the 10-fold cross-validation, jackknife and independent tests, respectively.
The source code of DPP-PseAAC, along with relevant dataset and detailed experimental results, can be found at https://github.com/srautonu/DNABinding. A publicly accessible web interface has also been established at: http://77.68.43.135:8080/DPP-PseAAC/.
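
The workflow described in the summary (sequence-derived features, RF ranking, RFE, linear-kernel SVM, 10-fold cross-validation) can be illustrated with a minimal scikit-learn sketch. This is not the authors' implementation, which is in the repository linked above. Everything specific here is an assumption made for illustration: the plain amino-acid composition features stand in for the paper's PseAAC features, the toy sequences are synthetic, the helper names (`composition_features`, `toy_dataset`, `build_model`) are hypothetical, and the feature counts are arbitrary.

```python
import random
from collections import Counter

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectFromModel
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_features(seq):
    """Fraction of each of the 20 standard residues (a stand-in feature set)."""
    counts = Counter(seq)
    return [counts[aa] / len(seq) for aa in AMINO_ACIDS]


def toy_dataset(n_per_class=60, length=120, seed=0):
    """Synthetic data: the positive class is artificially enriched in K and R."""
    rng = random.Random(seed)
    neg_weights = [1.0] * len(AMINO_ACIDS)
    pos_weights = [3.0 if aa in "KR" else 1.0 for aa in AMINO_ACIDS]
    seqs, labels = [], []
    for label, weights in ((0, neg_weights), (1, pos_weights)):
        for _ in range(n_per_class):
            seqs.append("".join(rng.choices(AMINO_ACIDS, weights=weights, k=length)))
            labels.append(label)
    X = np.array([composition_features(s) for s in seqs])
    return X, np.array(labels)


def build_model(n_ranked=10, n_selected=5):
    """RF importance ranking, then RFE, then a linear-kernel SVM.

    Both feature counts are arbitrary illustrative values (assumptions).
    """
    rf_ranking = SelectFromModel(
        RandomForestClassifier(n_estimators=100, random_state=0),
        threshold=-np.inf,        # select purely by rank ...
        max_features=n_ranked,    # ... keeping the top n_ranked features
    )
    rfe = RFE(SVC(kernel="linear"), n_features_to_select=n_selected)
    return make_pipeline(StandardScaler(), rf_ranking, rfe, SVC(kernel="linear"))


X, y = toy_dataset()
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(build_model(), X, y, cv=cv, scoring="accuracy")
mean_accuracy = scores.mean()
print(f"10-fold CV accuracy on toy data: {mean_accuracy:.3f}")
```

Placing the selection steps inside the pipeline means they are refit on each training fold, so the held-out fold does not influence which features are kept. A jackknife test could be approximated in the same sketch by passing `LeaveOneOut()` from `sklearn.model_selection` as `cv`. The accuracy printed here reflects only the synthetic data and is unrelated to the figures reported in the summary.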
ISSN: 0022-5193
1095-8541
DOI: 10.1016/j.jtbi.2018.05.006