A novel deep learning identifier for promoters and their strength using heterogeneous features


Bibliographic Details
Published in: Methods (San Diego, Calif.), 2024-10, Vol. 230, p. 119-128
Main Authors: Amjad, Aqsa; Ahmed, Saeed; Kabir, Muhammad; Arif, Muhammad; Alam, Tanvir
Format: Article
Language:English
Description
Summary:
•We designed an intelligent two-layer deep learning (DL) model that predicts promoter regions in the first stage and their functional types in the second stage.
•We captured the patterns encoded in DNA using the Word2Vec algorithm and visualized the impact of biological features using the t-distributed stochastic neighbor embedding method.
•We enhanced the overall prediction performance for promoters and their functional types (weak and strong promoters) on multiple datasets.

Promoters, which are short (50–1500 base-pair) DNA regions, play a critical role in the regulation of gene transcription. Numerous dangerous diseases, such as cancer, cardiovascular disease, and inflammatory bowel disease, are caused by genetic variations in promoters. Consequently, the correct identification and characterization of promoters are significant for drug discovery. However, experimental approaches to recognizing promoters and their strengths are challenging in terms of cost, time, and resources. Therefore, computational techniques are highly desirable for the correct characterization of promoters from unannotated genomic data. Here, we designed a powerful bi-layer deep-learning-based predictor named “PROCABLES”, which classifies DNA samples as promoters or non-promoters in the first phase and as strong or weak promoters in the second phase. The proposed method uses five distinct feature types (word2vec, k-spaced nucleotide pairs, trinucleotide propensity-based features, trinucleotide composition, and electron–ion interaction pseudopotentials) to extract hidden patterns from the DNA sequence. A stacked framework is then formed by integrating a convolutional neural network (CNN) with a bidirectional long short-term memory (LSTM) network, and the proposed model is trained on these multi-view attributes.
Under ten-fold cross-validation, the PROCABLES model achieved accuracies of 0.971 and 0.920 and Matthews correlation coefficients (MCC) of 0.940 and 0.840 for the first and second layers, respectively. These results indicate that the proposed PROCABLES protocol outperforms existing advanced computational predictors of promoters and their types. In summary, this research will provide useful hints for large-scale promoter recognition in particular and for other DNA sequence problems in general.
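
The summary names several sequence-derived features without defining them. As a rough illustration only, the sketch below shows one common way to compute three of them in plain Python: trinucleotide composition, k-spaced nucleotide pair frequencies, and a per-base electron–ion interaction pseudopotential (EIIP) encoding. The function names, the normalization, and the handling of ambiguous bases are assumptions made for this example; the record does not give the authors' exact definitions, and their EIIP feature may be aggregated differently (for instance, over trinucleotides).

```python
from itertools import product

NUCLEOTIDES = "ACGT"

# Commonly cited EIIP values for the four DNA nucleotides.
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}


def trinucleotide_composition(seq):
    """Normalized frequencies of the 64 overlapping 3-mers (hypothetical helper)."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product(NUCLEOTIDES, repeat=3)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - 2):
        tri = seq[i:i + 3]
        if tri in counts:  # skip windows containing ambiguous bases such as N
            counts[tri] += 1
            total += 1
    return [counts[k] / total if total else 0.0 for k in kmers]


def k_spaced_pairs(seq, gap):
    """Normalized frequencies of the 16 nucleotide pairs separated by `gap` bases."""
    seq = seq.upper()
    pairs = ["".join(p) for p in product(NUCLEOTIDES, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    total = 0
    for i in range(len(seq) - gap - 1):
        pair = seq[i] + seq[i + gap + 1]
        if pair in counts:
            counts[pair] += 1
            total += 1
    return [counts[p] / total if total else 0.0 for p in pairs]


def eiip_encode(seq):
    """Map each base to its EIIP value; unknown bases map to 0.0 (an assumption)."""
    return [EIIP.get(base, 0.0) for base in seq.upper()]


if __name__ == "__main__":
    example = "ACGTACGT"  # toy sequence, not taken from the paper's datasets
    print(len(trinucleotide_composition(example)))  # 64 features
    print(len(k_spaced_pairs(example, gap=1)))      # 16 features
    print(eiip_encode(example))
```

Vectors like these could be concatenated or fed as separate views to a downstream model; how PROCABLES actually combines them is described only in the full text.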
ISSN:1046-2023
1095-9130
DOI:10.1016/j.ymeth.2024.08.005