
Exploring variable-length features (motifs) for predicting binding sites through interpretable deep neural networks


Bibliographic Details
Published in: Engineering applications of artificial intelligence 2021-11, Vol.106, p.104485, Article 104485
Main Authors: Dasari, Chandra Mohan, Amilpur, Santhosh, Bhukya, Raju
Format: Article
Language: English
Description
Summary: Transcription factor binding sites (TFBS) and RNA-binding proteins (RBP) play key roles in gene regulation, transcription, and RNA editing. Identifying and locating these potential sites is essential for detecting pathogenic variation in many biological processes. Some binding sites are identified by biological experiments, which are time-intensive and expensive. Many computational approaches have been considered as alternatives, and a few deep learning methods have recently been developed for fast and accurate prediction of binding sites. Although existing approaches achieve competent performance, many of them require specialized feature sets, and interpretability remains challenging. To overcome these problems, we propose an interpretable deep learning technique called protein binding variable pattern predictor (PBVPP), which is evaluated on a wide variety of experimental data and performance metrics for binding site prediction. The novelty of the proposed method rests on three key factors: (i) PBVPP, along with its variant, can extract vital features from large-scale genomic sequences obtained by high-throughput technology to predict the occurrence of TFBS and RBP sites. (ii) The proposed interpretable model reveals how vital features are mined, and also extracts variable-length patterns for accurate prediction of binding sites. (iii) The obtained motifs are validated against the known target motifs of the TFBSshape DNA (JASPAR) database. The proposed model shows improvements of 5.88% and 5.01% over state-of-the-art methods in terms of the receiver operating characteristic (ROC) curve for TFBS and RBP prediction, respectively, and an improvement of 60% in terms of the precision-recall curve for TFBS prediction.
• Proposed a generic interpretable DNN model for accurate TFBS and RBP site prediction.
• PBVPP-Hybrid outperforms all the state-of-the-art models on 625 ChIP-Seq datasets.
• PBVPP outperforms all the existing methods on RBP-24 datasets.
• The proposed model derives the most relevant patterns that induce TFBS and RBP sites.
• Extracted variable-length patterns are validated against the known JASPAR TFBSshape database.
ISSN: 0952-1976, 1873-6769
DOI: 10.1016/j.engappai.2021.104485