Modulation of DNA-protein Interactions by Proximal Genetic Elements as Uncovered by Interpretable Deep Learning
Published in: | Journal of Molecular Biology, 2023-07, Vol. 435 (13), p. 168121, Article 168121 |
---|---|
Main Authors: | , , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: |
• The presence of a binding motif and chromatin accessibility does not guarantee TF-DNA interaction.
• Language models can help extract trainable representations of biological sequences.
• Combining sequence embeddings with an attention-based ML architecture yields a reliable estimator of TF-DNA binding.
• Model interpretability allows identification and annotation of proximal genetic elements that may play a role in the selectivity of TF-DNA interactions.
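The highlights mention language models that turn DNA into trainable representations. The record does not say which language model or tokenization the authors used; the sketch below assumes overlapping k-mer tokenization (the scheme used by DNABERT-style models) purely for illustration. It shows how a sequence becomes tokens and then integer IDs that an embedding layer could consume. The names `kmer_tokenize`, `build_vocab`, and `encode` are hypothetical.

```python
from itertools import product

# Assumption: overlapping k-mer tokenization (stride 1); not confirmed by the source.
def kmer_tokenize(seq: str, k: int = 3) -> list[str]:
    """Split a DNA sequence into overlapping k-mers."""
    seq = seq.upper()
    if k <= 0 or k > len(seq):
        raise ValueError("k must be between 1 and len(seq)")
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def build_vocab(k: int = 3) -> dict[str, int]:
    """Map every ACGT k-mer to an integer ID; ID 0 is reserved for unknown tokens."""
    kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return {kmer: idx for idx, kmer in enumerate(kmers, start=1)}

def encode(seq: str, k: int = 3) -> list[int]:
    """Convert a sequence to k-mer IDs that an embedding lookup could use."""
    vocab = build_vocab(k)
    # k-mers containing N or other ambiguity codes fall back to the unknown ID 0
    return [vocab.get(tok, 0) for tok in kmer_tokenize(seq, k)]

if __name__ == "__main__":
    print(kmer_tokenize("ACGTAC"))  # ['ACG', 'CGT', 'GTA', 'TAC']
    print(len(build_vocab(3)))      # 64
```

Overlapping k-mers keep every base position represented in several tokens, which is one common way to give a sequence model local context; other tokenizations (single bases, byte-pair encoding) are equally plausible choices.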
Transcription factors (TFs) recognize specific motifs in the genome, typically 6–12 bp long, to regulate various aspects of the cellular machinery. The presence of a binding motif and favorable genome accessibility are key drivers of consistent TF-DNA interaction. Although these prerequisites may be met thousands of times across the genome, there appears to be a high degree of selectivity in which sites are actually bound. Here, we present a deep-learning framework that identifies the genetic elements upstream and downstream of the binding motif and characterizes their role in enforcing this selectivity. The framework is based on an interpretable recurrent neural network architecture that enables relative analysis of sequence context features. We apply the framework to model twenty-six transcription factors and score TF-DNA binding at base-pair resolution. We find significant differences in the activations of DNA context features between bound and unbound sequences. In addition to standardized evaluation protocols, the framework offers strong interpretability, which enables us to identify and annotate DNA sequences with candidate elements that may modulate TF-DNA binding. We also find that differences in data processing strongly influence overall model performance. Overall, the proposed framework provides novel insights into non-coding genetic elements and their role in facilitating stable TF-DNA interactions. |
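The abstract describes an interpretable recurrent architecture with attention that scores binding at base-pair resolution and contrasts context features upstream and downstream of the motif. The sketch below is not the authors' model: it swaps in generic dot-product attention pooling over per-base feature vectors (stand-ins for RNN outputs) and then sums the per-base attention weights over regions flanking a motif. The `query` vector, the toy hidden states, and the motif coordinates are hypothetical; no training is performed.

```python
import math

def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(hidden: list[list[float]], query: list[float]) -> tuple[list[float], list[float]]:
    """Dot-product attention pooling over per-base hidden states.

    hidden: one feature vector per base (hypothetical stand-in for RNN outputs).
    query: a learned vector in a real model; supplied by hand here.
    Returns (per-base attention weights, pooled sequence vector).
    """
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden]
    weights = softmax(scores)
    pooled = [sum(w * h[d] for w, h in zip(weights, hidden)) for d in range(len(query))]
    return weights, pooled

def attention_by_region(weights: list[float], motif_start: int, motif_end: int) -> dict[str, float]:
    """Sum attention mass upstream of, inside, and downstream of a motif span [start, end)."""
    return {
        "upstream": sum(weights[:motif_start]),
        "motif": sum(weights[motif_start:motif_end]),
        "downstream": sum(weights[motif_end:]),
    }

if __name__ == "__main__":
    # Toy 2-dimensional features for a 6-bp window; motif assumed at positions 2-3.
    hidden = [[0.1, 0.0], [0.9, 0.2], [0.2, 1.0], [0.1, 1.1], [0.8, 0.1], [0.0, 0.0]]
    weights, pooled = attention_pool(hidden, query=[1.0, 0.5])
    print([round(w, 3) for w in weights])
    print(attention_by_region(weights, motif_start=2, motif_end=4))
```

Because the weights form a distribution over positions, comparing the "upstream" and "downstream" mass between bound and unbound sequences is one simple way to ask whether flanking context, rather than the motif alone, drives a model's prediction.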
ISSN: | 0022-2836 1089-8638 |
DOI: | 10.1016/j.jmb.2023.168121 |