Deciphering 3'UTR Mediated Gene Regulation Using Interpretable Deep Representation Learning
Published in: Advanced Science 2024-10, Vol. 11 (39), p. e2407013
Format: Article
Language: English
Summary: The 3' untranslated regions (3'UTRs) of messenger RNAs contain many important cis‐regulatory elements that are under functional and evolutionary constraints. It is hypothesized that these constraints are similar to grammars and syntaxes in human languages and can be modeled by advanced natural language techniques such as Transformers, which have been very effective in modeling complex protein sequences and structures. Here 3UTRBERT is described, which implements an attention‐based language model, i.e., Bidirectional Encoder Representations from Transformers (BERT). 3UTRBERT is pre‐trained on aggregated 3'UTR sequences of human mRNAs in a task‐agnostic manner; the pre‐trained model is then fine‐tuned for specific downstream tasks such as identifying RBP binding sites, identifying m6A RNA modification sites, and predicting RNA sub‐cellular localization. Benchmark results show that 3UTRBERT generally outperformed other contemporary methods in each of these tasks. More importantly, the self‐attention mechanism within 3UTRBERT allows direct visualization of the semantic relationships between sequence elements and effectively identifies regions with important regulatory potential. It is expected that the 3UTRBERT model can serve as a foundational tool for various sequence labeling tasks within the 3'UTR field, thus enhancing the decipherability of post‐transcriptional regulatory mechanisms.
The 3UTRBERT language model uses Transformer architectures to analyze the 3'UTR regions of mRNAs. Pre‐trained on human 3'UTR sequences, it is fine‐tuned to identify RBP binding sites, m6A RNA modifications, and RNA sub‐cellular localizations. 3UTRBERT outperforms other methods, highlighting key functional regions and enhancing the understanding of post‐transcriptional regulatory mechanisms.
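As a rough illustration of the input representation used by DNABERT‐style RNA language models such as the one summarized above, the sketch below splits a 3'UTR sequence into overlapping k-mer tokens before they would be fed to a BERT encoder. The choice of k = 3 and a stride of 1, and the function name `kmer_tokenize`, are illustrative assumptions, not settings confirmed by this record.

```python
# Minimal sketch, assuming sliding-window k-mer tokenization (k=3, stride 1);
# these parameters are hypothetical, chosen only to illustrate the idea.

def kmer_tokenize(seq: str, k: int = 3) -> list[str]:
    """Split an RNA sequence into overlapping k-mer tokens."""
    seq = seq.upper().replace("T", "U")  # normalize DNA-style input to RNA
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

tokens = kmer_tokenize("AUGGCUUAG")
print(tokens)  # ['AUG', 'UGG', 'GGC', 'GCU', 'CUU', 'UUA', 'UAG']
```

Each token then maps to an entry in the model's vocabulary, so a sequence of length L yields L − k + 1 tokens, and masked-token prediction during pre-training operates over these overlapping k-mers.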
ISSN: 2198-3844
DOI: 10.1002/advs.202407013