ES-ARCNN: Predicting enhancer strength by using data augmentation and residual convolutional neural network
Published in: | Analytical biochemistry 2021-04, Vol.618, p.114120-114120, Article 114120 |
---|---|
Format: | Article |
Language: | English |
Summary: | Enhancers are non-coding DNA sequences bound by proteins called transcription factors. They function as distant regulators of gene transcription and participate in the development and maintenance of cell types and tissues. Because experimental validation of enhancers is expensive and time-consuming, many computational methods have been developed to predict enhancers and their strength. However, most of these methods still perform poorly when predicting enhancer strength. Here, we present ES-ARCNN, a method that predicts Enhancer Strength (i.e., strong or weak) by using Augmented data and a Residual Convolutional Neural Network. To train ES-ARCNN, we applied two data augmentation tricks (reverse complement and shift) to enlarge a dataset of previously identified enhancers. We then trained a residual convolutional neural network on the augmented dataset. Compared with other state-of-the-art methods in a 10-fold cross-validation (CV) test, ES-ARCNN performs best, with an accuracy of 66.17%, and the data augmentation tricks effectively improve prediction performance. We further tested ES-ARCNN on an independent dataset and obtained 65.5% accuracy, an improvement of more than 4% over the other three existing methods. The results of the 10-fold CV and independent tests show that ES-ARCNN can effectively predict enhancer strength. Transcription factor binding site (TFBS) enrichment analysis shows that, from a mechanistic perspective, enhancer strength is associated with a higher density of important TFBSs in a tissue. A user-friendly web application is also provided at http://compgenomics.utsa.edu/ES-ARCNN/.
• Enhancers are important for increasing the transcription of specific genes via interaction with transcription factors. Identifying enhancers and their strength can guide biological experiments and help uncover gene regulatory mechanisms.
• By introducing novel data augmentation tricks to enlarge the training data, and adopting a residual convolutional neural network to ease network training, we propose the ES-ARCNN method to predict enhancer strength.
• To our knowledge, we are the first to present the data augmentation trick of shifting along the mapped genomic sequence of previously identified enhancers, which enlarges the training data and effectively improves prediction accuracy.
• Compared with other existing state-of-the-art methods, our ES-A |
---|---|
ISSN: | 0003-2697 1096-0309 |
DOI: | 10.1016/j.ab.2021.114120 |
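
The two augmentation tricks named in the summary, reverse complement and shift, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the `chrom_seq`/`start`/`length` inputs, and the `max_shift` and `step` parameters are assumptions made for illustration, and the record does not state the window length or shift sizes actually used.

```python
# Complement table for DNA bases; N (unknown base) maps to itself.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def shifted_windows(chrom_seq: str, start: int, length: int,
                    max_shift: int, step: int = 1) -> list[str]:
    """Slide the enhancer window along the surrounding genomic sequence.

    chrom_seq is a hypothetical string holding the mapped genomic region.
    Windows that would run off either end are skipped.
    """
    windows = []
    for offset in range(-max_shift, max_shift + 1, step):
        if offset == 0:
            continue  # the unshifted window is the original enhancer
        s = start + offset
        if s < 0 or s + length > len(chrom_seq):
            continue
        windows.append(chrom_seq[s:s + length])
    return windows


def augment(chrom_seq: str, start: int, length: int,
            max_shift: int, step: int = 1) -> list[str]:
    """Original window, its shifted copies, then all reverse complements."""
    original = chrom_seq[start:start + length]
    seqs = [original] + shifted_windows(chrom_seq, start, length,
                                        max_shift, step)
    return seqs + [reverse_complement(s) for s in seqs]
```

In a sketch like this, each augmented sequence would presumably inherit the strong or weak label of the enhancer it was derived from; that labeling choice is also an assumption rather than something the record states.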