EDLm6APred: ensemble deep learning approach for mRNA m6A site prediction
Published in: | BMC Bioinformatics 2021-05, Vol.22 (1), p.1-288, Article 288 |
---|---|
Format: | Article |
Language: | English |
Summary: | N6-methyladenosine (m6A) is a common and abundant RNA methylation modification. It is widespread in the transcriptomes of many species and is closely related to the occurrence and development of various life processes and diseases. Accurate identification of m6A methylation sites has therefore become a hot topic. Most biological methods rely on high-throughput sequencing technology, which places great demands on sequencing library preparation and data analysis. Various machine learning methods have therefore been proposed that extract different types of sequence-based features and then apply conventional classifiers, such as SVM and RF, to identify m6A methylation sites. However, identification performance relies heavily on the extracted features, which still need to be improved. This paper studies feature extraction and classification of m6A methylation sites using a natural language processing approach that integrates feature extraction and classification in a single model while taking into account the upstream and downstream information of m6A sites. One-hot encoding, RNA word embedding, and Word2vec are adopted to depict sites from the perspectives of the base itself and of its upstream and downstream sequence. A BiLSTM model, a well-known sequence model, was then constructed to discriminate sequences with potential m6A sites. Since the three feature extraction methods focus on different perspectives of m6A sites, an ensemble deep learning predictor (EDLm6APred) was finally constructed for m6A site prediction. Experimental results on human and mouse data sets show that EDLm6APred outperforms each of the single-encoding models, indicating that base, upstream, and downstream information are all essential for m6A site detection.
Compared with existing m6A methylation site prediction models that do not use genomic features, EDLm6APred achieves an area under the receiver operating characteristic curve of 86.6% on the human data sets, indicating the effectiveness of sequential modeling on RNA. For user convenience, a webserver implementing EDLm6APred was developed and made publicly available at www.xjtlu.edu.cn/biologicalsciences/EDLm6APred. The proposed EDLm6APred method is a reliable predictor of m6A methylation sites. |
---|---|
ISSN: | 1471-2105 |
DOI: | 10.1186/s12859-021-04206-4 |
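
The summary names three sequence encodings and an ensemble step without spelling them out. The sketch below illustrates, under stated assumptions, what one-hot encoding, overlapping k-mer "RNA word" tokenization (the usual input to a learned embedding or to Word2vec), and a simple ensemble rule might look like. The function names, the 3-mer word length, the 5-nt example window, and the unweighted averaging rule are all illustrative assumptions and are not taken from the paper; the BiLSTM classifiers themselves are not shown.

```python
from itertools import product

BASES = "ACGU"  # RNA alphabet; ordering here is an arbitrary choice


def one_hot(seq):
    """Encode each base as a 4-dimensional indicator vector.

    Bases outside ACGU (for example N) become an all-zero vector.
    """
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


def kmer_tokens(seq, k=3):
    """Split a sequence into overlapping k-mers ("RNA words") with stride 1."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_vocabulary(k=3):
    """Map every k-mer over ACGU to an integer id; 0 is reserved for unknowns."""
    return {"".join(p): i + 1 for i, p in enumerate(product(BASES, repeat=k))}


def token_ids(seq, vocab, k=3):
    """Turn a sequence into the integer ids an embedding layer would consume."""
    return [vocab.get(tok, 0) for tok in kmer_tokens(seq, k)]


def ensemble_average(probabilities):
    """Combine per-model site probabilities by unweighted averaging.

    This combination rule is an assumption made for illustration only.
    """
    return sum(probabilities) / len(probabilities)


if __name__ == "__main__":
    window = "GGACU"  # hypothetical window centred on a candidate adenosine
    print(one_hot(window))
    print(kmer_tokens(window))
    print(token_ids(window, kmer_vocabulary()))
    print(ensemble_average([0.9, 0.7, 0.8]))
```

In a full pipeline each encoding would feed its own sequence model, and the per-model probabilities would be passed to the combining step; the averaging shown here is only one of several plausible choices.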