Loading…

Deep5hmC: predicting genome-wide 5-hydroxymethylcytosine landscape via a multimodal deep learning model

Abstract Motivation 5-Hydroxymethylcytosine (5hmC), a crucial epigenetic mark with a significant role in regulating tissue-specific gene expression, is essential for understanding the dynamic functions of the human genome. Despite its importance, predicting 5hmC modification across the genome remain...

Full description

Saved in:
Bibliographic Details
Published in:Bioinformatics (Oxford, England) England), 2024-09, Vol.40 (9)
Main Authors: Ma, Xin, Thela, Sai Ritesh, Zhao, Fengdi, Yao, Bing, Wen, Zhexing, Jin, Peng, Zhao, Jinying, Chen, Li
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Abstract Motivation 5-Hydroxymethylcytosine (5hmC), a crucial epigenetic mark with a significant role in regulating tissue-specific gene expression, is essential for understanding the dynamic functions of the human genome. Despite its importance, predicting 5hmC modification across the genome remains a challenging task, especially when considering the complex interplay between DNA sequences and various epigenetic factors such as histone modifications and chromatin accessibility. Results Using tissue-specific 5hmC sequencing data, we introduce Deep5hmC, a multimodal deep learning framework that integrates both the DNA sequence and epigenetic features such as histone modification and chromatin accessibility to predict genome-wide 5hmC modification. The multimodal design of Deep5hmC demonstrates remarkable improvement in predicting both qualitative and quantitative 5hmC modification compared to unimodal versions of Deep5hmC and state-of-the-art machine learning methods. This improvement is demonstrated through benchmarking on a comprehensive set of 5hmC sequencing data collected at four developmental stages during forebrain organoid development and across 17 human tissues. Compared to DeepSEA and random forest, Deep5hmC achieves close to 4% and 17% improvement of Area Under the Receiver Operating Characteristic (AUROC) across four forebrain developmental stages, and 6% and 27% across 17 human tissues for predicting binary 5hmC modification sites; and 8% and 22% improvement of Spearman correlation coefficient across four forebrain developmental stages, and 17% and 30% across 17 human tissues for predicting continuous 5hmC modification. Notably, Deep5hmC showcases its practical utility by accurately predicting gene expression and identifying differentially hydroxymethylated regions (DhMRs) in a case–control study of Alzheimer’s disease (AD). Deep5hmC significantly improves our understanding of tissue-specific gene regulation and facilitates the development of new biomarkers for complex diseases. Availability and implementation Deep5hmC is available via https://github.com/lichen-lab/Deep5hmC
ISSN:1367-4811
1367-4803
1367-4811
DOI:10.1093/bioinformatics/btae528