Loading…

Stability selection enhances feature selection and enables accurate prediction of gestational age using only five DNA methylation sites

Background: DNA methylation (DNAm) is robustly associated with chronological age in children and adults, and gestational age (GA) in newborns. This property has enabled the development of several epigenetic clocks that can accurately predict chronological age and GA. However, the lack of overlap in...

Full description

Saved in:
Bibliographic Details
Main Authors: Haftorn, Kristine Løkås, Romanowska, Julia, Lee, Yunsung, Page, Christian Magnus, Magnus, Per Minor, Håberg, Siri Eldevik, Bohlin, Jon, Jugessur, Astanand, Denault, William Robert Paul
Format: Article
Language:English
Online Access:Request full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Background: DNA methylation (DNAm) is robustly associated with chronological age in children and adults, and gestational age (GA) in newborns. This property has enabled the development of several epigenetic clocks that can accurately predict chronological age and GA. However, the lack of overlap in predictive CpGs across different epigenetic clocks remains elusive. Our main aim was therefore to identify and characterize CpGs that are stably predictive of GA. Results: We applied a statistical approach called 'stability selection' to DNAm data from 2138 newborns in the Norwegian Mother, Father, and Child Cohort study. Stability selection combines subsampling with variable selection to restrict the number of false discoveries in the set of selected variables. Twenty-four CpGs were identified as being stably predictive of GA. Intriguingly, only up to 10% of the CpGs in previous GA clocks were found to be stably selected. Based on these results, we used generalized additive model regression to develop a new GA clock consisting of only five CpGs, which showed a similar predictive performance as previous GA clocks (R2 = 0.674, median absolute deviation = 4.4 days). These CpGs were in or near genes and regulatory regions involved in immune responses, metabolism, and developmental processes. Furthermore, accounting for nonlinear associations improved prediction performance in preterm newborns. Conclusion: We present a methodological framework for feature selection that is broadly applicable to any trait that can be predicted from DNAm data. We demonstrate its utility by identifying CpGs that are highly predictive of GA and present a new and highly performant GA clock based on only five CpGs that is more amenable to a clinical setting. Keywords: Cord blood; DNA methylation; Epigenetic clock; Epigenetics; Feature selection; Gestational age; Illumina MethylationEPIC BeadChip; MBRN; MoBa; Stability selection.