Predicting genome-wide DNA methylation using methylation marks, genomic position, and DNA regulatory elements

Bibliographic Details
Published in: Genome Biology 2015-01, Vol.16 (1), p.14-14, Article 14
Main Authors: Zhang, Weiwei, Spector, Tim D, Deloukas, Panos, Bell, Jordana T, Engelhardt, Barbara E
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c562t-a8769a06d0d5082858404f52775109aa2cabd7f94a17214ddd20a181ae4f63613
cites cdi_FETCH-LOGICAL-c562t-a8769a06d0d5082858404f52775109aa2cabd7f94a17214ddd20a181ae4f63613
container_end_page 14
container_issue 1
container_start_page 14
container_title Genome Biology
container_volume 16
creator Zhang, Weiwei
Spector, Tim D
Deloukas, Panos
Bell, Jordana T
Engelhardt, Barbara E
description Recent assays for individual-specific genome-wide DNA methylation profiles have enabled epigenome-wide association studies to identify specific CpG sites associated with a phenotype. Computational prediction of CpG site-specific methylation levels is critical to enable genome-wide analyses, but current approaches tackle average methylation within a locus and are often limited to specific genomic regions. We characterize genome-wide DNA methylation patterns, and show that correlation among CpG sites decays rapidly, making predictions solely based on neighboring sites challenging. We built a random forest classifier to predict methylation levels at CpG site resolution using features including neighboring CpG site methylation levels and genomic distance, co-localization with coding regions, CpG islands (CGIs), and regulatory elements from the ENCODE project. Our approach achieves 92% prediction accuracy of genome-wide methylation levels at single-CpG-site precision. The accuracy increases to 98% when restricted to CpG sites within CGIs and is robust across platform and cell-type heterogeneity. Our classifier outperforms other types of classifiers and identifies features that contribute to prediction accuracy: neighboring CpG site methylation, CGIs, co-localized DNase I hypersensitive sites, transcription factor binding sites, and histone modifications were found to be most predictive of methylation levels. Our observations of DNA methylation patterns led us to develop a classifier to predict DNA methylation levels at CpG site resolution with high accuracy. Furthermore, our method identified genomic features that interact with DNA methylation, suggesting mechanisms involved in DNA methylation modification and regulation, and linking diverse epigenetic processes.
doi_str_mv 10.1186/s13059-015-0581-9
format article
publisher England: BioMed Central
pmid 25616342
date 2015-01-24
eissn 1474-760X
1465-6906
rights 2015. This work is licensed under http://creativecommons.org/licenses/by/2.0 (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Zhang et al.; licensee BioMed Central. 2015
oa free_for_read
sourcetype Open Access Repository
toplevel peer_reviewed
online_resources
fulltext fulltext
identifier ISSN: 1465-6906
ispartof Genome Biology, 2015-01, Vol.16 (1), p.14-14, Article 14
issn 1465-6906
1474-7596
1474-760X
1465-6906
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_4389802
source Publicly Available Content Database (Proquest) (PQ_SDU_P3); PubMed Central
subjects Adult
Aged
Algorithms
Binding sites
Bioinformatics
Brain
Cell division
Computer applications
CpG islands
CpG Islands - genetics
Deoxyribonuclease
Deoxyribonucleic acid
DNA
DNA methylation
DNA Methylation - genetics
Epigenetics
Female
Gene expression
Genome, Human
Genomes
Genomics - methods
Humans
Localization
Male
Middle Aged
Phenotypes
Principal Component Analysis
Regulatory sequences
Regulatory Sequences, Nucleic Acid - genetics
Sequence Analysis, DNA
Studies
Sulfites
title Predicting genome-wide DNA methylation using methylation marks, genomic position, and DNA regulatory elements
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-29T14%3A08%3A24IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Predicting%20genome-wide%20DNA%20methylation%20using%20methylation%20marks,%20genomic%20position,%20and%20DNA%20regulatory%20elements&rft.jtitle=Genome%20Biology&rft.au=Zhang,%20Weiwei&rft.date=2015-01-24&rft.volume=16&rft.issue=1&rft.spage=14&rft.epage=14&rft.pages=14-14&rft.artnum=14&rft.issn=1465-6906&rft.eissn=1474-760X&rft_id=info:doi/10.1186/s13059-015-0581-9&rft_dat=%3Cproquest_pubme%3E1674960719%3C/proquest_pubme%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c562t-a8769a06d0d5082858404f52775109aa2cabd7f94a17214ddd20a181ae4f63613%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2207518277&rft_id=info:pmid/25616342&rfr_iscdi=true