
Comparison of pre-processing methods for Infinium HumanMethylation450 BeadChip array

Bibliographic Details
Published in: Bioinformatics (Oxford, England), 2017-10, Vol. 33 (20), p. 3151-3157
Main Authors: Shiah, Yu-Jia, Fraser, Michael, Bristow, Robert G, Boutros, Paul C
Format: Article
Language:English
Description
Summary: Microarrays are widely used to quantify DNA methylation because they are economical, require only small quantities of input DNA and focus on well-characterized regions of the genome. However, pre-processing of methylation microarray data is challenging because of confounding factors that include background fluorescence, dye bias and the impact of germline polymorphisms. We therefore present insights and a data-driven framework for selecting the optimal pre-processing method. Here, we show that Dasen is the optimal pre-processing method for prostate cancer data generated on the Infinium HumanMethylation450 BeadChip array, a platform frequently used for tumour methylome profiling by both the TCGA and ICGC consortia. We evaluated the impact of 11 pre-processing methods on batch effects, replicate variability, sensitivity and sample-to-sample correlation across 809 independent prostate cancer samples, including 150 reported for the first time in this study. Overall, Dasen is the most effective method for removing artefacts and for detecting biological differences associated with tumour aggressivity. Relative to the raw dataset, it reduces replicate variance by 67% for β-values and by 76% for M-values. Our study provides a unique pre-processing benchmark for the community, with an emphasis on biological implications. All software used in this study is publicly available, as detailed in the article. Contact: paul.boutros@oicr.on.ca. Supplementary data are available at Bioinformatics online.
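The summary reports replicate-variance reductions for both β-values and M-values. The sketch below is a minimal illustration of how these quantities are commonly defined and how a percentage reduction relative to raw data could be computed. It does not reproduce the authors' pipeline or Dasen itself. Several elements are assumptions rather than details taken from the article: the offsets (100 for β, 1 for M), the helper names, and the choice to summarise replicate variability as the mean per-probe sample variance.

```python
import math
from statistics import variance


def beta_value(meth: float, unmeth: float, offset: float = 100.0) -> float:
    """Beta value: methylated / (methylated + unmethylated + offset).

    Negative intensities are clipped to zero; the default offset of 100 is an
    assumed, commonly used choice, not a value taken from the article.
    """
    m = max(meth, 0.0)
    u = max(unmeth, 0.0)
    return m / (m + u + offset)


def m_value(meth: float, unmeth: float, alpha: float = 1.0) -> float:
    """M-value: log2 ratio of methylated to unmethylated intensity.

    A small offset alpha (assumed default 1) avoids division by zero.
    """
    return math.log2((max(meth, 0.0) + alpha) / (max(unmeth, 0.0) + alpha))


def mean_replicate_variance(replicates: list[list[float]]) -> float:
    """Hypothetical summary: mean over probes of the sample variance across replicates.

    `replicates` holds one inner list per probe, with one value per technical
    replicate (at least two replicates per probe).
    """
    per_probe = [variance(values) for values in replicates]
    return sum(per_probe) / len(per_probe)


def relative_reduction(raw: float, processed: float) -> float:
    """Fractional reduction of `processed` relative to `raw` (0.67 means 67% lower)."""
    return 1.0 - processed / raw
```

For example, `relative_reduction(raw=1.0, processed=0.33)` returns approximately 0.67, which corresponds to the kind of 67% reduction quoted for β-values. Computing the same statistic separately on β-values and on M-values is why the two percentages can differ.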
ISSN: 1367-4803, 1367-4811
DOI: 10.1093/bioinformatics/btx372