Loading…

Pattern recognition methods to relate time profiles of gene expression with phenotypic data: a comparative study

Comparing time courses of gene expression with time courses of phenotypic data may provide new insights in cellular mechanisms. In this study, we compared the performance of five pattern recognition methods with respect to their ability to relate genes and phenotypic data: one classical method (k-me...

Full description

Saved in:
Bibliographic Details
Published in:Bioinformatics 2015-07, Vol.31 (13), p.2115-2122
Main Authors: Hendrickx, Diana M, Jennen, Danyel G J, Briedé, Jacob J, Cavill, Rachel, de Kok, Theo M, Kleinjans, Jos C S
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Comparing time courses of gene expression with time courses of phenotypic data may provide new insights in cellular mechanisms. In this study, we compared the performance of five pattern recognition methods with respect to their ability to relate genes and phenotypic data: one classical method (k-means) and four methods especially developed for time series [Short Time-series Expression Miner (STEM), Linear Mixed Model mixtures, Dynamic Time Warping for -Omics and linear modeling with R/Bioconductor limma package]. The methods were evaluated using data available from toxicological studies that had the aim to relate gene expression with phenotypic endpoints (i.e. to develop biomarkers for adverse outcomes). Additionally, technical aspects (influence of noise, number of time points and number of replicates) were evaluated on simulated data. None of the methods outperforms the others in terms of biology. Linear modeling with limma is mostly influenced by noise. STEM is mostly influenced by the number of biological replicates in the dataset, whereas k-means and linear modeling with limma are mostly influenced by the number of time points. In most cases, the results of the methods complement each other. We therefore provide recommendations to integrate the five methods. The Matlab code for the simulations performed in this research is available in the Supplementary Data (Word file). The microarray data analysed in this paper are available at ArrayExpress (E-TOXM-22 and E-TOXM-23) and Gene Expression Omnibus (GSE39291). The phenotypic data are available in the Supplementary Data (Excel file). Links to the pattern recognition tools compared in this paper are provided in the main text. d.hendrickx@maastrichtuniversity.nl Supplementary data are available at Bioinformatics online.
ISSN:1367-4803
1367-4811
1460-2059
DOI:10.1093/bioinformatics/btv108