Loading…

Integration of clinical and gene expression data has a synergetic effect on predicting breast cancer outcome

Breast cancer outcome can be predicted using models derived from gene expression data or clinical data. Only a few studies have created a single prediction model using both gene expression and clinical data. These studies often remain inconclusive regarding an obtained improvement in prediction perf...

Full description

Saved in:
Bibliographic Details
Published in:PloS one 2012-07, Vol.7 (7), p.e40358
Main Authors: van Vliet, Martin H, Horlings, Hugo M, van de Vijver, Marc J, Reinders, Marcel J T, Wessels, Lodewyk F A
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Breast cancer outcome can be predicted using models derived from gene expression data or clinical data. Only a few studies have created a single prediction model using both gene expression and clinical data. These studies often remain inconclusive regarding an obtained improvement in prediction performance. We rigorously compare three different integration strategies (early, intermediate, and late integration) as well as classifiers employing no integration (only one data type) using five classifiers of varying complexity. We perform our analysis on a set of 295 breast cancer samples, for which gene expression data and an extensive set of clinical parameters are available as well as four breast cancer datasets containing 521 samples that we used as independent validation.mOn the 295 samples, a nearest mean classifier employing a logical OR operation (late integration) on clinical and expression classifiers significantly outperforms all other classifiers. Moreover, regardless of the integration strategy, the nearest mean classifier achieves the best performance. All five classifiers achieve their best performance when integrating clinical and expression data. Repeating the experiments using the 521 samples from the four independent validation datasets also indicated a significant performance improvement when integrating clinical and gene expression data. Whether integration also improves performances on other datasets (e.g. other tumor types) has not been investigated, but seems worthwhile pursuing. Our work suggests that future models for predicting breast cancer outcome should exploit both data types by employing a late OR or intermediate integration strategy based on nearest mean classifiers.
ISSN:1932-6203
1932-6203
DOI:10.1371/journal.pone.0040358