Spectral analysis of two-signed microarray expression data

Bibliographic Details
Published in: Mathematical medicine and biology, 2007-06, Vol. 24 (2), p. 131-148
Main Authors: Higham, Desmond J., Kalna, Gabriela, Vass, J. Keith
Format: Article
Language: English
Description
Summary: We give a simple and informative derivation of a spectral algorithm for clustering and reordering complementary DNA microarray expression data. Here, expression levels of a set of genes are recorded simultaneously across a number of samples, with a positive weight reflecting up-regulation and a negative weight reflecting down-regulation. We give theoretical support for the algorithm based on a biologically justified hypothesis about the structure of the data, and illustrate its use on public domain data in the context of unsupervised tumour classification. The algorithm is derived by considering a discrete optimization problem and then relaxing to the continuous realm. We prove that in the case where the data have an inherent ‘checkerboard’ sign pattern, the algorithm will automatically reveal that pattern. Further, our derivation shows that the algorithm may be regarded as imposing a random graph model on the expression levels and then clustering from a maximum likelihood perspective. This indicates that the output will be tolerant to perturbations and will reveal ‘near-checkerboard’ patterns when these are present in the data. It is interesting to note that the checkerboard structure is revealed by the first (dominant) singular vectors; previous work on spectral methods has focussed on the case of nonnegative edge weights, where only the second and higher singular vectors are relevant. We illustrate the algorithm on real and synthetic data, and then use it in a tumour classification context on three different cancer data sets. Our results show that respecting the two-signed nature of the data (thereby distinguishing between up-regulation and down-regulation) reveals structures that cannot be gleaned from the absolute value data (where up- and down-regulation are both regarded as ‘changes’).
ISSN: 1477-8599, 1477-8602
DOI: 10.1093/imammb/dql030
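
The summary states that a checkerboard sign pattern is revealed by the first (dominant) singular vectors. The NumPy sketch below illustrates only that one idea on a small synthetic matrix. It is not the authors' published algorithm, which may include steps that this record does not describe. The matrix sizes, the group labels, and the helper name `dominant_sign_split` are assumptions made for illustration.

```python
import numpy as np


def dominant_sign_split(W):
    """Return the dominant left and right singular vectors of W.

    Assumption for this sketch: rows (genes) and columns (samples) are
    each split into two groups by the sign of the corresponding entry.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, 0], Vt[0, :]


rng = np.random.default_rng(0)

# Hypothetical ground truth: two gene groups and two sample groups.
row_groups = np.array([1] * 6 + [-1] * 4)
col_groups = np.array([1] * 5 + [-1] * 3)

# Checkerboard sign pattern with positive magnitudes in [0.5, 1.5).
magnitudes = rng.uniform(0.5, 1.5, size=(10, 8))
W = np.outer(row_groups, col_groups) * magnitudes

# Hide the structure by shuffling rows and columns.
row_perm = rng.permutation(W.shape[0])
col_perm = rng.permutation(W.shape[1])
W_shuffled = W[np.ix_(row_perm, col_perm)]

# Signs of the dominant singular vectors give the two-way split.
u, v = dominant_sign_split(W_shuffled)
recovered_pattern = np.outer(np.sign(u), np.sign(v))

# Sorting by the singular vector entries reorders the matrix so that
# same-sign entries sit together in contiguous blocks.
row_order = np.argsort(u)
col_order = np.argsort(v)
W_sorted = W_shuffled[np.ix_(row_order, col_order)]

print(np.sign(W_sorted).astype(int))
```

The overall sign of a singular vector pair is arbitrary, so the sketch compares the outer product of the two sign vectors with the sign of the data and does not compare group labels directly. For an exact checkerboard the split follows from writing the matrix as a positive matrix with some rows and columns negated; how well it degrades under noise is the subject of the paper itself and is not shown here.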