Gene-pair representation and incorporation of GO-based semantic similarity into classification of gene expression data
Published in: | Intelligent Data Analysis 2012-01, Vol.16 (5), p.827-843 |
---|---|
Main Authors: | , , |
Format: | Article |
Language: | English |
Summary: | This work introduces a novel data representation for learning from gene expression data, aimed at emphasizing gene-gene interactions. In this representation, the data consist of differences between the expression values of gene pairs rather than the expression values themselves. Besides greater sensitivity to gene interactions, an important benefit of this representation is the opportunity to incorporate external knowledge in the form of semantic similarity associated with the pairs, which is also studied. In this context, two common learning algorithms, plain k-NN classification and Random Forest, are compared with two techniques based on distance function learning, learning from equivalence constraints and the intrinsic Random Forest similarity, on a set of genetic benchmark datasets. The most discriminative gene pairs are selected, and the new representation is evaluated on the benchmark data. The novel representation is shown to increase classification accuracy for genetic datasets. Using the gene-pair representation and the Gene Ontology (GO), the semantic similarity of gene pairs is calculated and used to pre-select pairs with a high similarity value. The GO-based feature selection approach is compared with common feature selection and is shown to often increase classification accuracy. |
---|---|
ISSN: | 1088-467X 1571-4128 |
DOI: | 10.3233/IDA-2012-0553 |
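
The gene-pair representation described in the summary can be illustrated with a minimal sketch. It assumes an expression matrix with samples in rows and genes in columns, and it stands in for the GO-based semantic similarity with a made-up symmetric matrix and an arbitrary threshold. The function names `gene_pair_differences` and `preselect_pairs` are hypothetical and do not come from the article, which may construct and select pairs differently.

```python
from itertools import combinations

import numpy as np


def gene_pair_differences(X):
    """Map expression values to pairwise differences.

    X has shape (n_samples, n_genes). Column k of the result holds
    X[:, i] - X[:, j] for pairs[k] == (i, j) with i < j.
    """
    pairs = list(combinations(range(X.shape[1]), 2))
    i_idx = np.array([i for i, _ in pairs], dtype=int)
    j_idx = np.array([j for _, j in pairs], dtype=int)
    return X[:, i_idx] - X[:, j_idx], pairs


def preselect_pairs(D, pairs, similarity, threshold):
    """Keep only the pairs whose similarity[i, j] reaches the threshold."""
    keep = [k for k, (i, j) in enumerate(pairs) if similarity[i, j] >= threshold]
    return D[:, keep], [pairs[k] for k in keep]


# Toy data: 2 samples, 3 genes (illustrative values only).
X = np.array([[1.0, 2.0, 4.0],
              [3.0, 3.0, 1.0]])
D, pairs = gene_pair_differences(X)

# Placeholder similarity matrix; real values would be derived from GO annotations.
sim = np.array([[1.0, 0.8, 0.1],
                [0.8, 1.0, 0.6],
                [0.1, 0.6, 1.0]])
D_sel, pairs_sel = preselect_pairs(D, pairs, sim, threshold=0.5)
```

The resulting difference matrix can be passed to any standard classifier in place of the raw expression values. Note that the number of pairs grows quadratically with the number of genes, which is one reason a pre-selection step is useful in practice.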