A novel ECOC algorithm for multiclass microarray data classification based on data complexity analysis
Published in: | Pattern recognition 2019-06, Vol.90, p.346-362 |
---|---|
Main Authors: | , , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | Highlights:
• We propose a novel ECOC algorithm for multiclass microarray data classification based on data complexity theory.
• Various data complexity measures are deployed to detect the intrinsic characteristics of microarray data sets and thereby produce diverse coding matrices.
• A new data complexity measure, named C1, is designed to evaluate data distribution; it benefits the optimization of our class partition.
• The proposed ECOC algorithm performs more stably than other popular ECOC algorithms on most multiclass microarray data sets.
Many new classification and clustering techniques have been proposed for microarray data analysis. However, multiclass microarray data classification is still regarded as a difficult task because of the small-sample-size problem and the class-imbalance problem. In this paper, we propose a novel error-correcting output code (ECOC) algorithm for the classification of multiclass microarray data based on data complexity (DC) theory. In this algorithm, an ECOC coding matrix is generated from a hierarchical partition of the class space with the aim of Minimizing Data Complexity (hence the name ECOC-MDC). Because the partition process can be mapped to a binary tree, with one dichotomizer per internal node (N-1 dichotomizers for N classes), a compact ensemble with high discrimination power is produced. The performance of ECOC-MDC is compared with several state-of-the-art ECOC algorithms on six multiclass microarray data sets, and the proposed algorithm obtains better results in most cases. The correlation between DC measures and the dichotomizers' performance is also examined, and the observations confirm that high data complexity usually leads to high error rates for the associated dichotomizers. However, the error-correcting mechanism of the ECOC framework effectively improves the algorithm's generalization ability. In short, ECOC-MDC produces a compact ensemble with high error-correction capability through the application of diverse DC measures. Our Matlab code is available at: github.com/MLDMXM2017/ECOC-MDC. |
---|---|
ISSN: | 0031-3203, 1873-5142 |
DOI: | 10.1016/j.patcog.2019.01.047 |
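The abstract's structural idea is that a hierarchical partition of the class space maps to a binary tree, and each internal node becomes one column of the coding matrix (one dichotomizer). A minimal Python sketch of that mapping follows. It is a generic illustration and not the authors' Matlab implementation. The tree is supplied by hand rather than found by minimizing data complexity, and the helper names `tree_to_coding_matrix` and `decode` are hypothetical. Decoding assumes a common ternary distance in which a zero entry costs 0.5; the paper's actual decoding rule may differ.

```python
import numpy as np

def tree_to_coding_matrix(tree, classes):
    """Convert a nested binary partition of the classes into a ternary ECOC matrix.

    `tree` is either a class label (a leaf) or a 2-tuple (left, right) of subtrees,
    and must contain at least two classes. Each internal node contributes one
    column: +1 for classes under its left child, -1 for classes under its right
    child, and 0 for classes the node does not cover.
    """
    index = {c: i for i, c in enumerate(classes)}
    columns = []

    def leaves(node):
        # Collect every class label below `node`.
        if isinstance(node, tuple):
            return leaves(node[0]) + leaves(node[1])
        return [node]

    def visit(node):
        # Pre-order traversal: parent column first, then left and right subtrees.
        if not isinstance(node, tuple):
            return
        col = np.zeros(len(classes), dtype=int)
        for c in leaves(node[0]):
            col[index[c]] = 1
        for c in leaves(node[1]):
            col[index[c]] = -1
        columns.append(col)
        visit(node[0])
        visit(node[1])

    visit(tree)
    # N classes -> N - 1 internal nodes -> N - 1 columns (dichotomizers).
    return np.stack(columns, axis=1)

def decode(M, outputs):
    """Return the row index of the codeword closest to the +1/-1 `outputs`.

    Assumption: ternary Hamming-style distance, where a match costs 0,
    a mismatch costs 1 and a 0 entry in M costs 0.5.
    """
    outputs = np.asarray(outputs)
    dist = np.sum((1 - np.sign(M * outputs)) / 2, axis=1)
    return int(np.argmin(dist))

M = tree_to_coding_matrix((("A", "B"), ("C", "D")), ["A", "B", "C", "D"])
print(M)
print(decode(M, [-1, 1, -1]))  # index 3, i.e. class "D"
```

For four classes the balanced tree yields a 4 x 3 matrix, which matches the compactness argument: one dichotomizer per internal node rather than one per class pair.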
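The partition step, which at each node chooses the class bipartition whose two groups are least complex to separate, might be sketched as below. This sketch substitutes Fisher's discriminant ratio (the F1 measure from Ho and Basu's data complexity set) for the paper's DC measures, including the new C1, whose definitions are not reproduced here. It also assumes complexity = 1 / (1 + F1) and an exhaustive search over all bipartitions, which is only practical for the small number of classes typical of microarray data sets. The names `fisher_ratio`, `best_split` and `build_tree` are hypothetical.

```python
import itertools
import numpy as np

def fisher_ratio(X_a, X_b):
    """Maximum Fisher's discriminant ratio over features (Ho & Basu's F1).
    Larger values mean the two groups are easier to separate."""
    diff = X_a.mean(axis=0) - X_b.mean(axis=0)
    spread = X_a.var(axis=0) + X_b.var(axis=0) + 1e-12  # avoid division by zero
    return float(np.max(diff ** 2 / spread))

def best_split(X, y, classes):
    """Try every bipartition of `classes` and keep the least complex one.
    Assumption: complexity = 1 / (1 + F1), a stand-in for the paper's DC measures."""
    first, rest = classes[0], classes[1:]
    best, best_score = None, np.inf
    # Keep `first` on the left so each unordered bipartition is scored once;
    # r < len(rest) guarantees the right group is non-empty.
    for r in range(len(rest)):
        for combo in itertools.combinations(rest, r):
            left = [first, *combo]
            right = [c for c in rest if c not in combo]
            f1 = fisher_ratio(X[np.isin(y, left)], X[np.isin(y, right)])
            score = 1.0 / (1.0 + f1)
            if score < best_score:
                best, best_score = (left, right), score
    return best

def build_tree(X, y, classes):
    """Recursively split the class set into a nested (left, right) tuple tree."""
    classes = list(classes)
    if len(classes) == 1:
        return classes[0]
    left, right = best_split(X, y, classes)
    return (build_tree(X, y, left), build_tree(X, y, right))

# Toy 1-D data: classes 0 and 1 lie close together, class 2 lies far away.
X = np.array([[0.0], [1.0], [2.0], [3.0], [100.0], [101.0]])
y = np.array([0, 0, 1, 1, 2, 2])
print(build_tree(X, y, [0, 1, 2]))  # ((0, 1), 2)
```

The first split isolates the distant class 2, the easiest (lowest-complexity) dichotomy, and leaves the harder 0-vs-1 decision to a lower node. Each internal node of the returned tuple corresponds to one dichotomizer, so the tree can be turned into coding-matrix columns directly.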