Loading…
HDD: a hypercube division-based algorithm for discretisation
Discretisation, as one of the basic data preparation techniques, has played an important role in data mining. This article introduces a new hypercube division-based (HDD) algorithm for supervised discretisation. The algorithm considers the distribution of both class and continuous attributes and the...
Saved in:
Published in: | International journal of systems science 2011-04, Vol.42 (4), p.557-566 |
---|---|
Main Authors: | , , |
Format: | Article |
Language: | English |
Subjects: | |
Citations: | Items that this one cites Items that cite this one |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | Discretisation, as one of the basic data preparation techniques, has played an important role in data mining. This article introduces a new hypercube division-based (HDD) algorithm for supervised discretisation. The algorithm considers the distribution of both class and continuous attributes and the underlying correlation structure in the data set. It tries to find a minimal set of cut points, which divides the continuous attribute space into a finite number of hypercubes, and the objects within each hypercube belong to the same decision class. Finally, tests are performed on seven mix-mode data sets, and the C5.0 algorithm is used to generate classification rules from the discretised data. Compared with the other three well-known discretisation algorithms, the HDD algorithm can generate a better discretisation scheme, which improves the accuracy of classification and reduces the number of classification rules. |
---|---|
ISSN: | 0020-7721 1464-5319 |
DOI: | 10.1080/00207720903572455 |