Loading…

HDD: a hypercube division-based algorithm for discretisation

Discretisation, as one of the basic data preparation techniques, has played an important role in data mining. This article introduces a new hypercube division-based (HDD) algorithm for supervised discretisation. The algorithm considers the distribution of both class and continuous attributes and the...

Full description

Saved in:
Bibliographic Details
Published in:International journal of systems science 2011-04, Vol.42 (4), p.557-566
Main Authors: Yang, Ping, Li, Ji-Sheng, Huang, Yong-Xuan
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Discretisation, as one of the basic data preparation techniques, has played an important role in data mining. This article introduces a new hypercube division-based (HDD) algorithm for supervised discretisation. The algorithm considers the distribution of both class and continuous attributes and the underlying correlation structure in the data set. It tries to find a minimal set of cut points, which divides the continuous attribute space into a finite number of hypercubes, and the objects within each hypercube belong to the same decision class. Finally, tests are performed on seven mix-mode data sets, and the C5.0 algorithm is used to generate classification rules from the discretised data. Compared with the other three well-known discretisation algorithms, the HDD algorithm can generate a better discretisation scheme, which improves the accuracy of classification and reduces the number of classification rules.
ISSN:0020-7721
1464-5319
DOI:10.1080/00207720903572455