Supervised Discretization with GK-τ

Bibliographic Details
Published in: Procedia Computer Science, 2013, Vol. 17, p. 114-120
Main Authors: Huang, Wenxue; Pan, Yuanyi; Wu, Jianhong
Format: Article
Language: English
Description
Summary: When data are high-dimensional and of mixed type and the response variable is categorical, an effective executable profile consists of categorical or categorized variables with easily understandable statistics. Many data mining techniques require categorical variables, and many others give better results when continuous variables are converted to categorical ones. A continuous variable can be discretized either in a supervised way or in an unsupervised (conventional) way. We propose a supervised discretization method that uses maximization of the Goodman-Kruskal tau (GK-τ) as the optimization criterion. This optimization is oriented toward the probabilistic averaging effect. An experiment on financial loan application data is designed to show the improvement after discretization. Some technical concerns that arise during discretization are discussed in this article as well.
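To make the criterion concrete, the following is a minimal sketch of how GK-τ could drive a supervised split. It uses the standard definition of the Goodman-Kruskal tau for predicting a categorical response from a categorical predictor and scans for a single cut point that maximizes it. The function names `gk_tau` and `best_single_cut`, the midpoint candidate cuts, and the single-cut search are illustrative assumptions; this record does not describe the authors' actual procedure, which may differ.

```python
from collections import Counter


def gk_tau(x, y):
    """Goodman-Kruskal tau for predicting categorical y from categorical x.

    tau = (sum_ij p_ij^2 / p_i. - sum_j p_.j^2) / (1 - sum_j p_.j^2)
    """
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("x and y must be non-empty and of equal length")
    joint = Counter(zip(x, y))
    x_counts = Counter(x)
    y_counts = Counter(y)
    # sum_j p_.j^2: squared marginal proportions of the response
    marginal = sum((c / n) ** 2 for c in y_counts.values())
    if marginal == 1.0:
        # Response is constant, so tau is undefined; 0.0 is an assumed convention.
        return 0.0
    # sum_ij p_ij^2 / p_i. written with raw counts: c_ij^2 / (n * c_i.)
    conditional = sum(c * c / (n * x_counts[xv]) for (xv, _), c in joint.items())
    return (conditional - marginal) / (1.0 - marginal)


def best_single_cut(values, labels):
    """Hypothetical helper: pick the cut that maximizes GK-tau of (value <= cut).

    Candidate cuts are midpoints between consecutive distinct values.
    Returns (None, -inf) when there are fewer than two distinct values.
    """
    distinct = sorted(set(values))
    best_cut, best_tau = None, float("-inf")
    for lo, hi in zip(distinct, distinct[1:]):
        cut = (lo + hi) / 2
        bins = [v <= cut for v in values]
        tau = gk_tau(bins, labels)
        if tau > best_tau:
            best_cut, best_tau = cut, tau
    return best_cut, best_tau
```

Applying the search recursively inside each resulting bin would give a multi-interval discretization, although a stopping rule would be needed and is not specified here.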
ISSN: 1877-0509
DOI: 10.1016/j.procs.2013.05.016