An efficient data reduction method and its application to cluster analysis

Bibliographic Details
Published in: Neurocomputing (Amsterdam), 2017-05, Vol. 238, p. 234-244
Main Authors: Wang, Jianpei; Yue, Shihong; Yu, Xiao; Wang, Yaru
Format: Article
Language: English
Description
Summary: Data reduction plays an important role in data mining, but existing methods cannot efficiently identify all the major features hidden in large datasets, and in some cases they even lose key features of the original data. In this paper, a new efficient measure is developed to reduce a given dataset and uncover its major features by multiplying the defined absolute density by the defined local density of each data point. Both densities are estimated with a fast grid-based bisecting method. To test its performance on feature reduction and sample reduction, a group of synthetic datasets with different features and 24 benchmark datasets were used as examples, with clustering accuracy, runtime, and separability among clusters as the measurements. The results strongly indicate that the proposed method can quickly reduce a dataset and identify its most important features. It can also effectively determine the optimal number of clusters by suppressing noisy data and enhancing the separation among clusters.
ISSN: 0925-2312, 1872-8286
DOI: 10.1016/j.neucom.2017.01.059