Loading…

HIBoost: A hubness-aware ensemble learning algorithm for high-dimensional imbalanced data classification

Learning from high-dimensional imbalanced data is prevalent in many vital real-world applications, which poses a severe challenge to traditional data mining and machine learning algorithms. The existing works generally use dimension reduction methods to deal with the curse of dimensionality, then ap...

Full description

Saved in:
Bibliographic Details
Published in:Journal of intelligent & fuzzy systems 2020-01, Vol.39 (1), p.133-144
Main Authors: Wu, Qin, Lin, Yaping, Zhu, Tuanfei, Zhang, Yue
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Learning from high-dimensional imbalanced data is prevalent in many vital real-world applications, which poses a severe challenge to traditional data mining and machine learning algorithms. The existing works generally use dimension reduction methods to deal with the curse of dimensionality, then apply traditional imbalance learning techniques to combat the problem of class imbalance. However, dimensionality reduction may cause the loss of useful information, especially for the minority classes. This paper introduces an ensemble-based method, HIBoost, to directly handle the imbalanced learning problem in high dimensional space. HIBoost takes into account the inherent high-dimensional hubness phenomenon, i.e., high-dimensional data tends to contain the singular points (hubs and anti-hubs) which frequently or rarely occur in k-nearest neighbors of other points. For the singular hubs and anti-hubs induced by high dimension, HIBoost introduces a discount factor to restrict the weight growth of them in the process of updating weight, so that the risk of over fitting can be reduced when training component classifiers. For class imbalance problem, HIBoost uses SMOTE to balance the training data in each iteration so as to alleviate the prediction bias of component classifiers. Experimental results based on sixteen high-dimensional imbalanced data sets demonstrate the effectiveness of HIBoost.
ISSN:1064-1246
1875-8967
DOI:10.3233/JIFS-190821