
Classification algorithm for class imbalanced data based on optimized Mahalanobis-Taguchi system

Bibliographic Details
Published in: Applied intelligence (Dordrecht, Netherlands), 2022-07, Vol.52 (9), p.10674-10691
Main Authors: Mao, Ting; Zhou, Li; Zhang, Yueyi; Sun, Yefang
Format: Article
Language:English
Description
Summary: Imbalanced data classification is a challenge in data mining and machine learning. To improve classification performance on imbalanced data, this paper proposes a classification algorithm based on the optimized Mahalanobis-Taguchi system (OMTS). At the feature selection stage, important feature variables are determined by four principles: maximizing the mutual information between features and classes; minimizing the mutual information between features; maximizing the initial classification accuracy; and selecting the feature subset that yields a local maximum or minimum of the difference between the mean Mahalanobis distances (MDs) of normal and abnormal samples while retaining the largest number of features. At the threshold determination stage, particle swarm optimization is applied to the selected features to find the threshold that maximizes classification accuracy when separating normal from abnormal samples. At the classification and discrimination stage, the samples are divided into two classes according to their MDs and the optimal threshold. Experimental results show that OMTS achieves accuracies of 0.92, 0.95, 0.81, 0.88, and 0.74 on the Forest Type Mapping UCI, Fetal Health Classification, Connectionist Bench, Wine Quality, and Oil datasets, respectively, and has better classification performance than the other algorithms compared.
ISSN: 0924-669X, 1573-7497
DOI:10.1007/s10489-021-02929-8
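
The threshold and classification stages described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the scaled Mahalanobis distance commonly used in Mahalanobis-Taguchi systems (features standardized against the normal group, then MD = zᵀR⁻¹z / k with R the normal-group correlation matrix), and it replaces particle swarm optimization with a plain scan over candidate thresholds to keep the example short. The toy data and all function names (`invert`, `fit_reference`, `mahalanobis`, `best_threshold`) are hypothetical, and feature selection is omitted.

```python
from statistics import mean, pstdev


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_reference(normal):
    """Estimate mean, std, and inverse correlation matrix from normal samples only."""
    k, n = len(normal[0]), len(normal)
    cols = list(zip(*normal))
    mu = [mean(c) for c in cols]
    sd = [pstdev(c) for c in cols]
    z = [[(x[j] - mu[j]) / sd[j] for j in range(k)] for x in normal]
    corr = [[sum(row[a] * row[b] for row in z) / n for b in range(k)] for a in range(k)]
    return mu, sd, invert(corr)


def mahalanobis(x, reference):
    """Scaled MD of one sample relative to the normal reference group."""
    mu, sd, inv = reference
    k = len(x)
    z = [(x[j] - mu[j]) / sd[j] for j in range(k)]
    return sum(z[a] * inv[a][b] * z[b] for a in range(k) for b in range(k)) / k


def best_threshold(md_normal, md_abnormal):
    """Scan observed MDs as candidate thresholds; keep the most accurate one.

    Assumption: this exhaustive scan stands in for the particle swarm search.
    A sample is labeled abnormal when its MD exceeds the threshold.
    """
    total = len(md_normal) + len(md_abnormal)
    best_t, best_acc = None, -1.0
    for t in sorted(set(md_normal + md_abnormal)):
        correct = sum(d <= t for d in md_normal) + sum(d > t for d in md_abnormal)
        if correct / total > best_acc:
            best_t, best_acc = t, correct / total
    return best_t, best_acc


# Hypothetical toy data: the two features are positively correlated in the
# normal (majority) class, and the abnormal samples break that correlation.
normal = [(1.0, 2.0), (2.0, 2.5), (3.0, 4.5), (4.0, 4.0), (5.0, 6.0), (3.0, 3.0)]
abnormal = [(1.0, 6.0), (5.0, 1.0)]

reference = fit_reference(normal)
md_normal = [mahalanobis(x, reference) for x in normal]
md_abnormal = [mahalanobis(x, reference) for x in abnormal]
threshold, accuracy = best_threshold(md_normal, md_abnormal)
print(f"threshold={threshold:.3f} accuracy={accuracy:.2f}")
```

Because the reference statistics come from the normal group alone, the mean scaled MD of the normal samples is 1 by construction, so samples whose MD lies far above 1 are the candidates for the abnormal class.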