
Classification algorithm for class imbalanced data based on optimized Mahalanobis-Taguchi system

Imbalanced data classification is a challenge in data mining and machine learning. To improve the classification performance for imbalanced data, this paper proposes an imbalanced data classification algorithm based on the optimized Mahalanobis-Taguchi system (OMTS). At the feature selection stage,...

Bibliographic Details
Published in: Applied intelligence (Dordrecht, Netherlands), 2022-07, Vol.52 (9), p.10674-10691
Main Authors: Mao, Ting, Zhou, Li, Zhang, Yueyi, Sun, Yefang
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c385t-11893e4f2ffa2eb00e87a39f2610402c5c762ea31fb8ced17a0af39910619d403
cites cdi_FETCH-LOGICAL-c385t-11893e4f2ffa2eb00e87a39f2610402c5c762ea31fb8ced17a0af39910619d403
container_end_page 10691
container_issue 9
container_start_page 10674
container_title Applied intelligence (Dordrecht, Netherlands)
container_volume 52
creator Mao, Ting
Zhou, Li
Zhang, Yueyi
Sun, Yefang
description Imbalanced data classification is a challenge in data mining and machine learning. To improve the classification performance for imbalanced data, this paper proposes an imbalanced data classification algorithm based on the optimized Mahalanobis-Taguchi system (OMTS). At the feature selection stage, important feature variables are determined by four principles, namely maximizing mutual information between features and classes, minimizing mutual information between features, maximizing the initial classification accuracy, and selecting features that produce not only the local maximum or minimum of the difference between the mean Mahalanobis distances (MDs) of normal and abnormal samples but also the largest number of features. At the threshold determination stage, using the selected features, particle swarm optimization is used to determine the optimal threshold for classifying normal and abnormal samples according to the principle of maximizing classification accuracy. At the classification and discrimination stage, the samples are divided into two classes according to their MDs and optimal threshold. Experimental results show that OMTS obtains 0.92, 0.95, 0.81, 0.88, and 0.74 in accuracy on the Forest Type Mapping UCI, Fetal Health Classification, Connectionist Bench, Wine Quality, and Oil datasets, respectively, and has better classification performance than other algorithms.
doi_str_mv 10.1007/s10489-021-02929-8
format article
addtitle Appl Intell
date 2022-07-01
orcid https://orcid.org/0000-0002-8627-2124
publisher New York: Springer US
rights The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2021
toplevel peer_reviewed
tpages 18
fulltext fulltext
identifier ISSN: 0924-669X
ispartof Applied intelligence (Dordrecht, Netherlands), 2022-07, Vol.52 (9), p.10674-10691
issn 0924-669X
1573-7497
language eng
recordid cdi_proquest_journals_2678581734
source ABI/INFORM Global; Springer Nature
subjects Accuracy
Algorithms
Artificial Intelligence
Classification
Computer Science
Data mining
Machine learning
Machines
Manufacturing
Maximization
Mechanical Engineering
Particle swarm optimization
Principles
Processes
Taguchi methods
title Classification algorithm for class imbalanced data based on optimized Mahalanobis-Taguchi system
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-06T23%3A29%3A53IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Classification%20algorithm%20for%20class%20imbalanced%20data%20based%20on%20optimized%20Mahalanobis-Taguchi%20system&rft.jtitle=Applied%20intelligence%20(Dordrecht,%20Netherlands)&rft.au=Mao,%20Ting&rft.date=2022-07-01&rft.volume=52&rft.issue=9&rft.spage=10674&rft.epage=10691&rft.pages=10674-10691&rft.issn=0924-669X&rft.eissn=1573-7497&rft_id=info:doi/10.1007/s10489-021-02929-8&rft_dat=%3Cproquest_cross%3E2678581734%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c385t-11893e4f2ffa2eb00e87a39f2610402c5c762ea31fb8ced17a0af39910619d403%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2678581734&rft_id=info:pmid/&rfr_iscdi=true
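
The threshold determination and classification stages named in the description can be illustrated with a small sketch. This is not the authors' implementation: it uses the conventional Mahalanobis-Taguchi formulation (features standardized against the normal group, distance scaled by the number of features), a bare-bones one-dimensional particle swarm, and synthetic data. All function names, the swarm coefficients (0.7, 1.5, 1.5), the particle and iteration counts, and the data are assumptions made for illustration, and the feature selection stage is omitted entirely.

```python
import numpy as np


def fit_reference(normal):
    """Estimate mean, std, and inverse correlation matrix from the normal group."""
    mean = normal.mean(axis=0)
    std = normal.std(axis=0, ddof=1)
    z = (normal - mean) / std
    corr = np.corrcoef(z, rowvar=False)
    return mean, std, np.linalg.inv(corr)


def scaled_md(x, mean, std, inv_corr):
    """Scaled Mahalanobis distance z C^-1 z^T / k for every row of x."""
    z = (x - mean) / std
    k = x.shape[1]
    return np.einsum("ij,jk,ik->i", z, inv_corr, z) / k


def accuracy(threshold, md, labels):
    """Share of samples classified correctly when MD > threshold means abnormal."""
    return np.mean((md > threshold) == labels)


def pso_threshold(md, labels, n_particles=20, n_iter=50, seed=0):
    """Search for an accuracy-maximizing threshold with a minimal 1-D particle swarm."""
    rng = np.random.default_rng(seed)
    lo, hi = md.min(), md.max()
    pos = rng.uniform(lo, hi, n_particles)   # candidate thresholds
    vel = np.zeros(n_particles)
    best_pos = pos.copy()                    # per-particle best threshold
    best_fit = np.array([accuracy(p, md, labels) for p in pos])
    g = best_pos[best_fit.argmax()]          # swarm-wide best threshold
    for _ in range(n_iter):
        r1, r2 = rng.random(n_particles), rng.random(n_particles)
        # Assumed coefficients: inertia 0.7, cognitive 1.5, social 1.5.
        vel = 0.7 * vel + 1.5 * r1 * (best_pos - pos) + 1.5 * r2 * (g - pos)
        pos = np.clip(pos + vel, lo, hi)
        fit = np.array([accuracy(p, md, labels) for p in pos])
        improved = fit > best_fit
        best_pos[improved] = pos[improved]
        best_fit[improved] = fit[improved]
        g = best_pos[best_fit.argmax()]
    return g


# Synthetic imbalanced data (hypothetical): 200 normal and 20 abnormal samples.
data_rng = np.random.default_rng(1)
normal = data_rng.normal(0.0, 1.0, size=(200, 3))
abnormal = data_rng.normal(4.0, 1.0, size=(20, 3))
x = np.vstack([normal, abnormal])
labels = np.concatenate([np.zeros(200, dtype=bool), np.ones(20, dtype=bool)])

md = scaled_md(x, *fit_reference(normal))
t = pso_threshold(md, labels)
predicted_abnormal = md > t
print("threshold:", t, "accuracy:", accuracy(t, md, labels))
```

Because the reference statistics come only from the normal group, the scaled distances of normal samples average close to 1, which is what makes a single threshold on the distance a workable decision rule. Plain accuracy is used as the fitness here because the description names it as the optimization principle; on strongly imbalanced data a reader may want to inspect per-class error rates as well.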