FW-SMOTE: A feature-weighted oversampling approach for imbalanced classification

Published in: Pattern recognition, 2022-04, Vol. 124, p. 108511, Article 108511
Main Authors: Maldonado, Sebastián; Vairetti, Carla; Fernandez, Alberto; Herrera, Francisco
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c306t-e264043970ac136d3f9c23f99364ef6405c5e5e330faa6f50d5df5d52587018d3
cites cdi_FETCH-LOGICAL-c306t-e264043970ac136d3f9c23f99364ef6405c5e5e330faa6f50d5df5d52587018d3
container_end_page
container_issue
container_start_page 108511
container_title Pattern recognition
container_volume 124
creator Maldonado, Sebastián
Vairetti, Carla
Fernandez, Alberto
Herrera, Francisco
description • We claim that SMOTE has a weakness when facing high-dimensional problems.
• We propose a general version of the SMOTE strategy using OWA operators.
• The proposal includes a feature weighting process that considers relevancy/redundancy.
• This new component leads to a better definition of the neighborhood of minority samples.
• Experiments carried out on 42 datasets show the virtues of our method.
The Synthetic Minority Over-sampling Technique (SMOTE) is a well-known resampling strategy that has been successfully used for dealing with the class-imbalance problem, one of the most challenging pattern recognition tasks in the last two decades. In this work, we claim that SMOTE has an important issue when defining the neighborhood in order to create new minority samples: the use of the Euclidean distance may not be suitable in high-dimensional settings. Our hypothesis is that the use of a weighted metric that does not assume that all features are equally important could improve performance in the presence of noisy/redundant variables. In this line, we present a novel SMOTE-like method that uses the weighted Minkowski distance for defining the neighborhood for each example of the minority class. This methodology leads to a better definition of the neighborhood since it prioritizes those features that are more relevant for the classification task. A complementary advantage of the proposal is performing feature selection since attributes can be discarded when their corresponding weights are below a given threshold. Our experiments on 42 class-imbalance datasets show the virtues of the proposed SMOTE variant, achieving the best predictive performance when compared with the traditional SMOTE approach and other recent variants on low- and high-dimensional settings, handling issues such as class overlap and hubness adequately without increasing the complexity of the method.
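Illustrative note (not part of the catalog record): the idea in the description above, a SMOTE-like oversampler whose neighborhood is defined by a weighted Minkowski distance and whose low-weight features are discarded, can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the feature weights `w` are taken as a given input (the record does not say how the OWA-based weighting computes them), the distance uses one common form of the weighted Minkowski metric, and the names `weighted_minkowski`, `fw_smote_sketch`, `threshold`, `k`, and `n_new` are hypothetical.

```python
import numpy as np


def weighted_minkowski(a, b, w, p=2.0):
    """One common form: (sum_j w_j * |a_j - b_j|**p) ** (1/p)."""
    return float(np.sum(w * np.abs(a - b) ** p) ** (1.0 / p))


def fw_smote_sketch(X_min, w, k=5, n_new=10, p=2.0, threshold=0.0, seed=0):
    """SMOTE-style interpolation with a feature-weighted neighborhood.

    X_min     : (n, d) array of minority-class samples (n >= 2)
    w         : (d,) non-negative feature weights, ASSUMED to be given
    threshold : hypothetical cut-off; features with w < threshold are dropped
    Returns the synthetic samples and the boolean mask of kept features.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    w = np.asarray(w, dtype=float)

    # Feature selection: discard attributes whose weight is below the threshold.
    keep = w >= threshold
    X, w_kept = X_min[:, keep], w[keep]

    n = X.shape[0]
    k = min(k, n - 1)  # a sample cannot be its own neighbor

    # Pairwise weighted Minkowski distances among minority samples.
    diff = np.abs(X[:, None, :] - X[None, :, :]) ** p
    dist = np.sum(w_kept * diff, axis=2) ** (1.0 / p)
    np.fill_diagonal(dist, np.inf)

    # Indices of the k nearest minority neighbors of each sample.
    neighbors = np.argsort(dist, axis=1)[:, :k]

    # Standard SMOTE step: interpolate between a sample and a random neighbor.
    base = rng.integers(0, n, size=n_new)
    chosen = neighbors[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))
    synthetic = X[base] + gap * (X[chosen] - X[base])
    return synthetic, keep
```

With all weights equal and `p=2`, the neighborhood reduces to the ordinary Euclidean one used by standard SMOTE, which is the baseline the description argues against in high-dimensional settings.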
doi_str_mv 10.1016/j.patcog.2021.108511
format article
fullrecord <record><control><sourceid>elsevier_cross</sourceid><recordid>TN_cdi_crossref_primary_10_1016_j_patcog_2021_108511</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0031320321006877</els_id><sourcerecordid>S0031320321006877</sourcerecordid><originalsourceid>FETCH-LOGICAL-c306t-e264043970ac136d3f9c23f99364ef6405c5e5e330faa6f50d5df5d52587018d3</originalsourceid><addsrcrecordid>eNp9kMFKAzEQhoMoWKtv4GFfIHWSbHa3HoRSWhUqFax4DDE7aVO2zZKsFd_eLOvZywzMP__PzEfILYMJA1bc7Set7ozfTjhwlkaVZOyMjFhVCipZzs_JCEAwKjiIS3IV4x6AlUkYkdflB317WW8W99kss6i7r4D0G91212Gd-ROGqA9t447bTLdt8NrsMutD5g6futFHk5ZMo2N01hndOX-8JhdWNxFv_vqYvC8Xm_kTXa0fn-ezFTUCio4iL3LIxbQEbZgoamGnhqcyFUWONmnSSJQoBFitCyuhlrWVteSyKoFVtRiTfMg1wccY0Ko2uIMOP4qB6qmovRqoqJ6KGqgk28Ngw3TbyWFQ0Tjs_3ABTadq7_4P-AXuxGyJ</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>FW-SMOTE: A feature-weighted oversampling approach for imbalanced classification</title><source>Elsevier</source><creator>Maldonado, Sebastián ; Vairetti, Carla ; Fernandez, Alberto ; Herrera, Francisco</creator><creatorcontrib>Maldonado, Sebastián ; Vairetti, Carla ; Fernandez, Alberto ; Herrera, Francisco</creatorcontrib><description>•We claim that SMOTE has a weakness when facing high-dimensional problems.•We propose a general version of the SMOTE strategy using OWA operators.•The proposal includes a feature weighting process that considers relevancy/redundancy.•This new component leads to a better definition of the neighborhood of minority samples.•Experiments carried out on 42 datasets show the virtues of our method. The Synthetic Minority Over-sampling Technique (SMOTE) is a well-known resampling strategy that has been successfully used for dealing with the class-imbalance problem, one of the most challenging pattern recognition tasks in the last two decades. 
In this work, we claim that SMOTE has an important issue when defining the neighborhood in order to create new minority samples: the use of the Euclidean distance may not be suitable in high-dimensional settings. Our hypothesis is that the use of a weighted metric that does not assume that all features are equally important could improve performance in the presence of noisy/redundant variables. In this line, we present a novel SMOTE-like method that uses the weighted Minkowski distance for defining the neighborhood for each example of the minority class. This methodology leads to a better definition of the neighborhood since it prioritizes those features that are more relevant for the classification task. A complementary advantage of the proposal is performing feature selection since attributes can be discarded when their corresponding weights are below a given threshold. Our experiments on 42 class-imbalance datasets show the virtues of the proposed SMOTE variant, achieving the best predictive performance when compared with the traditional SMOTE approach and other recent variants on low- and high-dimensional settings, handling issues such as class overlap and hubness adequately without increasing the complexity of the method.</description><identifier>ISSN: 0031-3203</identifier><identifier>EISSN: 1873-5142</identifier><identifier>DOI: 10.1016/j.patcog.2021.108511</identifier><language>eng</language><publisher>Elsevier Ltd</publisher><subject>Data resampling ; Feature selection ; Imbalanced data classification ; OWA Operators ; SMOTE</subject><ispartof>Pattern recognition, 2022-04, Vol.124, p.108511, Article 108511</ispartof><rights>2021 Elsevier 
Ltd</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c306t-e264043970ac136d3f9c23f99364ef6405c5e5e330faa6f50d5df5d52587018d3</citedby><cites>FETCH-LOGICAL-c306t-e264043970ac136d3f9c23f99364ef6405c5e5e330faa6f50d5df5d52587018d3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,776,780,27901,27902</link.rule.ids></links><search><creatorcontrib>Maldonado, Sebastián</creatorcontrib><creatorcontrib>Vairetti, Carla</creatorcontrib><creatorcontrib>Fernandez, Alberto</creatorcontrib><creatorcontrib>Herrera, Francisco</creatorcontrib><title>FW-SMOTE: A feature-weighted oversampling approach for imbalanced classification</title><title>Pattern recognition</title><description>•We claim that SMOTE has a weakness when facing high-dimensional problems.•We propose a general version of the SMOTE strategy using OWA operators.•The proposal includes a feature weighting process that considers relevancy/redundancy.•This new component leads to a better definition of the neighborhood of minority samples.•Experiments carried out on 42 datasets show the virtues of our method. The Synthetic Minority Over-sampling Technique (SMOTE) is a well-known resampling strategy that has been successfully used for dealing with the class-imbalance problem, one of the most challenging pattern recognition tasks in the last two decades. In this work, we claim that SMOTE has an important issue when defining the neighborhood in order to create new minority samples: the use of the Euclidean distance may not be suitable in high-dimensional settings. Our hypothesis is that the use of a weighted metric that does not assume that all features are equally important could improve performance in the presence of noisy/redundant variables. 
In this line, we present a novel SMOTE-like method that uses the weighted Minkowski distance for defining the neighborhood for each example of the minority class. This methodology leads to a better definition of the neighborhood since it prioritizes those features that are more relevant for the classification task. A complementary advantage of the proposal is performing feature selection since attributes can be discarded when their corresponding weights are below a given threshold. Our experiments on 42 class-imbalance datasets show the virtues of the proposed SMOTE variant, achieving the best predictive performance when compared with the traditional SMOTE approach and other recent variants on low- and high-dimensional settings, handling issues such as class overlap and hubness adequately without increasing the complexity of the method.</description><subject>Data resampling</subject><subject>Feature selection</subject><subject>Imbalanced data classification</subject><subject>OWA Operators</subject><subject>SMOTE</subject><issn>0031-3203</issn><issn>1873-5142</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><recordid>eNp9kMFKAzEQhoMoWKtv4GFfIHWSbHa3HoRSWhUqFax4DDE7aVO2zZKsFd_eLOvZywzMP__PzEfILYMJA1bc7Set7ozfTjhwlkaVZOyMjFhVCipZzs_JCEAwKjiIS3IV4x6AlUkYkdflB317WW8W99kss6i7r4D0G91212Gd-ROGqA9t447bTLdt8NrsMutD5g6futFHk5ZMo2N01hndOX-8JhdWNxFv_vqYvC8Xm_kTXa0fn-ezFTUCio4iL3LIxbQEbZgoamGnhqcyFUWONmnSSJQoBFitCyuhlrWVteSyKoFVtRiTfMg1wccY0Ko2uIMOP4qB6qmovRqoqJ6KGqgk28Ngw3TbyWFQ0Tjs_3ABTadq7_4P-AXuxGyJ</recordid><startdate>202204</startdate><enddate>202204</enddate><creator>Maldonado, Sebastián</creator><creator>Vairetti, Carla</creator><creator>Fernandez, Alberto</creator><creator>Herrera, Francisco</creator><general>Elsevier Ltd</general><scope>AAYXX</scope><scope>CITATION</scope></search><sort><creationdate>202204</creationdate><title>FW-SMOTE: A feature-weighted oversampling approach for imbalanced 
classification</title><author>Maldonado, Sebastián ; Vairetti, Carla ; Fernandez, Alberto ; Herrera, Francisco</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c306t-e264043970ac136d3f9c23f99364ef6405c5e5e330faa6f50d5df5d52587018d3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Data resampling</topic><topic>Feature selection</topic><topic>Imbalanced data classification</topic><topic>OWA Operators</topic><topic>SMOTE</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Maldonado, Sebastián</creatorcontrib><creatorcontrib>Vairetti, Carla</creatorcontrib><creatorcontrib>Fernandez, Alberto</creatorcontrib><creatorcontrib>Herrera, Francisco</creatorcontrib><collection>CrossRef</collection><jtitle>Pattern recognition</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Maldonado, Sebastián</au><au>Vairetti, Carla</au><au>Fernandez, Alberto</au><au>Herrera, Francisco</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>FW-SMOTE: A feature-weighted oversampling approach for imbalanced classification</atitle><jtitle>Pattern recognition</jtitle><date>2022-04</date><risdate>2022</risdate><volume>124</volume><spage>108511</spage><pages>108511-</pages><artnum>108511</artnum><issn>0031-3203</issn><eissn>1873-5142</eissn><abstract>•We claim that SMOTE has a weakness when facing high-dimensional problems.•We propose a general version of the SMOTE strategy using OWA operators.•The proposal includes a feature weighting process that considers relevancy/redundancy.•This new component leads to a better definition of the neighborhood of minority samples.•Experiments carried out on 42 datasets show the virtues of our method. 
The Synthetic Minority Over-sampling Technique (SMOTE) is a well-known resampling strategy that has been successfully used for dealing with the class-imbalance problem, one of the most challenging pattern recognition tasks in the last two decades. In this work, we claim that SMOTE has an important issue when defining the neighborhood in order to create new minority samples: the use of the Euclidean distance may not be suitable in high-dimensional settings. Our hypothesis is that the use of a weighted metric that does not assume that all features are equally important could improve performance in the presence of noisy/redundant variables. In this line, we present a novel SMOTE-like method that uses the weighted Minkowski distance for defining the neighborhood for each example of the minority class. This methodology leads to a better definition of the neighborhood since it prioritizes those features that are more relevant for the classification task. A complementary advantage of the proposal is performing feature selection since attributes can be discarded when their corresponding weights are below a given threshold. Our experiments on 42 class-imbalance datasets show the virtues of the proposed SMOTE variant, achieving the best predictive performance when compared with the traditional SMOTE approach and other recent variants on low- and high-dimensional settings, handling issues such as class overlap and hubness adequately without increasing the complexity of the method.</abstract><pub>Elsevier Ltd</pub><doi>10.1016/j.patcog.2021.108511</doi><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 0031-3203
ispartof Pattern recognition, 2022-04, Vol.124, p.108511, Article 108511
issn 0031-3203
1873-5142
language eng
recordid cdi_crossref_primary_10_1016_j_patcog_2021_108511
source Elsevier
subjects Data resampling
Feature selection
Imbalanced data classification
OWA Operators
SMOTE
title FW-SMOTE: A feature-weighted oversampling approach for imbalanced classification
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-04T15%3A11%3A01IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-elsevier_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=FW-SMOTE:%20A%20feature-weighted%20oversampling%20approach%20for%20imbalanced%20classification&rft.jtitle=Pattern%20recognition&rft.au=Maldonado,%20Sebasti%C3%A1n&rft.date=2022-04&rft.volume=124&rft.spage=108511&rft.pages=108511-&rft.artnum=108511&rft.issn=0031-3203&rft.eissn=1873-5142&rft_id=info:doi/10.1016/j.patcog.2021.108511&rft_dat=%3Celsevier_cross%3ES0031320321006877%3C/elsevier_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c306t-e264043970ac136d3f9c23f99364ef6405c5e5e330faa6f50d5df5d52587018d3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true