
Oversampling for Imbalanced Data via Optimal Transport

The issue of data imbalance occurs in many real-world applications especially in medical diagnosis, where normal cases are usually much more than the abnormal cases. To alleviate this issue, one of the most important approaches is the oversampling method, which seeks to synthesize minority class sam...

Bibliographic Details
Main Authors: Yan, Yuguang, Tan, Mingkui, Xu, Yanwu, Cao, Jiezhang, Ng, Michael, Min, Huaqing, Wu, Qingyao
Format: Conference Proceeding
Language: English
cited_by cdi_FETCH-LOGICAL-c181t-ae9bcb5eea3112973628e16dfceb55e38b5191c6b2f78f19d8de4030226c397d3
cites
container_end_page 5612
container_issue 1
container_start_page 5605
container_title
container_volume 33
creator Yan, Yuguang
Tan, Mingkui
Xu, Yanwu
Cao, Jiezhang
Ng, Michael
Min, Huaqing
Wu, Qingyao
description The issue of data imbalance occurs in many real-world applications especially in medical diagnosis, where normal cases are usually much more than the abnormal cases. To alleviate this issue, one of the most important approaches is the oversampling method, which seeks to synthesize minority class samples to balance the numbers of different classes. However, existing methods barely consider global geometric information involved in the distribution of minority class samples, and thus may incur distribution mismatching between real and synthetic samples. In this paper, relying on optimal transport (Villani 2008), we propose an oversampling method by exploiting global geometric information of data to make synthetic samples follow a similar distribution to that of minority class samples. Moreover, we introduce a novel regularization based on synthetic samples and shift the distribution of minority class samples according to loss information. Experiments on toy and real-world data sets demonstrate the efficacy of our proposed method in terms of multiple metrics.
doi_str_mv 10.1609/aaai.v33i01.33015605
format conference_proceeding
fulltext fulltext
identifier ISSN: 2159-5399
ispartof Proceedings of the ... AAAI Conference on Artificial Intelligence, 2019, Vol.33 (1), p.5605-5612
issn 2159-5399
2374-3468
language eng
recordid cdi_crossref_primary_10_1609_aaai_v33i01_33015605
source Freely Accessible Journals
title Oversampling for Imbalanced Data via Optimal Transport
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-28T20%3A03%3A04IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Oversampling%20for%20Imbalanced%20Data%20via%20Optimal%20Transport&rft.btitle=Proceedings%20of%20the%20...%20AAAI%20Conference%20on%20Artificial%20Intelligence&rft.au=Yan,%20Yuguang&rft.date=2019-07-17&rft.volume=33&rft.issue=1&rft.spage=5605&rft.epage=5612&rft.pages=5605-5612&rft.issn=2159-5399&rft.eissn=2374-3468&rft_id=info:doi/10.1609/aaai.v33i01.33015605&rft_dat=%3Ccrossref%3E10_1609_aaai_v33i01_33015605%3C/crossref%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c181t-ae9bcb5eea3112973628e16dfceb55e38b5191c6b2f78f19d8de4030226c397d3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true