Oversampling for Imbalanced Data via Optimal Transport
Main Authors: | Yan, Yuguang; Tan, Mingkui; Xu, Yanwu; Cao, Jiezhang; Ng, Michael; Min, Huaqing; Wu, Qingyao |
---|---|
Format: | Conference Proceeding |
Language: | English |
cited_by | cdi_FETCH-LOGICAL-c181t-ae9bcb5eea3112973628e16dfceb55e38b5191c6b2f78f19d8de4030226c397d3 |
---|---|
cites | |
container_end_page | 5612 |
container_issue | 1 |
container_start_page | 5605 |
container_title | |
container_volume | 33 |
creator | Yan, Yuguang; Tan, Mingkui; Xu, Yanwu; Cao, Jiezhang; Ng, Michael; Min, Huaqing; Wu, Qingyao |
description | The issue of data imbalance occurs in many real-world applications, especially in medical diagnosis, where normal cases usually far outnumber abnormal cases. To alleviate this issue, one of the most important approaches is oversampling, which synthesizes minority class samples to balance the numbers of samples in different classes. However, existing methods rarely consider the global geometric information in the distribution of minority class samples, and thus may cause a distribution mismatch between real and synthetic samples. In this paper, relying on optimal transport (Villani 2008), we propose an oversampling method that exploits the global geometric information of the data so that synthetic samples follow a distribution similar to that of the minority class samples. Moreover, we introduce a novel regularization based on synthetic samples and shift the distribution of minority class samples according to loss information. Experiments on toy and real-world data sets demonstrate the efficacy of the proposed method in terms of multiple metrics. |
doi_str_mv | 10.1609/aaai.v33i01.33015605 |
format | conference_proceeding |
fulltext | fulltext |
identifier | ISSN: 2159-5399 |
ispartof | Proceedings of the ... AAAI Conference on Artificial Intelligence, 2019, Vol.33 (1), p.5605-5612 |
issn | 2159-5399; EISSN 2374-3468 |
language | eng |
recordid | cdi_crossref_primary_10_1609_aaai_v33i01_33015605 |
source | Freely Accessible Journals |
title | Oversampling for Imbalanced Data via Optimal Transport |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-28T20%3A03%3A04IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Oversampling%20for%20Imbalanced%20Data%20via%20Optimal%20Transport&rft.btitle=Proceedings%20of%20the%20...%20AAAI%20Conference%20on%20Artificial%20Intelligence&rft.au=Yan,%20Yuguang&rft.date=2019-07-17&rft.volume=33&rft.issue=1&rft.spage=5605&rft.epage=5612&rft.pages=5605-5612&rft.issn=2159-5399&rft.eissn=2374-3468&rft_id=info:doi/10.1609/aaai.v33i01.33015605&rft_dat=%3Ccrossref%3E10_1609_aaai_v33i01_33015605%3C/crossref%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c181t-ae9bcb5eea3112973628e16dfceb55e38b5191c6b2f78f19d8de4030226c397d3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |
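The description above says the method uses optimal transport to make synthetic samples follow the distribution of the minority class. The record does not give the algorithm, so what follows is only a minimal sketch of that general idea under our own assumptions, not the authors' method: candidates are produced by interpolating random pairs of minority samples (a simplified, SMOTE-like step without a nearest-neighbour search), an entropy-regularised OT plan (Sinkhorn iterations) is computed from the candidates to the real minority samples, and each candidate is moved toward its barycentric projection. The names `sinkhorn_plan`, `ot_oversample`, `reg`, and `step` are hypothetical, and the paper's loss-based regularization is not modelled.

```python
import numpy as np


def sinkhorn_plan(x, y, reg=0.1, n_iter=200):
    """Entropy-regularised OT plan between uniform empirical measures on x (n, d) and y (m, d)."""
    n, m = len(x), len(y)
    a = np.full(n, 1.0 / n)  # uniform weights on source points
    b = np.full(m, 1.0 / m)  # uniform weights on target points
    # Pairwise squared Euclidean cost, scaled to [0, 1] so exp(-cost / reg) does not underflow.
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    cost = cost / max(cost.max(), 1e-12)
    k = np.exp(-cost / reg)
    u = np.ones(n)
    for _ in range(n_iter):
        # Alternate scalings: first match column sums to b, then row sums to a.
        v = b / (k.T @ u)
        u = a / (k @ v)
    return u[:, None] * k * v[None, :]


def ot_oversample(minority, n_new, reg=0.1, step=0.5, seed=0):
    """Hypothetical OT-guided oversampler (illustration only, not the paper's algorithm)."""
    rng = np.random.default_rng(seed)
    # 1. Candidate synthetic points: interpolate random pairs of minority samples.
    i = rng.integers(0, len(minority), n_new)
    j = rng.integers(0, len(minority), n_new)
    lam = rng.random((n_new, 1))
    synth = minority[i] + lam * (minority[j] - minority[i])
    # 2. Transport plan from the candidates to the real minority samples.
    plan = sinkhorn_plan(synth, minority, reg)
    # 3. Barycentric projection: weighted mean of the minority points each candidate is sent to.
    target = (plan @ minority) / plan.sum(axis=1, keepdims=True)
    # 4. Move each candidate part of the way (step in [0, 1]) toward its target.
    return synth + step * (target - synth)


# Usage: 50 synthetic points for a 2-D minority class of 30 samples.
rng = np.random.default_rng(1)
minority = rng.normal(size=(30, 2))
new_samples = ot_oversample(minority, n_new=50)  # shape (50, 2)
```

Entropic regularization is used here only because Sinkhorn iterations fit in a few lines of NumPy; a dedicated library such as POT (`ot.sinkhorn`) would be more robust, and smaller `reg` values give sharper plans at the cost of slower convergence.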