
Dataset condensation with latent quantile matching

Dataset condensation (DC) methods aim to learn a smaller, synthesized dataset with informative data records to accelerate the training of machine learning models. Current distribution matching (DM) based DC methods learn a synthesized dataset by matching the mean of the latent embeddings between the...

Bibliographic Details
Main Authors:	Wei, Wei; De Schepper, Tom; Mets, Kevin
Format:	Conference Proceeding
Language:	English
Subjects:	Conferences; Continual graph learning; Current distribution; Data privacy; Dataset condensation; Dataset distillation; Distribution matching; Goodness of fit tests; Machine learning; Measurement; Memory management; Training

container_end_page	7712
container_start_page	7703
creator	Wei, Wei; De Schepper, Tom; Mets, Kevin
description	Dataset condensation (DC) methods aim to learn a smaller, synthesized dataset with informative data records to accelerate the training of machine learning models. Current distribution matching (DM) based DC methods learn a synthesized dataset by matching the mean of the latent embeddings between the synthetic and the real dataset. However, two distributions with the same mean can still be vastly different. In this work, we demonstrate the shortcomings of using Maximum Mean Discrepancy to match latent distributions, i.e., the weak matching power and the lack of outlier regularization. To alleviate these shortcomings, we propose a new method, Latent Quantile Matching (LQM), which matches the quantiles of the latent embeddings to minimize the goodness-of-fit test statistic between two distributions. Empirical experiments on both image and graph-structured datasets show that LQM matches or outperforms the previous state of the art in distribution matching based DC. Moreover, we show that LQM improves performance in the continual graph learning (CGL) setting, where memory efficiency and privacy can be important. Our work sheds light on the application of DM-based DC for CGL.
doi_str_mv	10.1109/CVPRW63382.2024.00766
format	conference_proceeding
fullrecord	<record><control><sourceid>ieee_CHZPO</sourceid><recordid>TN_cdi_ieee_primary_10678249</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>10678249</ieee_id><sourcerecordid>10678249</sourcerecordid><originalsourceid>FETCH-LOGICAL-i686-ec01fcf1a537e4a0230be98779bf6fb85bc1327354e9804a21e6b9689efee7763</originalsourceid><addsrcrecordid>eNotzNtKAzEQgOEoCJa6b6CwL7DrTA6T5FLWIxSUUuplya4TG9mm2o2Ib6-gVz98F78QFwgtIvjLbv20fCalnGwlSN0CWKIjUXnrnTKgyGirj8VMIkFjDdKpqKbpDQAQnDFezYS8DiVMXOphn184T6Gkfa6_UtnWYyicS_3xGXJJI9e7UIZtyq9n4iSGceLqv3Oxur1ZdffN4vHuobtaNIkcNTwAxiFiMMqyDiAV9Oydtb6PFHtn-gGVtMroXwUdJDL1npznyGwtqbk4_9smZt68H9IuHL43CGSd1F79ADL-Rko</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Dataset condensation with latent quantile matching</title><source>IEEE Xplore All Conference Series</source><creator>Wei, Wei ; De Schepper, Tom ; Mets, Kevin</creator><creatorcontrib>Wei, Wei ; De Schepper, Tom ; Mets, Kevin</creatorcontrib><description>Dataset condensation (DC) methods aim to learn a smaller, synthesized dataset with informative data records to accelerate the training of machine learning models. Current distribution matching (DM) based DC methods learn a synthesized dataset by matching the mean of the latent embeddings between the synthetic and the real dataset. However, two distributions with the same mean can still be vastly different. In this work, we demonstrate the shortcomings of using Maximum Mean Discrepancy to match latent distributions, i.e., the weak matching power and lack of outlier regularization. To alleviate these shortcomings, we propose our new method: Latent Quantile Matching (LQM), which matches the quantiles of the latent embeddings to minimize the goodness of fit test statistic between two distributions. 
Empirical experiments on both image and graph-structured datasets show that LQM matches or outperforms previous state of the art in distribution matching based DC. Moreover, we show that LQM improves the performance in continual graph learning (CGL) setting, where memory efficiency and privacy can be important. Our work sheds light on the application of DM based DC for CGL.</description><identifier>EISSN: 2160-7516</identifier><identifier>EISBN: 9798350365474</identifier><identifier>DOI: 10.1109/CVPRW63382.2024.00766</identifier><identifier>CODEN: IEEPAD</identifier><language>eng</language><publisher>IEEE</publisher><subject>Conferences ; Continual graph learning ; Current distribution ; Data privacy ; Dataset condensation ; Dataset distillation ; Distribution matching ; Goodness of fit tests ; Machine learning ; Measurement ; Memory management ; Training</subject><ispartof>2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2024, p.7703-7712</ispartof><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/10678249$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,776,780,785,786,27904,54533,54910</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/10678249$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Wei, Wei</creatorcontrib><creatorcontrib>De Schepper, Tom</creatorcontrib><creatorcontrib>Mets, Kevin</creatorcontrib><title>Dataset condensation with latent quantile matching</title><title>2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)</title><addtitle>CVPRW</addtitle><description>Dataset condensation (DC) methods aim to learn a smaller, synthesized dataset with informative data records to 
accelerate the training of machine learning models. Current distribution matching (DM) based DC methods learn a synthesized dataset by matching the mean of the latent embeddings between the synthetic and the real dataset. However, two distributions with the same mean can still be vastly different. In this work, we demonstrate the shortcomings of using Maximum Mean Discrepancy to match latent distributions, i.e., the weak matching power and lack of outlier regularization. To alleviate these shortcomings, we propose our new method: Latent Quantile Matching (LQM), which matches the quantiles of the latent embeddings to minimize the goodness of fit test statistic between two distributions. Empirical experiments on both image and graph-structured datasets show that LQM matches or outperforms previous state of the art in distribution matching based DC. Moreover, we show that LQM improves the performance in continual graph learning (CGL) setting, where memory efficiency and privacy can be important. 
Our work sheds light on the application of DM based DC for CGL.</description><subject>Conferences</subject><subject>Continual graph learning</subject><subject>Current distribution</subject><subject>Data privacy</subject><subject>Dataset condensation</subject><subject>Dataset distillation</subject><subject>Distribution matching</subject><subject>Goodness of fit tests</subject><subject>Machine learning</subject><subject>Measurement</subject><subject>Memory management</subject><subject>Training</subject><issn>2160-7516</issn><isbn>9798350365474</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2024</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><recordid>eNotzNtKAzEQgOEoCJa6b6CwL7DrTA6T5FLWIxSUUuplya4TG9mm2o2Ib6-gVz98F78QFwgtIvjLbv20fCalnGwlSN0CWKIjUXnrnTKgyGirj8VMIkFjDdKpqKbpDQAQnDFezYS8DiVMXOphn184T6Gkfa6_UtnWYyicS_3xGXJJI9e7UIZtyq9n4iSGceLqv3Oxur1ZdffN4vHuobtaNIkcNTwAxiFiMMqyDiAV9Oydtb6PFHtn-gGVtMroXwUdJDL1npznyGwtqbk4_9smZt68H9IuHL43CGSd1F79ADL-Rko</recordid><startdate>20240617</startdate><enddate>20240617</enddate><creator>Wei, Wei</creator><creator>De Schepper, Tom</creator><creator>Mets, Kevin</creator><general>IEEE</general><scope>6IE</scope><scope>6IL</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIL</scope></search><sort><creationdate>20240617</creationdate><title>Dataset condensation with latent quantile matching</title><author>Wei, Wei ; De Schepper, Tom ; Mets, Kevin</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i686-ec01fcf1a537e4a0230be98779bf6fb85bc1327354e9804a21e6b9689efee7763</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Conferences</topic><topic>Continual graph learning</topic><topic>Current distribution</topic><topic>Data privacy</topic><topic>Dataset condensation</topic><topic>Dataset 
distillation</topic><topic>Distribution matching</topic><topic>Goodness of fit tests</topic><topic>Machine learning</topic><topic>Measurement</topic><topic>Memory management</topic><topic>Training</topic><toplevel>online_resources</toplevel><creatorcontrib>Wei, Wei</creatorcontrib><creatorcontrib>De Schepper, Tom</creatorcontrib><creatorcontrib>Mets, Kevin</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE</collection><collection>IEEE Proceedings Order Plans (POP All) 1998-Present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Wei, Wei</au><au>De Schepper, Tom</au><au>Mets, Kevin</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Dataset condensation with latent quantile matching</atitle><btitle>2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)</btitle><stitle>CVPRW</stitle><date>2024-06-17</date><risdate>2024</risdate><spage>7703</spage><epage>7712</epage><pages>7703-7712</pages><eissn>2160-7516</eissn><eisbn>9798350365474</eisbn><coden>IEEPAD</coden><abstract>Dataset condensation (DC) methods aim to learn a smaller, synthesized dataset with informative data records to accelerate the training of machine learning models. Current distribution matching (DM) based DC methods learn a synthesized dataset by matching the mean of the latent embeddings between the synthetic and the real dataset. However, two distributions with the same mean can still be vastly different. In this work, we demonstrate the shortcomings of using Maximum Mean Discrepancy to match latent distributions, i.e., the weak matching power and lack of outlier regularization. 
To alleviate these shortcomings, we propose our new method: Latent Quantile Matching (LQM), which matches the quantiles of the latent embeddings to minimize the goodness of fit test statistic between two distributions. Empirical experiments on both image and graph-structured datasets show that LQM matches or outperforms previous state of the art in distribution matching based DC. Moreover, we show that LQM improves the performance in continual graph learning (CGL) setting, where memory efficiency and privacy can be important. Our work sheds light on the application of DM based DC for CGL.</abstract><pub>IEEE</pub><doi>10.1109/CVPRW63382.2024.00766</doi><tpages>10</tpages><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	EISSN: 2160-7516
ispartof	2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2024, p.7703-7712
issn	2160-7516
language	eng
recordid	cdi_ieee_primary_10678249
source	IEEE Xplore All Conference Series
subjects	Conferences; Continual graph learning; Current distribution; Data privacy; Dataset condensation; Dataset distillation; Distribution matching; Goodness of fit tests; Machine learning; Measurement; Memory management; Training
title	Dataset condensation with latent quantile matching
url	http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-27T00%3A03%3A18IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_CHZPO&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Dataset%20condensation%20with%20latent%20quantile%20matching&rft.btitle=2024%20IEEE/CVF%20Conference%20on%20Computer%20Vision%20and%20Pattern%20Recognition%20Workshops%20(CVPRW)&rft.au=Wei,%20Wei&rft.date=2024-06-17&rft.spage=7703&rft.epage=7712&rft.pages=7703-7712&rft.eissn=2160-7516&rft.coden=IEEPAD&rft_id=info:doi/10.1109/CVPRW63382.2024.00766&rft.eisbn=9798350365474&rft_dat=%3Cieee_CHZPO%3E10678249%3C/ieee_CHZPO%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-i686-ec01fcf1a537e4a0230be98779bf6fb85bc1327354e9804a21e6b9689efee7763%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=10678249&rfr_iscdi=true