
The Utility and Complexity of In- and Out-of-Distribution Machine Unlearning

Machine unlearning, the process of selectively removing data from trained models, is increasingly crucial for addressing privacy concerns and knowledge gaps post-deployment. Despite this importance, existing approaches are often heuristic and lack formal guarantees. In this paper, we analyze the fundamental utility, time, and space complexity trade-offs of approximate unlearning, providing rigorous certification analogous to differential privacy. For in-distribution forget data -- data similar to the retain set -- we show that a surprisingly simple and general procedure, empirical risk minimization with output perturbation, achieves tight unlearning-utility-complexity trade-offs, addressing a previous theoretical gap on the separation from unlearning "for free" via differential privacy, which inherently facilitates the removal of such data. However, such techniques fail with out-of-distribution forget data -- data significantly different from the retain set -- where unlearning time complexity can exceed that of retraining, even for a single sample. To address this, we propose a new robust and noisy gradient descent variant that provably amortizes unlearning time complexity without compromising utility.

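For context on the "certification analogous to differential privacy" the abstract mentions: in the unlearning literature this is usually formalized as (epsilon, delta)-unlearning (e.g., Sekhari et al., 2021). The paper's exact definition may differ in details; the standard two-sided requirement, with learning algorithm A, training set D, forget set D_f, and unlearning algorithm U, is that for all measurable events S:

    \Pr\big[\,U(A(D),\, D_f,\, D \setminus D_f) \in S\,\big] \;\le\; e^{\varepsilon}\,\Pr\big[\,A(D \setminus D_f) \in S\,\big] + \delta,
    \Pr\big[\,A(D \setminus D_f) \in S\,\big] \;\le\; e^{\varepsilon}\,\Pr\big[\,U(A(D),\, D_f,\, D \setminus D_f) \in S\,\big] + \delta.

That is, the unlearned model must be statistically indistinguishable from a model retrained from scratch on the retain set.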
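To illustrate the in-distribution procedure the abstract names, here is a minimal sketch of empirical risk minimization with output perturbation, using ridge regression as the ERM instance. This is not the authors' code: the function name is illustrative, and the noise scale noise_std that actually certifies unlearning depends on the paper's sensitivity analysis, so it is left as an input.

    import numpy as np

    def unlearn_by_erm_output_perturbation(X_retain, y_retain, noise_std,
                                           l2_reg=1.0, rng=None):
        """ERM on the retain set followed by Gaussian output perturbation."""
        rng = np.random.default_rng() if rng is None else rng
        n, d = X_retain.shape
        # ERM step: closed-form ridge-regression solution on the retain set.
        theta = np.linalg.solve(
            X_retain.T @ X_retain + l2_reg * n * np.eye(d),
            X_retain.T @ y_retain,
        )
        # Output perturbation: Gaussian noise on the released parameters is
        # what makes the output indistinguishable from retraining, in the
        # (epsilon, delta) sense sketched above.
        return theta + rng.normal(scale=noise_std, size=d)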

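For the out-of-distribution case, the abstract proposes a "robust and noisy gradient descent variant." The paper's exact update rule is not reproduced here; the sketch below only shows the general pattern under assumed choices -- per-sample gradient clipping, a coordinate-wise median as the robust aggregator, and Gaussian noise injected at each step -- with placeholder hyperparameters rather than the paper's calibrated constants.

    import numpy as np

    def robust_noisy_gd(grad_fn, retain_data, theta0, steps=100, lr=0.1,
                        clip=1.0, noise_std=0.01, rng=None):
        """Noisy gradient descent with clipping and robust aggregation.

        grad_fn(theta, z) must return the gradient vector for one sample z.
        """
        rng = np.random.default_rng() if rng is None else rng
        theta = np.asarray(theta0, dtype=float).copy()
        for _ in range(steps):
            grads = np.stack([grad_fn(theta, z) for z in retain_data])
            # Clip each per-sample gradient to bound any single point's influence.
            norms = np.linalg.norm(grads, axis=1, keepdims=True)
            grads = grads * np.minimum(1.0, clip / np.maximum(norms, 1e-12))
            # Robust aggregation (coordinate-wise median here) limits the pull
            # of outlying, e.g. out-of-distribution, samples on the update.
            agg = np.median(grads, axis=0)
            # Gaussian noise injection yields the DP-style indistinguishability.
            theta = theta - lr * (agg + rng.normal(scale=noise_std, size=theta.shape))
        return theta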
Bibliographic Details
Published in: arXiv.org, 2024-12
Main Authors: Allouah, Youssef; Kazdan, Joshua; Guerraoui, Rachid; Koyejo, Sanmi
Format: Article
Language: English
EISSN: 2331-8422
Subjects: Complexity; Privacy; Tradeoffs