
The Utility and Complexity of In- and Out-of-Distribution Machine Unlearning

Machine unlearning, the process of selectively removing data from trained models, is increasingly crucial for addressing privacy concerns and knowledge gaps post-deployment. Despite this importance, existing approaches are often heuristic and lack formal guarantees. In this paper, we analyze the fundamental utility, time, and space complexity trade-offs of approximate unlearning, providing rigorous certification analogous to differential privacy. For in-distribution forget data -- data similar to the retain set -- we show that a surprisingly simple and general procedure, empirical risk minimization with output perturbation, achieves tight unlearning-utility-complexity trade-offs, addressing a previous theoretical gap on the separation from unlearning "for free" via differential privacy, which inherently facilitates the removal of such data. However, such techniques fail with out-of-distribution forget data -- data significantly different from the retain set -- where unlearning time complexity can exceed that of retraining, even for a single sample. To address this, we propose a new robust and noisy gradient descent variant that provably amortizes unlearning time complexity without compromising utility.

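For context on the "certification analogous to differential privacy" the abstract mentions: in the unlearning literature this is usually formalized as (epsilon, delta)-unlearning (e.g., Sekhari et al., 2021). The paper's exact definition may differ in details; the standard two-sided requirement, with learning algorithm A, training set D, forget set D_f, and unlearning algorithm U, is that for all measurable events S:

    \Pr\big[\,U(A(D),\, D_f,\, D \setminus D_f) \in S\,\big] \;\le\; e^{\varepsilon}\,\Pr\big[\,A(D \setminus D_f) \in S\,\big] + \delta,
    \Pr\big[\,A(D \setminus D_f) \in S\,\big] \;\le\; e^{\varepsilon}\,\Pr\big[\,U(A(D),\, D_f,\, D \setminus D_f) \in S\,\big] + \delta.

That is, the unlearned model must be statistically indistinguishable from a model retrained from scratch on the retain set.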
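To illustrate the in-distribution procedure the abstract names, here is a minimal sketch of empirical risk minimization with output perturbation, using ridge regression as the ERM instance. This is not the authors' code: the function name is illustrative, and the noise scale noise_std that actually certifies unlearning depends on the paper's sensitivity analysis, so it is left as an input.

    import numpy as np

    def unlearn_by_erm_output_perturbation(X_retain, y_retain, noise_std,
                                           l2_reg=1.0, rng=None):
        """ERM on the retain set followed by Gaussian output perturbation."""
        rng = np.random.default_rng() if rng is None else rng
        n, d = X_retain.shape
        # ERM step: closed-form ridge-regression solution on the retain set.
        theta = np.linalg.solve(
            X_retain.T @ X_retain + l2_reg * n * np.eye(d),
            X_retain.T @ y_retain,
        )
        # Output perturbation: Gaussian noise on the released parameters is
        # what makes the output indistinguishable from retraining, in the
        # (epsilon, delta) sense sketched above.
        return theta + rng.normal(scale=noise_std, size=d)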

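For the out-of-distribution case, the abstract proposes a "robust and noisy gradient descent variant." The paper's exact update rule is not reproduced here; the sketch below only shows the general pattern under assumed choices -- per-sample gradient clipping, a coordinate-wise median as the robust aggregator, and Gaussian noise injected at each step -- with placeholder hyperparameters rather than the paper's calibrated constants.

    import numpy as np

    def robust_noisy_gd(grad_fn, retain_data, theta0, steps=100, lr=0.1,
                        clip=1.0, noise_std=0.01, rng=None):
        """Noisy gradient descent with clipping and robust aggregation.

        grad_fn(theta, z) must return the gradient vector for one sample z.
        """
        rng = np.random.default_rng() if rng is None else rng
        theta = np.asarray(theta0, dtype=float).copy()
        for _ in range(steps):
            grads = np.stack([grad_fn(theta, z) for z in retain_data])
            # Clip each per-sample gradient to bound any single point's influence.
            norms = np.linalg.norm(grads, axis=1, keepdims=True)
            grads = grads * np.minimum(1.0, clip / np.maximum(norms, 1e-12))
            # Robust aggregation (coordinate-wise median here) limits the pull
            # of outlying, e.g. out-of-distribution, samples on the update.
            agg = np.median(grads, axis=0)
            # Gaussian noise injection yields the DP-style indistinguishability.
            theta = theta - lr * (agg + rng.normal(scale=noise_std, size=theta.shape))
        return theta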
Bibliographic Details
Published in: arXiv.org, 2024-12
Main Authors: Allouah, Youssef; Kazdan, Joshua; Guerraoui, Rachid; Koyejo, Sanmi
Format: Article
Language: English
EISSN: 2331-8422
Subjects: Complexity; Privacy; Tradeoffs