Loading…

Turning a Curse into a Blessing: Enabling In-Distribution-Data-Free Backdoor Removal via Stabilized Model Inversion

Many backdoor removal techniques in machine learning models require clean in-distribution data, which may not always be available due to proprietary datasets. Model inversion techniques, often considered privacy threats, can reconstruct realistic training samples, potentially eliminating the need fo...

Full description

Saved in:
Bibliographic Details
Published in:arXiv.org 2023-03
Main Authors: Chen, Si, Zeng, Yi, Wang, Jiachen T, Park, Won, Chen, Xun, Lyu, Lingjuan, Mao, Zhuoqing, Jia, Ruoxi
Format: Article
Language:English
Subjects:
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Many backdoor removal techniques in machine learning models require clean in-distribution data, which may not always be available due to proprietary datasets. Model inversion techniques, often considered privacy threats, can reconstruct realistic training samples, potentially eliminating the need for in-distribution data. Prior attempts to combine backdoor removal and model inversion yielded limited results. Our work is the first to provide a thorough understanding of leveraging model inversion for effective backdoor removal by addressing key questions about reconstructed samples' properties, perceptual similarity, and the potential presence of backdoor triggers. We establish that relying solely on perceptual similarity is insufficient for robust defenses, and the stability of model predictions in response to input and parameter perturbations is also crucial. To tackle this, we introduce a novel bi-level optimization-based framework for model inversion, promoting stability and visual quality. Interestingly, we discover that reconstructed samples from a pre-trained generator's latent space are backdoor-free, even when utilizing signals from a backdoored model. We provide a theoretical analysis to support this finding. Our evaluation demonstrates that our stabilized model inversion technique achieves state-of-the-art backdoor removal performance without clean in-distribution data, matching or surpassing performance using the same amount of clean samples.
ISSN:2331-8422