On the Effectiveness of Adversarial Training Against Backdoor Attacks
Published in: IEEE Transactions on Neural Networks and Learning Systems, Oct. 2024, Vol. 35, No. 10, pp. 14878-14888
Main Authors:
Format: Article
Language: English
Summary: Although adversarial training (AT) is regarded as a potential defense against backdoor attacks, AT and its variants have yielded only unsatisfactory results or have even strengthened backdoor attacks instead. The large discrepancy between expectations and reality motivates us to thoroughly evaluate the effectiveness of AT against backdoor attacks across various settings for both AT and backdoor attacks. We find that the type and budget of the perturbations used in AT are important, and that AT with common perturbations is effective only for certain backdoor trigger patterns. Based on these empirical findings, we present practical suggestions for backdoor defense, including relaxed adversarial perturbation and composite AT. This work not only boosts our confidence in AT's ability to defend against backdoor attacks but also provides important insights for future research.
ISSN: 2162-237X, 2162-2388
DOI: 10.1109/TNNLS.2023.3281872
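The summary above evaluates standard adversarial training (AT) as a backdoor defense. Purely as a point of reference, the sketch below shows a common PGD-based AT loop in PyTorch; the hyperparameters (`epsilon`, `alpha`, `steps`) and helper names are illustrative assumptions and do not reflect the authors' actual configuration, which additionally involves relaxed adversarial perturbations and composite AT.

```python
# Minimal sketch of PGD-based adversarial training (assumed L-infinity threat model).
# Names and hyperparameters are illustrative, not the paper's settings.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, epsilon=8/255, alpha=2/255, steps=10):
    """Craft L-infinity bounded adversarial examples via projected gradient descent."""
    x_adv = (x + torch.empty_like(x).uniform_(-epsilon, epsilon)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Ascend the loss, then project back into the epsilon-ball around x.
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon).clamp(0, 1)
    return x_adv.detach()

def adversarial_training_epoch(model, loader, optimizer, device="cpu"):
    """One training epoch on adversarial examples instead of clean inputs."""
    model.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_adv = pgd_attack(model, x, y)
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        optimizer.step()
```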