Not All Samples Are Born Equal: Towards Effective Clean-Label Backdoor Attacks
Published in: Pattern Recognition, 2023-07, Vol. 139, p. 109512, Article 109512
Main Authors:
Format: Article
Language: English
Summary:
• We reveal that the difficulty of clean-label backdoor attacks is mostly due to the antagonistic effects of 'robust features' and verify that DNNs have different learning abilities for different samples.
• We revisit the paradigm of existing clean-label backdoor attacks and propose a new complementary paradigm by considering the different learning difficulties of samples.
• We empirically verify the effectiveness and the poisoning transferability of our method on benchmark datasets and discuss its intrinsic mechanism.
Recent studies demonstrated that deep neural networks (DNNs) are vulnerable to backdoor attacks. The attacked model behaves normally on benign samples, while its predictions are misled whenever adversary-specified trigger patterns appear. Currently, clean-label backdoor attacks are usually regarded as the stealthiest methods, in which adversaries can only poison samples from the target class without modifying their labels. However, these attacks can hardly succeed. In this paper, we reveal that the difficulty of clean-label attacks mainly lies in the antagonistic effects of 'robust features' related to the target class contained in poisoned samples. Specifically, robust features tend to be easily learned by victim models and thus undermine the learning of trigger patterns. Based on this understanding, we propose a simple yet effective plug-in method to enhance clean-label backdoor attacks by poisoning 'hard' rather than random samples. We adopt three classical difficulty metrics as examples to implement our method. Extensive experiments on benchmark datasets demonstrate that our method consistently improves vanilla attacks.
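To make the selection step concrete, below is a minimal, hypothetical PyTorch sketch of the 'hard-sample' poisoning idea described in the abstract: rank target-class samples by per-sample loss under a surrogate model (one possible difficulty metric; the paper adopts three classical metrics) and stamp a trigger patch on the hardest ones while leaving their labels unchanged. The function names, the loss-based metric, and the corner-patch trigger are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def select_hard_samples(model, images, labels, target_class, poison_rate=0.1):
    """Rank target-class samples by surrogate-model loss; return the hardest indices."""
    model.eval()
    target_idx = (labels == target_class).nonzero(as_tuple=True)[0]
    logits = model(images[target_idx])
    losses = F.cross_entropy(logits, labels[target_idx], reduction="none")
    budget = max(1, int(poison_rate * len(target_idx)))
    hardest = losses.topk(budget).indices  # highest loss = least-learned ("hard") samples
    return target_idx[hardest]

def apply_patch_trigger(images, poison_idx, trigger_value=1.0, patch_size=3):
    """Stamp a small trigger patch in the bottom-right corner; labels are left untouched."""
    poisoned = images.clone()
    poisoned[poison_idx, :, -patch_size:, -patch_size:] = trigger_value  # clean-label: only pixels change
    return poisoned
```

Used together, e.g. `idx = select_hard_samples(surrogate, x, y, target_class=0)` followed by `x_poison = apply_patch_trigger(x, idx)`, the pair yields a poisoned training set whose labels remain clean, with the poisoning budget controlled by `poison_rate`.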
ISSN: 0031-3203, 1873-5142
DOI: 10.1016/j.patcog.2023.109512